I like your IMGATM proposal, but wouldn't it also potentially break
some of the programs? Also--and this is a problem with deleting only
sidechain atoms in general--it seems that many, myself included, might
totally miss that an apparent "alanine" is really a trunco-lysine.
What I like is that it does get around the problem of people
over-interpreting bogus sidechains, but it falls short, perhaps, in
that it could mislead people about what residue is there. I, for one, would not
feel that I had to click on all the alanines in a model to verify that
they were not lysines, and would be surprised and puzzled for a while
about why this ala said lys when I clicked on it. Wouldn't you be
surprised? (Well, maybe not after this thread...)
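
For concreteness, here is a rough Python sketch of how IMGATM records might
behave with a reader that only knows ATOM and HETATM. IMGATM is only the
record name proposed in this thread, not part of the PDB format, and the
helper names (make_record, legacy_atoms, truncated_residues) and the
coordinates are made up for illustration.

```python
# Sketch only. IMGATM is the record name proposed in this thread; it is not
# part of the PDB format. All function names and coordinates are hypothetical.

def make_record(record, serial, name, resname, chain, resseq, x, y, z,
                occ=1.00, b=20.00):
    """Build one fixed-column line laid out like a PDB ATOM record."""
    return ("{:<6}{:>5} {:<4} {:>3} {}{:>4}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}").format(
        record, serial, name, resname, chain, resseq, x, y, z, occ, b)


def record_name(line):
    """The record name occupies the first six columns of the line."""
    return line[:6].strip()


def legacy_atoms(lines):
    """Lines that a reader knowing only ATOM/HETATM would keep."""
    return [ln for ln in lines if record_name(ln) in ("ATOM", "HETATM")]


def truncated_residues(lines):
    """Residues (name, chain, number) that carry at least one IMGATM line."""
    seen = []
    for ln in lines:
        if record_name(ln) == "IMGATM":
            key = (ln[17:20].strip(), ln[21], int(ln[22:26]))
            if key not in seen:
                seen.append(key)
    return seen


# A lysine modeled only out to CB, with the rest of the side chain as markers.
observed = [" N  ", " CA ", " C  ", " O  ", " CB "]
imaginary = [" CG ", " CD ", " CE ", " NZ "]
example = [make_record("ATOM", i + 1, n, "LYS", "A", 42, 10.0 + i, 5.0, 2.0)
           for i, n in enumerate(observed)]
example += [make_record("IMGATM", i + 6, n, "LYS", "A", 42, 15.0 + i, 5.0, 2.0)
            for i, n in enumerate(imaginary)]

print(len(legacy_atoms(example)), "atoms visible to a legacy reader")
print("residues with marker atoms:", truncated_residues(example))
```

A legacy reader would see only N, CA, C, O and CB for residue 42, so it
would draw an alanine-shaped residue still labeled LYS, which is the
surprise I mean. A viewer updated to understand the markers could use
something like truncated_residues to flag those residues up front.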

JPK



On Mon, Apr 4, 2011 at 1:55 AM, Dale Tronrud <det...@uoxray.uoregon.edu> wrote:
>   The definition of _atom_site.occupancy is
>
>  The fraction of the atom type present at this site.
>  The sum of the occupancies of all the atom types at this site
>  may not significantly exceed 1.0 unless it is a dummy site.
>
> When an atom has an occupancy equal to zero that means that the
> atom is NEVER present at that site - and that is not what you
> intend to say.  Setting the occupancy to zero does not mean that
> a full atom is located somewhere in this area.  Quite the opposite.
>
>   (The reference to a dummy site is interesting and implies to
> me that mmCIF already has the mechanism you wish for.)
>
>   Having some experience with refining low occupancy atoms and
> working with dummy marker atoms I'm quite confident that you can
> never define a B factor cutoff that would work.  No matter what
> value you choose you will find some atoms in density that refine
> to values greater than the cutoff, or the limit you choose is so
> high that you will find marker atoms that refine to less than the
> limit.  A B factor cutoff cannot work - no matter the value you
> choose you will always be plagued with false positives or false
> negatives.
>
>   If you really want to stuff this bit into one of these fields
> you have to go all out.  Set the occupancy of a marker atom to -99.99.
> This will unambiguously mark the atom as an imaginary one.  This
> will, of course, break every program that reads PDB format files,
> but that is what should happen in any case.  If you change the
> definition of the columns in the file you must mandate that all
> programs be upgraded to recognize the new definitions.  I don't
> know how you can do that other than ensuring that the change will
> cause programs to cough.  To try to slide it by with a magic value
> that will be silently accepted by existing programs is to beg for
> bugs and subtle side-effects.
>
>   Good luck getting the maintainers of the mmCIF standard to accept
> a magic value in either of these fields.
>
>   How about this: We already have the keywords ATOM and HETATM
> (and don't ask me why we have two).  How about we create a new
> record in the PDB format, say IMGATM, that would have all the
> fields of an ATOM record but would be recognized as whatever the
> marker is for "dummy" atoms in the current mmCIF?  Existing programs
> would completely ignore these atoms, as they should until they are
> modified to do something reasonable with them.  Those of us who
> have no use for them can either use a switch in the program to
> ignore them or just grep them out of the file.  Someone could write
> a program that would take a model with only ATOM and HETATM records
> and fill out all the desired IMGATM records (Let's call that program
> WASNIAHC, everyone would remember that!).
>
>   This solution is unambiguous.  It can be represented in current
> mmCIF, I think.  The PDB could run WASNIAHC themselves after deposition
> but before acceptance by the depositor so people like me would not
> have to deal with them during refinement but would be able to see
> them before our precious works of art are unleashed on the world.
>
>   Seems like a win-win solution to me.
>
> Dale Tronrud
>
>
> On 4/3/2011 9:17 PM, Jacob Keller wrote:
>>
>> Well, what about getting the default settings on the major molecular
>> viewers to hide atoms with either occ=0 or b>cutoff ("novice mode?")?
>> While the b cutoff would still be tricky, I assume we could eventually
>> come to consensus on some reasonable cutoff (2 sigma from the mean?),
>> and then this approach would allow each free-spirited crystallographer
>> to keep his own preferred method of dealing with these troublesome
>> sidechains and nary a novice would be led astray....
>>
>> JPK
>>
>> On Sun, Apr 3, 2011 at 2:58 PM, Eric Bennett <er...@pobox.com> wrote:
>>>
>>> Most non-structural users are familiar with the sequence of the proteins
>>> they are studying, and most software does at least display residue identity
>>> if you select an atom in a residue, so usually it is not necessary to do any
>>> cross checking besides selecting an atom in the residue and seeing what its
>>> residue name is.  The chance of somebody misinterpreting a truncated Lys as
>>> Ala is, in my experience, much much lower than the chance they will trust
>>> the xyz coordinates of atoms with zero occupancy or high B factors.
>>>
>>> What worries me the most is somebody designing a whole biological
>>> experiment around an over-interpretation of details that are implied by xyz
>>> coordinates of atoms, even if those atoms were not resolved in the maps.
>>>  When this sort of error occurs it is a level of pain and wasted effort that
>>> makes the "pain" associated with having to build back in missing side chains
>>> look completely trivial.
>>>
>>> As long as the PDB file format is the way users get structural data,
>>> there is really no good way to communicate "atom exists with no reliable
>>> coordinates" to the user, given the diversity of software packages out there
>>> for reading PDB files and the historical lack of any standard way of dealing
>>> with this issue.  Even if the file format is hacked there is no way to force
>>> all the existing software out there to understand the hack.  A file format
>>> that isn't designed with this sort of feature from day one is not going to
>>> be fixable as a practical matter after so much legacy code has accumulated.
>>>
>>> -Eric
>>>
>>>
>>>
>>> On Apr 3, 2011, at 2:20 PM, Jacob Keller wrote:
>>>
>>>> To the delete-the-atom-nik's: do you propose deleting the whole
>>>> residue or just the side chain? I can understand deleting the whole
>>>> residue, but deleting only the side chain seems to me to be placing a
>>>> stumbling block also, and even possibly confusing for an experienced
>>>> crystallographer: the .pdb says "lys" but it looks like an ala? Which
>>>> is it? I could imagine a lot of frustration-hours arising from this
>>>> practice, with people cross-checking sequences, looking in the methods
>>>> sections for mutations...
>>>>
>>>> JPK
>>>>
>>>
>>
>>
>>
>



-- 
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
cel: 773.608.9185
email: j-kell...@northwestern.edu
*******************************************
