Re: [ccp4bb] Fwd: [ccp4bb] Wyckoff positions and protein atoms

Dale Tronrud Wed, 15 Dec 2010 09:47:22 -0800

Dear Ian,

   I think you are putting too much importance on the numerical
instability of an atom's position when refining with full matrix
refinement.  When developing TNT's code for calculating second
derivatives I found that building into the calculation the effects
of such an atom overlapping its own, symmetry related, electron
density eliminated the instability and no constraints to special
positions were required.  I was only working with block diagonal
second derivatives with one block per atom but I don't see any
reason the proper calculation would not work with the full matrix.
The electron density of an atom near a special position is nearly
that of one far away.  It is not reasonable that a proper
calculation would blow up for one and not the other.  The key is
doing the "proper" calculation.


   It's true that the proper calculation of the atomic block for
an atom near a special position took more time than the calculation
for all the other atoms in the model.  You can't just calculate
generic look-up tables that apply to all atoms.  The reward of the
full calculation is that all the complications you describe disappear.
An atom that sits 0.001 A from a special position is not unstable
in the least.  It does, of course, have to have an occupancy of
1/n.  I always avoid programming tests of a == b for real numbers
because the round-off errors will always bite you at some point.
This means that a test of an atom exactly on a special position
can't be done reliably in floating point math.
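
   As a minimal illustration of the round-off point, here is a Python
sketch.  The five-decimal rounding and the tolerance value are
assumptions chosen for the example; they are not taken from TNT or any
other refinement program.

```python
import math

# Fractional coordinate of an atom intended to sit at x = 1/3, as it
# might look after being written to a file with five decimal places.
x_stored = round(1.0 / 3.0, 5)

# Exact equality fails even though the atom was "meant" to be on the axis.
exactly_on_axis = (x_stored == 1.0 / 3.0)

# A tolerance-based test passes, but only for a tolerance someone chose.
TOL = 1e-4  # arbitrary value, an assumption for this example
near_axis = math.isclose(x_stored, 1.0 / 3.0, abs_tol=TOL)

# Round-off shows up without any file I/O as well.
sum_is_exact = (0.1 + 0.2 == 0.3)

print(exactly_on_axis, near_axis, sum_is_exact)
```

The equality test cannot see that the stored value was intended to be
1/3, and the tolerance test only moves the question to the choice of
TOL.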

   Your preferred assumption is that any atom "near enough" to
a special position is really on the special position and should
have an occupancy of one.  My assumption is that no atom is ever
EXACTLY on the special position and if they are close enough to
their symmetry image to forbid coexistence the occupancy should
be 1/n.  I think either assumption is reasonable but, of course,
prefer mine for what I consider practical reasons.  It helps that
I have the code to make mine work.
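
   The 1/n rule can be sketched roughly as follows.  This is only an
illustration under stated assumptions, not the TNT code: the function
names, the 1.0 A clash distance, the orthogonal cell and the twofold
along z are all hypothetical choices made for the example.

```python
import math

def count_overlapping_images(frac, ops, cell, d_clash=1.0):
    """Count symmetry images (identity included) closer than d_clash
    (in A) to the atom.  Assumes an orthogonal cell with edges `cell`."""
    n = 0
    for rot, trans in ops:
        image = [sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]
                 for i in range(3)]
        # Shortest lattice-equivalent separation in fractional units
        # (component-wise wrapping is valid for an orthogonal cell).
        delta = [image[i] - frac[i] for i in range(3)]
        delta = [d - round(d) for d in delta]
        dist = math.sqrt(sum((delta[i] * cell[i]) ** 2 for i in range(3)))
        if dist < d_clash:
            n += 1
    return n

def max_occupancy(frac, ops, cell, d_clash=1.0):
    """Occupancy limit 1/n, n being the number of images that cannot coexist."""
    return 1.0 / count_overlapping_images(frac, ops, cell, d_clash)

# Hypothetical example: twofold axis along z, orthogonal 30 x 40 x 50 A cell.
IDENTITY = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
TWOFOLD_Z = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
ops = [IDENTITY, TWOFOLD_Z]
cell = (30.0, 40.0, 50.0)

on_axis = max_occupancy([0.0, 0.0, 0.25], ops, cell)      # exactly on the axis
near = max_occupancy([0.00003, 0.0, 0.25], ops, cell)     # about 0.001 A off
far = max_occupancy([0.2, 0.1, 0.25], ops, cell)          # general position

print(on_axis, near, far)
```

There is no test for an atom being exactly on the axis: the atom on
the axis and the atom 0.001 A away take the same branch, and both get
an occupancy limit of 1/2.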

Dale Tronrud

On 12/15/10 08:54, Ian Tickle wrote:
> Hi Herman
> 
> What makes an atom on a special position is that it is literally ON
> the s.p.: it can't be 'almost on' the s.p. because then if you tried
> to refine the co-ordinates perpendicular to the axis you would find
> that the matrix would be singular or at least so badly conditioned
> that the solution would be poorly defined.  The only solution to that
> problem is to constrain (i.e. fix) these co-ordinates to be exactly on
> the axis and not attempt to refine them.  The data are telling you
> that you have insufficient resolution so you are not justified in
> placing the atom very close to the axis; the best you can do is place
> the atom with unit occupancy exactly _on_ the axis.  It's only once
> the atom is a 'significant' distance (i.e. relative to the resolution)
> away from the axis that these co-ordinates can be independently
> refined.  Then the data are telling you that the atom is disordered.
> If you collected higher resolution data you might well be able to
> detect & successfully refine disordered atoms closer to the axis than
> with low resolution data.  So it has nothing to do with the programmer
> setting an arbitrary threshold.  This would have to be some
> complicated function of atom type, occupancy, B factor, resolution,
> data quality etc to work properly anyway so I doubt that it would be
> feasible.  Instead it's determined completely by what the data are
> capable of telling you about the structure, as indeed it should be.
> 
> My main concern was the conflict between some program implementations
> and the PDB and mmCIF format descriptions on this issue.  For example
> the PDB documentation says that the ATOM record contains the occupancy
> (where this is defined in the CIF/mmCIF documentation).  If it had
> intended that it should contain multiplicity*occupancy instead then
> presumably it would have said so.
> 
> Cheers
> 
> -- Ian
> 
> On Wed, Dec 15, 2010 at 4:01 PM,  <herman.schreu...@sanofi-aventis.com> wrote:
>> Dear Ian,
>>
>> In my view, the confusion arises by NOT including the multiplicity into the 
>> occupancy. If we make the gedanken experiment and look at a range of crystal 
>> structures with a rotationally disordered water molecule near a symmetry 
>> axis (they do exist!) then as long as the water molecule is sufficiently far 
>> from the axis, it is clear that the occupancy should be 1/2 or 1/3 or 
>> whatever is the multiplicity. However, as the molecule approaches the axis 
>> at a certain moment at a certain threshold set by the programmer of the 
>> refinement program, the molecule suddenly becomes special and the occupancy 
>> is set to 1.0. So depending on rounding errors, different thresholds etc. 
>> different programs may make different decisions on whether a water is 
>> special or not.
>>
>> For me, this is confusing.
>>
>> Best regards,
>> Herman
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Ian 
>> Tickle
>> Sent: Wednesday, December 15, 2010 3:47 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] Fwd: [ccp4bb] Wyckoff positions and protein atoms
>>
>> Dear George
>>
>> I notice that the Oxford CRYSTALS program, which is what I used when I did 
>> small-molecule crystallography and which is still quite popular among the 
>> small-molecule people (maybe not as much as Shel-X!), uses the CIF 
>> convention:
>>
>> OCC= This parameter defines the site occupancy EXCLUDING special position 
>> effects (i.e. is the 'chemical occupancy'). The default is 1.0.  Special 
>> position effects are computed by CRYSTALS and multiplied onto this parameter.
>>
>> (from http://www.xtl.ox.ac.uk/crystalsmanual-atomic.html )
>>
>> Also the mmCIF specification on this is the same CIF one (hardly surprising 
>> I guess since it's derived from it):
>>
>> _atom_site.occupancy  The fraction of the atom type present at this site.
>> The sum of the occupancies of all the atom types at this site may not 
>> significantly exceed 1.0 unless it is a dummy site.
>>
>> (from 
>> http://mmcif.pdb.org/dictionaries/mmcif_std.dic/Items/_atom_site.occupancy.html
>> )
>>
>> which doesn't say so specifically, but it's implied since if the 
>> multiplicity is included then the maximum value of the sum is the 
>> multiplicity, not 1.0.
>>
>> So there's a real possibility of user - and programmer - confusion here!  I 
>> must say that until I looked at the 4INS file I had assumed that the PDB 
>> occupancy was what it claimed to be, i.e. the real 'chemical' occupancy not 
>> the multiplicity-fudged one.
>>
>> Cheers
>>
>> -- Ian
>>
>> On Wed, Dec 15, 2010 at 1:53 PM, George M. Sheldrick 
>> <gshe...@shelx.uni-ac.gwdg.de> wrote:
>>>
>>> Dear Ian,
>>>
>>> Yes. Once an atom has been identified as on a special position because
>>> it is within a specified tolerance, SHELXL applies the appropriate
>>> constraints to both the coordinates and the Uij so there is no danger
>>> of the atom wandering off the special position. Usually, when an atom
>>> is very close to a special position but not actually on it, it is part
>>> of a disordered solvent molecule and will be prevented from
>>> misbehaving by distance and Uij restraints imposed by the user; in
>>> such a case the user usually also switches off the special position
>>> check for that disordered molecule (SPEC -1) to avoid atoms being
>>> idealized onto the special position by the program. For solvent
>>> molecules disordered on special positions it is also necessary to
>>> ignore symmetry equivalent atoms when generating idealized hydrogen
>>> atoms etc. (PART -N in SHELXL). This is all routine practice in small
>>> molecule crystallography. I agree that the use of orthogonal rather
>>> than crystal coordinates can obscure the situation, e.g. for an atom on a 
>>> threefold axis.
>>>
>>> Best wishes, George
>>>
>>> Prof. George M. Sheldrick FRS
>>> Dept. Structural Chemistry,
>>> University of Goettingen,
>>> Tammannstr. 4,
>>> D37077 Goettingen, Germany
>>> Tel. +49-551-39-3021 or -3068
>>> Fax. +49-551-39-22582
>>>
>>>
>>> On Wed, 15 Dec 2010, Ian Tickle wrote:
>>>
>>>> Dear George
>>>>
>>>> I would say that an atom has fractional occupancy (but unit
>>>> multiplicity) unless it's exactly on the special position (though I
>>>> can foresee problems with rounding of decimal places for an atom say
>>>> at x=1/3), so that effectively once the atom is fixed exactly on the
>>>> s.p. the symmetry copies coalesce into a single atom with unit
>>>> occupancy (but fractional multiplicity).  This is at least one
>>>> advantage of having co-ordinates stored as fractional - it would
>>>> probably be more tricky with orthogonalised co-ordinates.  Presumably
>>>> once an input atom has satisfied the condition of being 'sufficiently
>>>> close' to a s.p. to be considered as 'on' the s.p. then the
>>>> constraints fix the co-ordinates exactly on the special position and
>>>> henceforth it's forcibly prevented from moving off it?  In any case
>>>> if an atom is very close to its symmetry copy you are going to have
>>>> matrix conditioning problems for the co-ordinates perpendicular to
>>>> the axis of symmetry (or mirror plane), so then you have no choice
>>>> but to disallow co-ordinate shifts of the atom which would take it
>>>> off the special position?
>>>>
>>>> Cheers
>>>>
>>>> -- Ian
>>>>
>>>> On Wed, Dec 15, 2010 at 11:42 AM, George M. Sheldrick
>>>> <gshe...@shelx.uni-ac.gwdg.de> wrote:
>>>>>
>>>>> Dear Ian,
>>>>>
>>>>> Of course I could convert the occupancy on reading the atom in and
>>>>> convert it back again on reading it out. This is not quite so
>>>>> trivial as it sounds because I need to set a threshold as to how
>>>>> close the atom has to be to a special position to be treated as
>>>>> special, and take care that rounding errors have the same effect on
>>>>> input and output and that the coordinates have not moved in or out
>>>>> of the special zone in the meantime.
>>>>>
>>>>> As it stands in SHELX, an atom that is near to a twofold will have
>>>>> an occupancy of 0.5 whether it is disordered close to a special
>>>>> position or whether it is really special, so this is never a problem.
>>>>>
>>>>> SHELXL is mainly used for small molecules that frequently have
>>>>> atoms on special positions, and disordered solvent molecules
>>>>> approximately on special positions are also very common (for
>>>>> example in centrosymmetric space groups toluene usually lies on the
>>>>> center of symmetry). Occupancies are often tied to free variables
>>>>> which would also complicate any changes to the code. And in any
>>>>> case, SHELX has been upwards compatible for the last 35 years and I wish 
>>>>> it to remain that way.
>>>>>
>>>>> Best wishes, George
>>>>>
>>>>> Prof. George M. Sheldrick FRS
>>>>> Dept. Structural Chemistry,
>>>>> University of Goettingen,
>>>>> Tammannstr. 4,
>>>>> D37077 Goettingen, Germany
>>>>> Tel. +49-551-39-3021 or -3068
>>>>> Fax. +49-551-39-22582
>>>>>
>>>>>
>>>>> On Wed, 15 Dec 2010, Ian Tickle wrote:
>>>>>
>>>>>> Dear George
>>>>>>
>>>>>> Is applying the multiplicity factor to the occupancy internally in
>>>>>> the program such an issue anyway?  It need only be done once per
>>>>>> atom on input (i.e. you multiply each input occupancy by the
>>>>>> multiplicity to get the combined multiplicity*occupancy value that
>>>>>> you would have reading in directly in the current version), and
>>>>>> then once per atom again on output, reversing the process.  There
>>>>>> shouldn't be any need to change anything in the inner
>>>>>> atom/reflection loop where obviously it would indeed have slowed things 
>>>>>> down.
>>>>>>
>>>>>> I can see though that the backwards-compatibility issue is more
>>>>>> serious.  However I suspect it will affect only a small proportion
>>>>>> of cases (though I accept that the fact that it may affect any at
>>>>>> all may be sufficient grounds for you to reject it!).  If the
>>>>>> input value exceeds the multiplicity we can say that it's
>>>>>> definitely an occupancy (otherwise clearly the occupancy would be
>>>>>> greater than 1).  If it's less there's an ambiguity for sure; however then
>>>>>> it's more likely to be the multiplicity*occupancy (so the
>>>>>> occupancy is nearer to 1), on the grounds that small occupancies
>>>>>> are less likely to be observed, because the effect on diffraction
>>>>>> will be less significant.  I accept that second-guessing the
>>>>>> user's intentions in this way is not ideal!  I wonder how often
>>>>>> fractional occupancies are observed at special positions anyway?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> -- Ian
>>>>>>
>>>>>> On Fri, Dec 10, 2010 at 11:28 PM, George M. Sheldrick
>>>>>> <gshe...@shelx.uni-ac.gwdg.de> wrote:
>>>>>>> SHELXL also expects that the occupancy of a fully occupied atom
>>>>>>> on a threefold axis should be set at 1/3, and will generate this
>>>>>>> automatically if necessary. It will also generate automatically
>>>>>>> the necessary constraints for the x, y and z parameters (and for
>>>>>>> the Uij if the atom is anisotropic). It is essential that this
>>>>>>> is done correctly if a full-matrix refinement is being performed
>>>>>>> (e.g. to get esd estimates), otherwise the refinement can
>>>>>>> explode. The user may change or switch off the tolerance for
>>>>>>> detecting whether an atom is on a special position (with the
>>>>>>> SPEC instruction). Setting the occupancy to a fraction avoided a
>>>>>>> complicated IF construction inside a loop and 35 years ago
>>>>>>> computers were so slow! I can't change it now because I have to
>>>>>>> preserve upwards compatibility. Unfortunately the CIF committee
>>>>>>> decided to use the other definition (i.e. the Zn on the
>>>>>>> threefold axis has an occupancy of 1.0) and this has caused
>>>>>>> considerable confusion in the small molecule world ever since; atoms 
>>>>>>> are frequently encountered on special positions in inorganic and 
>>>>>>> mineral structures.
>>>>>>>
>>>>>>> George
>>>>>>>
>>>>>>> Prof. George M. Sheldrick FRS
>>>>>>> Dept. Structural Chemistry,
>>>>>>> University of Goettingen,
>>>>>>> Tammannstr. 4,
>>>>>>> D37077 Goettingen, Germany
>>>>>>> Tel. +49-551-39-3021 or -3068
>>>>>>> Fax. +49-551-39-22582
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 10 Dec 2010, Ed Pozharski wrote:
>>>>>>>
>>>>>>>> On Fri, 2010-12-10 at 21:53 +0000, Ian Tickle wrote:
>>>>>>>>> Hmmm - but shouldn't the occupancy of the Zn be 1.00 if it's
>>>>>>>>> on the special position
>>>>>>>>
>>>>>>>> Shouldn't 1/3 be better for programming purposes?  If you set
>>>>>>>> occupancy to 1.0, then you should specify that symmetry
>>>>>>>> operators do not apply for these atoms, making Fc calculation a bit 
>>>>>>>> more cumbersome.
>>>>>>>>
>>>>>>>> If definition of the "asu content" is "you get full content of
>>>>>>>> the unit cell after applying symmetry operators", then
>>>>>>>> occupancy *must* be 1/3, right?
>>>>>>>>
>>>>>>>> The first zinc and the water are on special position, but
>>>>>>>> because they are not excluded from positional refinement
>>>>>>>> (perhaps they should be), they will drift a bit.  CNS has
>>>>>>>> distance cutoff for treating atoms as special positions, if it
>>>>>>>> jumps over the limit during, say, simulated annealing, it  will
>>>>>>>> cause problems.  Perhaps PROLSQ did something similar.  It is a
>>>>>>>> good question if it's better to fix these in place or let them
>>>>>>>> wobble a bit to account for some potential disorder.  While I
>>>>>>>> see the formal argument that it should be nailed to three-fold
>>>>>>>> axes, it is also true that this is a mathematical compromise to
>>>>>>>> simplify modeling that does not reflect physical reality (i.e.
>>>>>>>> you don't have three partially occupied zinc ions, it's just
>>>>>>>> one).  In any event, given that this is a 1.5A structure, (-0.002 
>>>>>>>> 0.004) is statistically speaking the same as (0 0).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Ed.
>>>>>>>>
>>>>>>>> --
>>>>>>>> "I'd jump in myself, if I weren't so good at whistling."
>>>>>>>>                                Julian, King of Lemurs
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
