Re: [ccp4bb] problem of conventions

Ian Tickle Fri, 01 Apr 2011 03:49:40 -0700

On Fri, Apr 1, 2011 at 5:30 AM, Santarsiero, Bernard D. <b...@uic.edu> wrote:
> Ian,
>
> I think it's amazing that we can program computers to resolve a < b < c
> but it would be "a major undertaking" to store the matrix transformations
> for 22121 to 21212 and reindex a cell to a standard setting.

I think you misunderstood the point I was making.  Multiply your one
dataset by the several hundred we sometimes collect for the various
clones and crystallisation conditions needed to optimise the crystal
form for soaking - that's what I mean by 'major undertaking'.  As I
explained, all the datasets collected for a given crystal form have to
be indexed the same way (even if only for archival purposes) before we
can store them in the database (otherwise we would end up in an awful
muddle!).  I don't have a batch script to filter all the relevant
datasets from the database, re-index each one (that's the easy part!),
and re-register them all as a new crystal form.  Why should I? -
no-one has given me a cogent reason to re-index them in the first
place which would justify the resulting downtime of the project (OK,
call me lazy!).  I hope you see that doing each one manually is a
non-starter: the project would have to be locked during the period of
the operation so no new datasets could be down- or uploaded (which
would further cause the upstream pipeline to back up).  Operations that
appear trivial when you only have to do them once suddenly become big
problems when they have to be performed on an industrial scale!

> I was also
> told that I was lazy to not reindex to the standard setting when I was a
> grad student. Now it takes less than a minute to enter a transformation
> and re-index.

They told you wrong!  The conventional cell is the convention (by
definition!), and the standard setting doesn't always correspond to
the conventional cell (though in most cases it does).  There's a
reason for the distinction between meanings of 'standard' and
'conventional' - the meanings are very precise and
non-interchangeable.

> The orthorhombic rule of a < b < c makes sense in 222 or 212121, but when
> there is a standard setting of the 2-fold along the c-axis, then why not
> adopt that?

As I explained, sometimes we don't know the true space group (in terms
of assigning the screw axes) until further along the pipeline (e.g.
after MR or refinement), or at least it's always safer to be
non-committal beyond P222 - why commit oneself to an irrevocable
decision before it's absolutely necessary?  You don't need to know the
exact space group just to screen crystals for diffracting power!
Adopting the standard setting would in the particular case of SGs 5,
17 & 18 require later re-indexing & I hope you see why for us that's a
non-starter.
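
To make the relation between the two settings concrete, here is a minimal
sketch (not taken from any CCP4 program) of what sorting the axes to a<b<c
does to the screw-axis labels. The function name to_conventional and its
arguments are hypothetical. It assumes an orthorhombic cell, so any axis
permutation keeps the angles at 90 degrees, and it flips the sign of the
third index after an odd permutation to keep the axes right-handed.

```python
def to_conventional(cell, screws):
    """Permute an orthorhombic cell so that a <= b <= c.

    cell   -- (a, b, c) axis lengths in the current setting
    screws -- (bool, bool, bool), True where the 2-fold along that axis is 2_1
    Returns the sorted cell, the point-group-222 symbol in the new setting,
    and a function that re-indexes a single (h, k, l).
    """
    # order[n] is the index of the old axis that becomes new axis n.
    order = sorted(range(3), key=lambda i: cell[i])

    # An odd permutation would change handedness; compensate by negating
    # the third axis (harmless when all angles are 90 degrees).
    inversions = sum(
        1 for i in range(3) for j in range(i + 1, 3) if order[i] > order[j]
    )
    sign = -1 if inversions % 2 else 1

    new_cell = tuple(cell[i] for i in order)
    # The screw/non-screw labels travel with their axes.
    symbol = "P" + "".join("21" if screws[i] else "2" for i in order)

    def reindex(hkl):
        h, k, l = (hkl[i] for i in order)
        return (h, k, sign * l)

    return new_cell, symbol, reindex


# Standard-setting P21212 (2-fold along c) whose c axis is not the longest.
new_cell, symbol, reindex = to_conventional(
    (78.1, 45.2, 60.3), (True, True, False)
)
print(new_cell, symbol, reindex((1, 2, 3)))
```

With this example cell the conventional a<b<c setting comes out as P21221
rather than P21212, i.e. the same crystal simply carries a different symbol
once the axes are ordered by length.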

I'm not a believer in conventions for their own sake - a convention is
merely a default set of rules which you apply when you have no sound
basis on which to make a choice - the convention makes what is
effectively a totally arbitrary choice for you.  Conventions do have
the advantage that if other people follow them then they will make the
same decisions as you.  The moment I have sufficient justification
(e.g. as I said, isomorphism overrides convention) to break with
convention then I would have no hesitation in doing so.  The fact that
the standard setting has a 2-fold along c is merely an arbitrary
choice and doesn't seem to me to be a good enough reason to break with
the unit-cell convention.

-- Ian

>
> On Thu, March 31, 2011 5:48 pm, Ian Tickle wrote:
>> On Thu, Mar 31, 2011 at 10:43 PM, James Holton <jmhol...@lbl.gov> wrote:
>>> I have the 2002 edition, and indeed it only contains space group
>>> numbers up to 230.  The page numbers quoted by Ian contain space group
>>> numbers 17 and 18.
>>
>> You need to distinguish the 'IT space group number' which indeed goes
>> up to 230 (i.e. the number of unique settings), from the 'CCP4 space
>> group number' which, peculiar to CCP4 (which is why I called it
>> 'CCP4-ese'), adds a multiple of 1000 to get a unique number for the
>> alternate settings as used in the API.  The pages I mentioned show the
>> diagrams for IT SG #18 P22121 (CCP4 #3018), P21221 (CCP4 #2018) and
>> P21212 (CCP4 #18), so they certainly are all there!
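
The numbering scheme described above can be sketched as follows. This is
only an illustration: the function name split_ccp4_number is hypothetical,
and the small table holds just the three SG #18 entries quoted in this
message. Which multiple of 1000 goes with which alternate setting for other
space groups is defined by CCP4's own tables and is not assumed here.

```python
# The three settings of IT space group 18 quoted in the message.
KNOWN_SG18_SETTINGS = {18: "P21212", 2018: "P21221", 3018: "P22121"}


def split_ccp4_number(ccp4_number):
    """Split a CCP4 space group number into (multiple of 1000, IT number)."""
    multiple, it_number = divmod(ccp4_number, 1000)
    # IT space group numbers run from 1 to 230.
    if not 1 <= it_number <= 230:
        raise ValueError(f"{ccp4_number} does not end in an IT number 1-230")
    return multiple, it_number


for number, symbol in sorted(KNOWN_SG18_SETTINGS.items()):
    multiple, it_number = split_ccp4_number(number)
    print(f"CCP4 #{number}: {symbol}, IT #{it_number}, offset {multiple} x 1000")
```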
>>
>>> Although I am all for program authors building in support for the
>>> "screwy orthorhombics" (as I call them), I should admit that my
>>> fuddy-duddy strategy for dealing with them remains simply to use space
>>> groups 17 and 18, and permute the cell edges around with REINDEX to
>>> put the unique (screw or non-screw) axis on the "c" position.
>>
>> Re-indexing is not an option for us (indeed if there were no
>> alternative, it would be a major undertaking), because the integrity
>> of our LIMS database requires that all protein-ligand structures from
>> the same target & crystal form are indexed with the same (or nearly
>> the same) cell and space group (and it makes life so much easier!).
>> With space-groups such as P22121 it can happen (indeed it has
>> happened) that it was not possible to define the space group correctly
>> at the processing stage due to ambiguous absences; indeed it was only
>> after using the "SGALternative ALL" option in Phaser and refining each
>> TF solution that we identified the space group correctly as P22121.
>>
>> Having learnt the lesson the hard way, we routinely use P222 for all
>> processing of orthorhombics, which of course always gives the
>> conventional a<b<c setting, and only assign the space group well down
>> the pipeline and only when we are 100% confident; by that time it's
>> too late to re-index (indeed why on earth would we want to give
>> ourselves all that trouble?).  This is therefore totally analogous to
>> the scenario of yesteryear that I described where it was common to see
>> a 'unit cell' communication followed some years later by the structure
>> paper (though we have compressed the gap somewhat!), and we base the
>> setting on the unit cell convention for exactly the same reason.
>>
>> It's only if you're doing 1 structure at a time that you can afford
>> the luxury of re-indexing - and also the pain: many times I've seen
>> even experienced people getting their files mixed up and trying to
>> refine with differently indexed MTZ & PDB files (why is my R factor so
>> high?)!  My advice would be - _never_ re-index!
>>
>> -- Ian
>>
>>
>>>  I have
>>> yet to encounter a program that gets broken when presented with data
>>> that doesn't have a<b<c, but there are many non-CCP4 programs out
>>> there that still don't seem to understand P22121, P21221, P2122 and
>>> P2212.
>>
>> I find that surprising!  Exactly which 'many' programs are those?  You
>> really should report them to CCP4 (or to me if it's one of mine) so
>> they can be fixed!  We've been using CCP4 programs as integral
>> components of our processing pipeline (from data processing through to
>> validation) for the last 10 years and I've never come across one
>> that's broken in the way you describe (I've found many broken for
>> other reasons and either fixed them myself or reported them - you should
>> do the same!).  Any program which uses csymlib with syminfo.lib can
>> automatically handle all space groups defined in syminfo, which
>> includes all the common alternates you mentioned (and others such as
>> I2).  The only program I'm aware of that's limited to the standard
>> settings is sftools (because it has its own internal space group table
>> - it would be nice to see it updated to use syminfo!).
>>
>>> This is not the only space group convention "issue" out there!  The
>>> R3x vs H3x business continues to be annoying to this day!
>>
>> Yeah to that!  H centring was defined in IT long ago (look it up) and
>> it has nothing to do with the R setting!
>>
>> -- Ian
>>
>
>
> --
> Bernard D. Santarsiero
> Research Professor
> Center for Pharmaceutical Biotechnology and the
>  Department of Medicinal Chemistry and Pharmacognosy
> Center for Structural Biology
> Center for Clinical and Translational Science
> University of Illinois at Chicago
> MC870  3070MBRB  900 South Ashland Avenue
> Chicago, IL 60607-7173  USA
> (312) 413-0339 (office)
> (312) 413-9303 (FAX)
> http://www.uic.edu/labs/bds
>
>
