Bernhard,

> Well, "it" *IS* broke.
As they say, "it works for me", so either you're using a different set of programs from me, or you're using the same programs but in a different way. Perhaps you could be more specific as to which program(s) appear to be broken? If possible please post the logfile(s) on this forum, then someone might recognise the problem(s). Did you try reporting it to CCP4 (assuming of course we're talking about CCP4 programs)? You're the 2nd person in this thread to claim that the space-group handling for the alternate settings is broken, so it would be nice to get to the bottom of it!

> If you are running some type of process, as you
> implied in referring to LIMS, then there is a step in which you move from
> the crystal system and point group to the actual space group. So, at that
> point you identify P22121. The next clear step, automatically by software,
> is to convert to P21212, and move on. That doesn't take an enormous amount
> of code writing, and you have a clear trail on how you got there.

I'm puzzled why I need a workaround for a bug that only you and possibly James have experienced: AFAIK no-one else has reported problems with this recently. Wouldn't it make more sense to fix the bug(s)? That way, everyone benefits and I don't need to do anything!

Anyway, to respond to your suggestion: I've spent some time looking into this (so I hope you'll forgive the delay in replying!), and unfortunately it's not as simple as you think. I can see 3 main steps that would be required for a workaround:

Step 1 (create new crystal form entry): First I would have to make a copy of the entry for the old crystal form in the PROTEINS table, giving it a new unique ID. Then I would perform the re-indexing/re-orientation operations on the reference & free-R MTZ files and the PDB file for the refined structure, and change the filename entries in the newly created row of the PROTEINS table to point to them.
This row also contains the parameters for MR, rigid-body refinement, TLS and binding site definitions, but these won't need to be changed. The user interface would need to be modified to give users the option of implementing this change, since I know some (most?) users who won't be happy to do it! One problem I foresee is confusing the users with a multiplicity of unit cells, since we already work with potentially 2 different cells per crystal: first the 'canonical' unit cell for the crystal form from the reference MTZ file header; then there's the unit cell for the isomorphous crystal as found by the indexing software. Users understand that the indexing program won't necessarily choose the reference cell, particularly in the situation you indicate below where 2 cell lengths are almost equal. Now you want me to add a 3rd, possibly different, unit cell, i.e. the one obtained after a second run of re-indexing to the 'standard setting', and the users won't understand the need for this. Next comes a tricky bit: for tracking purposes I would somehow need to make a link from the new crystal form to the old one, probably with a self-referencing foreign key. All the database applications for doing searches & reports would need to be modified to recognise this change. This doesn't look trivial to me! I would need to hand this task over to the database administrator & programmers, since I'm not involved with administration of the database. Getting a "clear trail" doesn't happen automatically; it has to be programmed! I anticipate some searching questions from all the users and the db admin, such as "why do we need to do this?", "what bad things will happen if we don't?" and "why haven't we seen these bad things happening before?". I'm hoping that you will be able to provide convincing answers to these questions, because I can't!
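To illustrate what the "self-referencing foreign key" involves, here is a minimal sketch using Python's built-in sqlite3 module. It is not the LIMS schema: only the PROTEINS table name comes from the text, and every column name (ref_mtz, derived_from, etc.) is a hypothetical placeholder. It shows a cloned crystal-form row pointing back at the original, plus the recursive query needed to recover the trail; the same pattern would apply to the CRYSTALS table.

```python
import sqlite3

# Hypothetical, much-simplified schema: only the PROTEINS table name comes
# from the text; every column name here is an illustrative assumption.
SCHEMA = """
CREATE TABLE proteins (
    form_id      INTEGER PRIMARY KEY,
    ref_mtz      TEXT NOT NULL,
    free_r_mtz   TEXT NOT NULL,
    pdb_file     TEXT NOT NULL,
    space_group  TEXT NOT NULL,
    -- self-referencing foreign key: the form this entry was cloned from
    derived_from INTEGER REFERENCES proteins(form_id)
);
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript(SCHEMA)
    return conn

def add_form(conn, ref_mtz, free_r_mtz, pdb_file, space_group, derived_from=None):
    """Insert a crystal-form row; derived_from links a clone back to its source."""
    cur = conn.execute(
        "INSERT INTO proteins (ref_mtz, free_r_mtz, pdb_file, space_group, derived_from)"
        " VALUES (?, ?, ?, ?, ?)",
        (ref_mtz, free_r_mtz, pdb_file, space_group, derived_from),
    )
    return cur.lastrowid

# Walk the derived_from links; the depth column makes the output order explicit.
LINEAGE_SQL = """
WITH RECURSIVE chain(id, parent, depth) AS (
    SELECT form_id, derived_from, 0 FROM proteins WHERE form_id = ?
    UNION ALL
    SELECT p.form_id, p.derived_from, c.depth + 1
    FROM proteins AS p JOIN chain AS c ON p.form_id = c.parent
)
SELECT id FROM chain ORDER BY depth
"""

def lineage(conn, form_id):
    """IDs from form_id back to the original crystal form, newest first."""
    return [row[0] for row in conn.execute(LINEAGE_SQL, (form_id,))]
```

Every search and report application would then have to know to follow derived_from (or call something like the hypothetical lineage() above), which is exactly the extra work being described.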
Step 2 (re-index historical data): Then I would need to copy each entry for the historical datasets that were previously added to the database for the old crystal form to the new crystal form (of course it's actually the _same_ crystal form, but we're fooling the LIMS into treating it as though it were a new one). This is so that we can continue to track the data using the new crystal form ID. All datasets for a given crystal form must be indexed in the same way, since the LIMS interface allows you to mix & match PDB, MTZ & MAP files for the crystal form without the need to do superpositions (of course superpositions can be done if needed, but then you lose the symmetry info). These 'historical' datasets are all the ones generated in the process of getting and optimising the crystal form, i.e. from all the different constructs made (typically ~30 +- 20), the purifications and crystallisation trials, optimising the cryobuffer & DMSO concentration for soaking ligands, then the datasets used during the structure determination (MR/MAD/SAD etc). This may run to 100-150 datasets, but the actual number is immaterial since it's just as easy to write the database application for many as for one. So a database application would have to be developed for this, and it's not as straightforward as you seem to think. First the easy bit: I would re-index the processed MTZ file for each of these historical datasets. Then I would copy each corresponding row in the CRYSTALS table, changing the filename to the re-indexed MTZ file and permuting the cell lengths. For the re-indexing I use the CCP4 REFINDEX program that I wrote for this purpose: this automatically re-indexes a dataset to maximise the correlation of the F^2 between the 'reference' dataset and the new one.
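The idea behind this kind of re-indexing can be sketched in a few lines. This is not REFINDEX's code: it's a minimal illustration assuming orthorhombic data (Laue class mmm, Friedel's law holding), where the only alternative indexings considered are the six permutations of the axes. Intensities (F^2) are held in plain dicts keyed by (h, k, l), and all function names are hypothetical.

```python
from itertools import permutations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: correlation undefined, treat as no signal
    return sxy / sqrt(sxx * syy)

def canonical(hkl):
    """Map (h, k, l) to its unique representative in Laue class mmm.

    All eight sign combinations are equivalent there (assuming Friedel's law),
    so any axis sign flips used to keep the setting right-handed drop out.
    """
    return tuple(abs(i) for i in hkl)

def reindex(hkl, perm):
    """Permute indices: new index i is old index perm[i]; (1, 0, 2) swaps h and k."""
    return tuple(hkl[i] for i in perm)

def best_reindexing(reference, data, min_common=10):
    """Try each axis permutation; return (perm, cc) with the highest F^2 correlation.

    reference and data map (h, k, l) -> intensity (F^2). Returns None if no
    permutation leaves at least min_common reflections in common.
    """
    ref = {canonical(hkl): i for hkl, i in reference.items()}
    best = None
    for perm in permutations(range(3)):
        moved = {canonical(reindex(hkl, perm)): i for hkl, i in data.items()}
        common = sorted(ref.keys() & moved.keys())
        if len(common) < min_common:
            continue
        cc = pearson([ref[h] for h in common], [moved[h] for h in common])
        if best is None or cc > best[1]:
            best = (perm, cc)
    return best
```

Note that the cell lengths play no part in the choice; only the intensities do. A real program would also have to deal with the indexing ambiguities of other point groups and with measurement errors, which this sketch ignores.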
I know I could also use POINTLESS for this, but I wrote REFINDEX while we were developing the LIMS software from 1999-2001, since nothing equivalent was available at the time (POINTLESS was not developed until several years later). The new row would obviously need a new primary key (it has to be unique): we can't simply change the old primary key because it's used as a foreign key in several other tables. For example, the crystal ID is referenced, among others, in the tables which contain info related to transport of the crystals to the synchrotron and/or the mounting robot assignment (i.e. cane/puck positions). This row in the table also contains statistics (Rmerges etc) and other info (e.g. mosaicity, phi range & step, image file location) extracted from the processing logfiles, but again this info can simply be copied across. I would need to add an entry in the JOBS table (which will also contain links to the old crystal IDs) to record the fact that I had done all this (i.e. foreign key to new crystal ID, user ID, date/time, protocol name/version, command line, completion status). For convenience, again in the CRYSTALS table I would need to link the old data entries to the new ones, so we have a record of what was done, and to deal with the fact that there would still be foreign keys in several other tables pointing at all the _old_ crystal ID primary keys. So again we would need to add a self-referencing foreign key to the CRYSTALS table, and ensure the upgraded database applications can also recognise this new column: again not trivial.

Step 3 (process new data): Finally, for each new dataset I would process the data in the normal way, pretty well as you suggest.
However I already foresee another problem here: the crystal form is currently recognised from the lattice type (P, I, C etc), the point group and the unit cell volume; the software chooses the unit cell with the same lattice type & point group which has the closest volume (+- 20%) to the reference cell. The implicit assumption we have made in the design is that no two crystal forms for a given protein can have all 3 criteria equal at the same time (we've never seen exceptions to this). However you are now proposing to violate this assumption, since all 3 criteria will be identical for the old & new crystal forms. There's a 50% chance that the software would re-index the data into the old crystal form instead of the new one and we would have changed nothing! Clearly, we would need to add another criterion (e.g. the space group) to resolve the ambiguity.

> To be even more intrusive, what if you had cell parameters of 51.100,
> 51.101, and 51.102, and it's orthorhombic, P21212. For other co-crystals,
> soaks, mutants, etc., you might have both experimental errors and real
> differences in the unit cell, so you're telling me that you would process
> according to the a < b < c rule in P222 to average and scale, and then it
> might turn out to be P22121, P21221, or P21212 later on? When you wish to
> compare coordinates, then you have re-assign one coordinate data to match
> the other by using superposition, rather than taking on an earlier step of
> just using the conventional space group of P21212?

No, the cell lengths are irrelevant even if they're almost equal, since, as I mentioned above, REFINDEX tries all possible re-indexings and maximises the correlation coefficient of the _F^2_. So yes, the data is always processed as P222 using the a<b<c rule, but then it may be re-indexed to the reference (and the correct space group assigned from the reference MTZ header) so that (nearly) isomorphous datasets are then all indexed identically.
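The crystal-form recognition rule described under Step 3 (same lattice type and point group, closest volume within +-20% of the reference), and the tie that cloning the same crystal form would create, can be sketched as follows. The 20% tolerance comes from the text; the CrystalForm record, its fields and the optional space-group tie-breaker are illustrative assumptions, not the LIMS's actual code.

```python
from dataclasses import dataclass
from math import cos, isclose, radians, sqrt
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class CrystalForm:
    form_id: int
    lattice: str        # lattice type: 'P', 'C', 'I', ...
    point_group: str    # e.g. '222'
    cell: Tuple[float, float, float, float, float, float]  # a, b, c (A); alpha, beta, gamma (deg)
    space_group: str

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from the general (triclinic) formula."""
    ca, cb, cg = (cos(radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

def match_forms(forms, lattice, point_group, cell,
                space_group: Optional[str] = None, tol=0.20) -> List[CrystalForm]:
    """Return every known form tied for the closest volume within +-tol.

    More than one result means lattice, point group and volume can't tell
    the forms apart; passing space_group adds a 4th criterion to break the tie.
    """
    v = cell_volume(*cell)
    scored = []
    for f in forms:
        if f.lattice != lattice or f.point_group != point_group:
            continue
        if space_group is not None and f.space_group != space_group:
            continue
        v_ref = cell_volume(*f.cell)
        rel = abs(v - v_ref) / v_ref
        if rel <= tol:
            scored.append((rel, f))
    if not scored:
        return []
    best = min(rel for rel, _ in scored)
    # Tolerate last-bit rounding differences between permuted cells.
    return [f for rel, f in scored if isclose(rel, best, abs_tol=1e-12)]
```

For example, an original form in P 2 21 21 with cell 50 x 60 x 70 and its clone re-indexed to P 21 21 2 (cell 60 x 70 x 50) have identical volumes, so both are returned unless the space group is also supplied.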
When I advised against re-indexing earlier, I was talking about re-indexing to a 'standard setting' without a good reason: you will recall I said "isomorphism overrides convention". The PDB file is re-oriented using the inverse transposed re-indexing matrix; it's not necessary to use superposition (though it would of course give a similar result).

> Again, while I see use of the a < b < c rule when there isn't an
> overriding reason to assign it otherwise, as in P222 or P212121, there
> *is* a reason to stick to the convention of one standard setting. That's
> the rationale on using P21/n sometimes vs. P21/c, or I2 vs C2, to avoid a
> large beta angle, and adopt a non-standard setting.

So what is the reason to stick to one standard setting? If there's already an isomorphous structure in that setting I can see its value, but how does it help in the case where there's no similar structure? Equally, if say there's an isomorphous structure already in the non-standard setting, it would be sensible to use that: "isomorphism always trumps convention". You say "convention of one standard setting": I wasn't aware that such a convention existed, and IT certainly doesn't mention any such convention (how could it? It would then be inconsistent with itself!). I would like to be reassured that a group of experts has considered the details of such a convention at length and produced readily accessible reference documentation. Can you provide a reference to such documentation for your convention? Like you, I did my doctorate in crystallography (small molecule) and I was taught that IT was the 'bible' in all matters crystallographic! You seem to want to pick the parts of the convention you like but have exceptions for those that you don't. It would be like saying that you will use the SI convention on units, with the exception that you will use feet instead of metres (so instead of 1.1 metres you would have "3 ft 18.6 cm"!). What's so special about the 'standard setting' anyway?
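Returning briefly to the re-orientation of the PDB file by the inverse transposed re-indexing matrix mentioned at the start of this reply: the relationship can be checked numerically. A minimal sketch, assuming the re-indexing is written as h' = M h for a 3x3 integer matrix M acting on column vectors of Miller indices. Fractional coordinates then transform as x' = (M^-1)^T x, which leaves every phase-determining product h.x unchanged. The a <= b <= c axis sort is just one example of such an M, and all helper names are hypothetical.

```python
from fractions import Fraction

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def inverse3(m):
    """Exact inverse of a 3x3 integer (or Fraction) matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = det3(m)
    if det == 0:
        raise ValueError("singular re-indexing matrix")
    adj = ((e * i - f * h, c * h - b * i, b * f - c * e),
           (f * g - d * i, a * i - c * g, c * d - a * f),
           (d * h - e * g, b * g - a * h, a * e - b * d))
    return tuple(tuple(Fraction(x) / det for x in row) for row in adj)

def transpose3(m):
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))

def matvec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def reindex_hkl(m, hkl):
    """Miller indices transform like the basis vectors: h' = M h."""
    return matvec(m, hkl)

def reorient_xyz(m, xyz):
    """Fractional coordinates use the inverse transpose: x' = (M^-1)^T x."""
    return matvec(transpose3(inverse3(m)), xyz)

def sort_axes(cell_lengths):
    """Permutation matrix putting the axes in a <= b <= c order.

    Row i selects the old axis that becomes new axis i. An odd permutation
    would give a left-handed set of axes, so the new c axis is then reversed.
    """
    order = sorted(range(3), key=lambda i: cell_lengths[i])
    rows = [[1 if j == old else 0 for j in range(3)] for old in order]
    if det3(rows) < 0:
        rows[2] = [-v for v in rows[2]]
    return tuple(tuple(r) for r in rows)
```

For a pure axis permutation (with a sign flip) M is orthogonal, so (M^-1)^T = M and the coordinates permute exactly like the indices; the distinction only matters for non-orthogonal transformations such as a' = a + c.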
In the 1935 & 1952 IT editions the 'standard setting' was chosen arbitrarily only as a _representative_ setting for illustrative purposes (from which the reader was expected to derive the other settings by permutation), and the corresponding 'standard symbol' was used as the page heading for indexing purposes. Those were their sole functions. Please take a look at IT/A: to ensure we're seeing the same info I suggest we both look at the 2006 online edition which can be found here:

http://ahrenkiel.sdsmt.edu/courses/Spring_2011/NANO704/International_Tables_For_Crystallography_A.pdf

On p.39 in the PDF document (p.22 on the printed page) it reads:

"In the earlier (1935 and 1952) editions of International Tables, only one setting was illustrated, in a projection along c, so that it was usual to consider it as the ‘standard setting’ and to accept its cell edges as crystal axes and its space-group symbol as ‘standard Hermann–Mauguin symbol’. In the present edition, however, all six orthorhombic settings are illustrated, as explained below."

(There are 6 orthorhombic settings excluding all those related by simply negating 1 or more axes; however of course only 3 of these can be differently labelled.) In the 2002 edition the entry for C2 covers 8 pages (pp 124-131): only one of those illustrates the 'standard setting' (i.e. the setting corresponding to the 'standard symbol' in the page heading). This use of one of the settings chosen arbitrarily had nothing to do with the choice of axes, which is a totally unrelated issue: all settings have equal status in this respect. If there hadn't been the need to save paper after WW2, all alternative settings might have been illustrated in the 1952 edition and there would have been no need for a "standard setting"!
In fact later editions of IT/A tabulate _all_ settings in chapters 3 & 4 ("Space-group Determination" & "Synoptic Tables of Space-Group Symbols") and the latest editions also illustrate all the settings, so the "standard setting" concept is now largely redundant. Look up the subject index (PDF p.909) for "standard setting" (or even "setting, standard"). Actually don't, because you won't find it! If the concept of "standard setting" is as critical as you claim, wouldn't it deserve at the very least an index entry and a dedicated section explaining your "standard settings convention"? In reality it gets a few brief mentions, mainly in the historical context. IT does actually give an example of space-group assignment which is relevant to this discussion, see PDF p.60 (printed p.45) in the chapter by the late & great Martin J. Buerger (I remember "Elementary Crystallography" and "The Precession Method" with the hard grey covers well!):

"The diffraction pattern of a compound has Laue class mmm. The crystal system is thus orthorhombic. The diffraction spots are indexed such that the reflection conditions are 0kl : l = 2n; h0l : h + l = 2n; h00 : h = 2n; 00l : l = 2n. Table 3.1.4.1 shows that the diffraction symbol is mmmPcn–. Possible space groups are Pcn2 (30) and Pcnm (53). For neither space group does the axial choice correspond to that of the standard setting. For No. 30, the standard symbol is Pnc2, for No. 53 it is Pmna."

It doesn't say what the cell lengths are, but I would guess this is a small-molecule crystal (no prizes for guessing why!) and so the unit cell is likely to have been chosen in full knowledge of the two space-group possibilities (which differ only in a mirror plane). Yet a non-standard setting was chosen! No mention here of the vital importance of re-indexing to the standard setting!
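Relabelling the axes of an orthorhombic space group just permutes the three positions of the full Hermann-Mauguin symbol and renames any axial glide (a, b, c) and any A/B/C centring letter, so the IT conversions above are easy to reproduce mechanically. The helper below is an illustrative sketch, not a CCP4 routine; it assumes full symbols with spaces between positions, and for centred groups it doesn't try to pick the preferred glide where IT lists equivalent alternatives.

```python
AXES = "abc"

def permute_symbol(symbol: str, new_axes: str) -> str:
    """Rewrite an orthorhombic full H-M symbol after relabelling the axes.

    new_axes[i] names the OLD axis that becomes new axis i, e.g. "bac"
    swaps a and b. Sign changes needed to keep the axes right-handed
    don't affect the symbol.
    """
    if sorted(new_axes) != list(AXES):
        raise ValueError("new_axes must be a permutation of 'abc'")
    lattice, *ops = symbol.split()
    if len(ops) != 3:
        raise ValueError("expected a lattice letter and three symmetry directions")
    # old axis label -> new axis label
    relabel = {old: AXES[i] for i, old in enumerate(new_axes)}

    def rename(op):
        # Axial glides are named after their translation direction;
        # 2, 21, m, n, d and e are unchanged by relabelling.
        return relabel.get(op, op)

    if lattice in ("A", "B", "C"):
        # A/B/C centring is named after the axis normal to the centred face.
        lattice = relabel[lattice.lower()].upper()
    # Position i of the new symbol describes the old axis new_axes[i].
    new_ops = [rename(ops[AXES.index(old)]) for old in new_axes]
    return " ".join([lattice] + new_ops)
```

Swapping a and b turns "P c n 2" into "P n c 2", swapping a and c turns "P c n m" into "P m n a", and the cyclic relabelling "bca" turns "P 2 21 21" into "P 21 21 2": the symbol changes, but the crystal and its diffraction pattern don't.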
I suspect that most people who think that there's a "standard settings" convention believe it because that's what they were taught (and also what their teachers were taught!), or if they actually have consulted IT/A at all, they will have only looked at the space-group diagrams. If they haven't taken the trouble to read the important explanatory chapters (1 through 5 and 8 through 15) which precede and follow chapter 7 (which contains the diagrams for the 3-D space groups), it's easy to see how they would have come to the mistaken conclusion concerning the significance of the "standard symbols" in the page headings!

> Finally, if you think it's fine to use P22121, then can I assume that you
> also allow the use of space group A2 and B2?

If there's an existing isomorphous structure in the same orientation (though possibly in a different space group), then yes. If it's just the conventional cell with no good reason to choose otherwise, then no, simply because the IUCr/NIST unit cell convention won't generate those space groups! To be clear, here's a summary of the steps involved in using the convention:

1) Choose a unit cell which has the full point symmetry of the diffraction pattern, or a supergroup thereof (e.g. for all trigonal point groups the unit cell has point symmetry 6/mmm).
2) Step 1 will generate an infinite number of cells containing different numbers of lattice points related by integer multiples of the lattice vectors (i.e. not counting centred cells where the lattice translations are fractions of the lattice vectors), so choose the one(s) with the fewest integer lattice points.
3) In the case that the point group has a unique axis (i.e. 2, 3, 4 or 6-fold), steps 1 & 2 will generate unit cells having different orientations of the unique axis, so choose the one(s) which have the 2 || b in monoclinic or the 3/4/6 || c in tri/tetra/hexagonal.
4) In triclinic & monoclinic, steps 1, 2 & 3 will produce cells where the lattice vectors are not the shortest, so choose the cell(s) having the shortest lattice vectors (e.g. this is the cell with beta nearest to 90 deg in monoclinic).
5) In triclinic & monoclinic the previous steps will still give multiple choices of cell angle(s), so in triclinic choose cell angles either all <= 90 or all >= 90; in monoclinic choose beta >= 90.
6) In centred monoclinic space groups, the previous steps will generate A-, C- and I-centred cells. Eliminate the A (there will always be an equivalent C cell since you can swap a and c).
7) In the R-centred hexagonal cell, there will remain an ambiguity between the 'obverse' & 'reverse' settings: choose the obverse.
8) Finally, if an ambiguity in the orientation of the cell still remains (in triclinic, primitive & I-centred monoclinic and all orthorhombic), apply the rule a <= b <= c.

This procedure will generate a unique unit cell; note that A centring is eliminated at step 6, B centring with unique axis b is eliminated at step 2 because it contains 2 integer lattice points, and B centring with unique axis a or c is eliminated at step 3. Note that the space group doesn't figure at all in this decision, for the simple reason that in most cases it cannot be reliably deduced at the point in the process where the unit cell is chosen (i.e. the indexing step). To choose the unit cell you don't need to have measured any intensities (how can you if you haven't indexed the spots yet?), whereas to select the space group unambiguously, in most cases you need to have measured the intensities first.

-- Ian