[ccp4bb] Crystallographic group leader position
Hi Crystallographers, The Experimental Therapeutics Centre (Agency for Science, Technology and Research, Singapore) is hiring a Senior Research Fellow to lead a small team of structural biologists. The successful candidate will lead-from-the-bench all aspects of X-ray crystallography, from protein expression to co-crystal trials to data collection and visualization. Kindly refer to https://astar.aqayo.com/site-YXN0YXJ8MjA/member_offerdetail.jsp?siteid=YXN0YXJ8MjArequisitionuid=d6dc8ac9dfef0d05ccb2c08d8d1029c9781fec8e for more details. Applicants should send their CV to Jeffrey Hill jh...@etc.a-star.edu.sg or apply directly through the website. This position is suitable for crystallographers with postdoc experience who would like to step up. Best regards, Yvonne
Re: [ccp4bb] PyMOL v. Coot map 'level'
Thomas, I tried to figure out the PyMOL vs. Coot normalization discrepancy a while ago. As far as I remember, PyMOL normalizes over the raw data array, while Coot normalizes across the unit cell, so if the data don't exactly cover the cell the results can differ. I posted the same question to the Coot mailing list (the thread can be found here: https://goo.gl/YjVtTu), and got the following reply from Paul Emsley; I highlight the questions that I think you could best answer with '***': [...] I suspect that the issue is related to different answers to "the rmsd of what?" In Coot, we use all the grid points in the asymmetric unit - other programs make a selection of grid points around the protein (and therefore have less solvent). More solvent means a lower rmsd. If one then contours at n-rmsd levels, the absolute level used in Coot will be lower - and the map will thus seem noisier (perhaps). I suppose that if you want comparable levels from the same map/mtz file then you should use absolute levels, not rmsd. ***What does PyMOL's 1.0 mean in electrons/A^3?*** Regards, Paul. Regards, Emily. On 01 Jun 2015, at 11:37, Emilia C. Arturo (Emily) ec...@drexel.edu wrote: One cannot understand what is going on without knowing how this map was calculated. Maps calculated by the Electron Density Server have density in units of electrons/A^3 if I recall, or at least that is its best effort. This is what I was looking for! (i.e. what the units are) Thanks. :-) Yes, I'd downloaded the 2mFo-DFc map from the EDS, and got the same Coot vs. PyMOL discrepancy whether or not I turned off the PyMOL map normalization feature. If you load the same map into PyMOL and ask it to normalize the density values, you should set your contour level to Coot's rmsd level. If you don't normalize, you should use Coot's e/A^3 level. It is quite possible that they could differ by a factor of two. This was exactly the case. The e/A^3 level (not the rmsd level) in Coot matched the map 'level' in PyMOL very well, visually; Coot's rmsd level, by contrast, was off by roughly a factor of 2. I did end up also generating a 2mFo-DFc map using phenix, which fetched the structure factors of the model in which I was interested. The result was the same (i.e. PyMOL 'level' = Coot e/A^3 level, roughly 1/2 of Coot's rmsd level) whether I used the CCP4 map downloaded from the EDS or the one generated from the structure factors with phenix. Thanks all. Emily. Dale Tronrud On 5/29/2015 1:15 PM, Emilia C. Arturo (Emily) wrote: Hello. I am struggling with an old question--old because I've found several discussions and wiki bits on this topic, e.g. on the PyMOL mailing list (http://sourceforge.net/p/pymol/mailman/message/26496806/ and http://www.pymolwiki.org/index.php/Display_CCP4_Maps), but the suggestions about how to fix the problem are not working for me, and I cannot figure out why. Perhaps someone here can help: I'd like to display (for beauty's sake) a selection of a model with the map around this selection. I've fetched the model from the PDB, downloaded its 2mFo-DFc CCP4 map, loaded both the map and model into both PyMOL (student version) and Coot (0.8.2-pre EL (revision 5592)), and decided that I would use PyMOL to make the figure. I notice, though, that the map 'level' in PyMOL is not equivalent to the rmsd level in Coot, even when I set normalization off in PyMOL. I expected that a 1.0 rmsd level in Coot would look identical to a 1.0 level in PyMOL, but it does not; rather, a 1.0 rmsd level in Coot looks more like a 0.5 level in PyMOL.
Does anyone have insight they could share about the difference between how Coot and PyMOL load maps? Maybe the PyMOL 'level' is not an rmsd? Is there some other normalization factor in PyMOL that I should set? Or perhaps there is a mailing list post out there that I've missed, to which you could point me. :-) Alternatively, does anyone have instructions on how to use Coot to do what I'm trying to do in PyMOL? In PyMOL I displayed the mesh of the 2Fo-Fc map, contoured at 1.0, around a 3-residue-long 'selection' like so: isomesh map, My_2Fo-Fc.map, 1.0, selection, carve=2.0, and after hiding everything but the selection I have a nice picture ... but with a map at a level I cannot interpret in PyMOL relative to Coot :-/ Regards, Emily. -- Thomas Holder PyMOL Principal Developer Schrödinger, Inc.
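For anyone wanting to reproduce the absolute-level behaviour discussed in this thread, here is a minimal sketch using PyMOL's Python API. The map file name, model file and residue range are placeholders, and it assumes an EDS-style map already in e-/A^3. The one detail that bites people is that normalize_ccp4_maps takes effect at load time, so it must be switched off before the map is loaded.

    # Minimal sketch (PyMOL Python API); file names and residues are
    # placeholders. With normalization off, isomesh levels are in the
    # map's native units (e-/A^3 for an EDS map), not sigma of the array.
    from pymol import cmd

    cmd.set("normalize_ccp4_maps", 0)   # must be set BEFORE loading the map
    cmd.load("My_2Fo-Fc.map", "mymap")
    cmd.load("model.pdb")
    cmd.select("site", "resi 100-102")  # the 3-residue selection
    cmd.isomesh("mesh", "mymap", 1.0, "site", carve=2.0)  # 1.0 = 1.0 e-/A^3 here

To reproduce one of Coot's n-rmsd contours from the same absolute-valued map, you would multiply n by the rmsd that Coot reports for that map.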
Re: [ccp4bb] How many is too many free reflections?
I'm afraid Gerard and Ian between them have left me a bit confused with conflicting statements: On 04/06/2015 15:29, Gerard Bricogne wrote: snip In order to guard the detection of putative bound fragments against the evils of model bias, it is very important to ensure that the refinement of each complex against data collected on it does not treat as free any reflections that were part of the working set in the refinement of the apo structure. snip On 04/06/2015 17:34, Ian Tickle wrote: snip So I suspect that most of our efforts in maintaining common free R flags are for nothing; however it saves arguments with referees when it comes to publication! snip I also remember conversations and even BB threads that made me conclude that it did NOT matter to have the same Rfree set for independent datasets (e.g. different crystals). I confess I don't remember the arguments, only the relief at not having to bother with all the bookkeeping faff Gerard outlines and Ian describes. So: could someone explain in detail why this matters (or why not), and is there a URL to the evidence (paper or anything else) in either direction? (As far as I remember, the argument went that identical free sets were unnecessary even for exactly isomorphous crystals. Something like this: model bias is not a big deal when the model has largely converged, and that's what you have for molecular substitution (as Jim Pflugrath calls it). In addition, even a weakly binding fragment compound produces intensity perturbations large enough to make model bias irrelevant.) phx
Re: [ccp4bb] How many is too many free reflections?
It seems to me that the how many is too many aspect of this question, and the various culinary procedures that have been proposed as answers, may have obscured another, much more fundamental issue, namely: is it really the business of the data processing package to assign FreeR flags? I would argue that it isn't. (...) Excellent point! I can't agree more. Pavel
Re: [ccp4bb] How many is too many free reflections?
In other words, the free set for each complex must be such that reflections that are also present in the apo dataset retain the FreeR flag they had in that dataset. A very easy way to achieve this: generate a complete dataset to ridiculously high resolution with the cell of your crystal, and assign free-R flags. (If the first structure has already been solved, merge its free set in and extend it to the new reflections.) Now for every new structure solved, discard any free set that the data reduction program may have generated and merge with the complete set, discarding reflections with no Fobs (MNF) or with SigF=0. In fact, if we consider that a dataset is just a 3-dimensional array, or some subset of it enclosing the reciprocal-space asymmetric unit, I don't see any reason we couldn't assign one universal P1 free-R set and use it for every structure in whatever space group. By taking each new dataset, merging it with the universal free-R set, and discarding those reflections not present in the new data, you would obtain a random set for your structure. There could be nested (concentric?) free-R sets with 10%, 5%, 2%, 1% free, so that if you start out excluding 5% for a low-resolution structure and later get a high-resolution dataset and want to exclude 2%, you could be sure that all the 2% free reflections were also free in your previous 5% set. Thin or thick shells could be predefined. There may be problems when it is desired to exclude reflections according to some twin law or NCS. (Just now read Nick Keep's post, which expresses some similar ideas.) eab On 06/04/2015 10:29 AM, Gerard Bricogne wrote: Dear Graeme and other contributors to this thread, It seems to me that the "how many is too many" aspect of this question, and the various culinary procedures that have been proposed as answers, may have obscured another, much more fundamental issue, namely: is it really the business of the data processing package to assign FreeR flags? I would argue that it isn't. From the statistical viewpoint that justifies the need for FreeR flags, these are pre-refinement entities rather than post-processing ones. If one considers a single instance of going from a dataset to a refined structure, then this distinction may seem artificial. Consider, instead, the case of high-throughput screening to detect fragment binding on a large number of crystals of complexes between a given target protein (the apo) and a multitude of small, weakly-binding fragments into solutions of which crystals of the apo have been soaked. The model for the apo crystal structure comes from a refinement against a dataset, using a certain set of FreeR flags. In order to guard the detection of putative bound fragments against the evils of model bias, it is very important to ensure that the refinement of each complex against data collected on it does not treat as free any reflections that were part of the working set in the refinement of the apo structure. In other words, the free set for each complex must be such that reflections that are also present in the apo dataset retain the FreeR flag they had in that dataset. Any mixup, in the FreeR flags for a complex, of the work vs. free status of the reflections also in the apo would push Rwork up and Rfree down, invalidating their role as indicators of quality of fit or of incipient overfitting. Great care must therefore be exercised, in the form of adequate book-keeping and procedures for generating the FreeR flags in the mtz file for each complex from that for the apo, to properly enforce this inheritance of work vs.
free status. In such a context there is a clear and crucial difference between a post-processing entity and a pre-refinement one. FreeR flags belong to the latter category. In fact, the creation of FreeR flags at the end of the processing step can create a false perception, among people doing ligand screening under pressure, that they cannot re-use the FreeR flag information of the apo in refining their complexes, simply because a new set has been created for each of them. This is clearly to be avoided. Preserving the FreeR flags of the reflections that were used in the refinement of the apo structure is one of the explicit recommendations in the 2013 paper by Pozharski et al. (Acta Cryst. D69, 150-167) - see section 1.1.3, p.152. Best practice in this area may therefore not be only a question of numbers, but also of doing the appropriate thing in the appropriate place. There are of course corner cases where e.g. substantial unit-cell changes start to introduce some cross-talk between working and free reflections, but the possibility of such complications is no argument to justify giving up on doing the right thing when the right thing can be done. With best wishes, Gerard. -- On Thu, Jun 04, 2015 at 08:30:57AM +, Graeme Winter wrote: Hi Folks, Many thanks for all of your comments - in keeping with the spirit of the BB I have digested
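To make the universal free-set idea above concrete, here is an illustrative sketch (my own toy, not any package's actual algorithm): a deterministic hash of (h, k, l) assigns each reflection a bucket, so any two datasets, whatever their resolution or completeness, inherit consistent flags, and the 1%/2%/5%/10% sets are nested by construction. A real implementation would first map each hkl to a reference asymmetric unit so that symmetry mates share a flag, and would worry about twin laws and NCS as noted in the post; both steps are omitted here.

    # Illustrative "universal" nested free set: a fixed multiplicative hash
    # of (h, k, l) gives each reflection a bucket in [0, 100); taking
    # buckets < 1, < 2, < 5 or < 10 yields nested 1/2/5/10% free sets that
    # come out identical for every dataset, in any program, on any run.

    def free_bucket(h: int, k: int, l: int) -> int:
        return ((h * 73856093) ^ (k * 19349663) ^ (l * 83492791)) % 100

    def is_free(h: int, k: int, l: int, percent: int = 5) -> bool:
        return free_bucket(h, k, l) < percent

    # Sanity check of nesting over a block of reciprocal space:
    # anything free at 2% is automatically free at 5%.
    hkls = [(h, k, l) for h in range(-20, 21)
                      for k in range(-20, 21)
                      for l in range(0, 21)]
    assert all(is_free(*hkl, 5) for hkl in hkls if is_free(*hkl, 2))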
Re: [ccp4bb] How many is too many free reflections?
Many good points have been made on this thread so far, but mostly addressing the question of how many free reflections is enough, whereas the original question was how many is too many. I suppose a reasonable definition of too many is when the error introduced into the map by leaving out all those reflections starts to become a problem. It is easy to calculate this error: it is simply the difference between the map made using all reflections (regardless of free-R flag) and the map made with 5% of the reflections left out. Of course, this difference map is identical to a map calculated using only the 5% free reflections, setting all others to zero. The RMS variation of this error map is actually independent of the phases used (Parseval's theorem), and it ends up being: RMSerror = RMSall * sqrt(free_frac), where RMSerror is the RMS variation of the error map, RMSall is the RMS variation of the map calculated with all reflections, and free_frac is the fraction of hkls left out of the calculation. So, with 5% free reflections, the errors induced in the electron density will have an RMS variation that is 22.3% of the full map's RMS variation, or 0.223 sigma units. 1% free reflections will result in an RMS 10% error, or 0.1 sigmas. This means, for example, that with 5% free reflections a 1.0 sigma peak might come up as a 1.2 or 0.8 sigma feature. Note that these are not the sigmas of the Fo-Fc map (which changes as you build), but rather the sigma of the Fo map. Most of us don't look at Fo maps, but rather 2Fo-Fc or 2mFo-DFc maps, with or without the missing reflections filled in. These are a bit different from a straight Fo map. The absolute electron number density (e-/A^3) of the 1-sigma contour for all these maps is about the same, but no doubt the fill-in, the extra Fo-Fc term, and the likelihood weights reduce the overall RMS error. By how much? That is a good question. Still, we can take this RMS 0.223-sigma variation from 5% free reflections as a worst-case scenario, and then ask the question: is this a problem? Well, any source of error can be a problem, but when you are trying to find the best compromise between two difficult-to-reconcile considerations (such as the stability of Rfree and the interpretability of the map), it is usually helpful to bring in a third consideration: such as how much noise is in the map already due to other sources? My colleagues and I measured this recently (doi: 10.1073/pnas.1302823110), and found that the 1-sigma contour ranges from 0.8 to 1.2 e-/A^3 (relative to vacuum), experimental measurement errors are RMS ~0.04 e-/A^3, and map errors from the model-data difference are about RMS 0.13 e-/A^3. So, 22.3% of sigma is around RMS 0.22 e-/A^3. This is a bit larger than our biggest empirically-measured error, the modelling error, indicating that 5% free flags may indeed be too much. However, 22.3% is the worst-case error, in the absence of all the corrections used to make 2mFo-DFc maps, so in reality the modelling error and the omitted-reflection errors are probably comparable, indicating that 5% is about the right amount. Any more and the error from omitted reflections starts to dominate the total error. On the other hand, the modelling error is (by definition) the Fo-Fc difference, so as Rwork/Rfree get smaller, the RMS map variation due to modelling errors gets smaller as well, eventually exposing the omitted-reflection error. So, once your Rwork/Rfree get below ~22%, the errors in the map start to be dominated by the missing Fs of the 5% free set.
However, early in the refinement, when your R factors are in the 30%s, 40%s, or even 50%s, I don't think the errors due to missing 5% of the reflections are going to be important. Then again, late in refinement, it might be a good idea to start including some or all of the free reflections back into the working set in order to reduce the overall map error (cue lamentations from validation experts such as Jane Richardson). This is perhaps the most important topic on this thread. There are so many ways to contaminate, bias or otherwise compromise the free set, and once that is done we don't have generally accepted procedures for re-sanctifying the free reflections, other than starting over again from scratch. This is especially problematic if your starting structure for molecular replacement was refined against all reflections, and your ligand soak is nice and isomorphous to those original crystals. How do you remove the evil bias from this model? You can try shaking it, but that only really removes bias at high spatial frequencies and is not so effective at low resolution. So, if bias is so easy to generate, why not use it to our advantage? Instead of leaving the free-flagged reflections out of the refinement, put them in, but give them random F values. Then do everything you can to bias your model toward these random values. Loosen the
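The sqrt(free_frac) relation above is easy to verify numerically. The toy below (my own check, not from the thread; the grid size and 5% fraction are arbitrary) builds a map from random Fourier coefficients, zeroes a random 5% of them, and compares the RMS of the resulting error map with the full map's RMS; the ratio comes out at sqrt(0.05), about 0.223, as the Parseval argument predicts. A real density map would need the omitted coefficients chosen in Hermitian pairs, but Parseval gives the same answer.

    # Numerical check of RMS_error = RMS_all * sqrt(free_frac) via Parseval.
    import numpy as np

    rng = np.random.default_rng(0)
    shape = (64, 64, 64)
    F = rng.normal(size=shape) + 1j * rng.normal(size=shape)  # toy "structure factors"

    full_map = np.fft.ifftn(F)
    free = rng.random(shape) < 0.05                 # flag ~5% of coefficients
    error_map = np.fft.ifftn(np.where(free, F, 0))  # map from the omitted terms only

    print(np.std(error_map) / np.std(full_map))     # ~0.223
    print(np.sqrt(free.mean()))                     # sqrt(actual free fraction)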
Re: [ccp4bb] How many is too many free reflections?
Nick, what you describe is (almost) exactly the way we have always done it at Astex; I'm surprised to hear that others are not routinely doing the same. The difference is that we don't generate a free R flag MTZ file to ultra-high resolution as you suggest, since there's never any need to. What we do is generate by default a 1.5 Ang. free R flag file using UNIQUE, FREERFLAG and MTZUTILS whenever a new apo structure for a given target/crystal form is solved, and keep that with the initial apo data as a reference dataset for auto-re-indexing (so that all the protein-ligand datasets are indexed the same way). When a dataset is combined with the higher-resolution free R flag file we would of course cut the resolution to that of the data (still keeping the original free R flag file), mainly in order to save space in the database. Obviously if the initial apo data were higher resolution than 1.5 Ang., the processing script would generate an initial free R flag file that goes correspondingly higher (say to 1 Ang.). If a ligand dataset comes along later at higher resolution than 1.5 Ang., the script would do the same thing, but then it would use the MTZUTILS UNIQ option to merge the old free R flags up to 1.5 Ang. with the new ones between 1.5 and 1 Ang. Then it would combine the data file with the free R flag file as before and cut the resolution of the combined data file to the actual resolution of the data. The script would then replace the old free R flag file with the new one and use the latter for all subsequent datasets from that target/crystal form. The users are completely unaware that any of this is happening (unless they want to dig into the scripts!). We enforce use of 'approved' scripts for all the processing and refinement, essentially by using an Oracle database with web-based access authentication, which means that if you don't use the approved scripts to process your data then you can't upload your data to the database, which in turn means that no-one else will get to see and/or use your results! Our scripts make full use of CCP4 and Global Phasing programs (autoPROC, autoBUSTER, GRADE etc.); however, using CCP4i or other programs from the command line to process the data and only uploading the final results to the database is severely deprecated (and totally unsupported!), mainly because there will then be no permanent traceback in the database of the user's actions for others to see. On Gerard's final point of the effect of non-isomorphism, we find that isomorphism is the exception rather than the rule, i.e. the majority of our datasets would fail the Crick-Magdoff test for isomorphism (i.e. no more than 0.5% change in all cell lengths for 3 Ang. data, and a correspondingly lower threshold at more typical resolution limits of 2 - 1.5 Ang.). This is obviously very target- and crystal-form-dependent; some targets/crystal forms give more isomorphous crystals than others. So I suspect that most of our efforts in maintaining common free R flags are for nothing; however, it saves arguments with referees when it comes to publication! Cheers -- Ian On 4 June 2015 at 16:00, Nicholas Keep n.k...@mail.cryst.bbk.ac.uk wrote: I agree with Gerard. It would be much better in many ways to generate a separate file of Free R flags for each crystal form of a project, to some high resolution that is unlikely ever to be exceeded (e.g. 0.4 A), as a separate input file to refinement rather than in the mtz.
The generation of this free set could ask some questions, like: is the data twinned, and do you want to extend the free set from a higher-symmetry free set - e.g. data in C2 rather than C2221, where the symmetry is close to the higher symmetry but not perfect (seems to happen not infrequently)? Could some judicious selection of sets of potentially related hkls work as a universal free set? (Not thought this through fully.) This would get around practical issues like the one I had yesterday when refining in another well-known package, where coot drew the map as if it were 0.5 A data even though there were only observed data to 2.1 A, the rest just being a hopelessly overoptimistic guess of the best dataset we might ever collect. I agree you CAN do this with current software - it is just not the path of least resistance, so you have to double-check your group are doing this. Best wishes Nick -- Prof Nicholas H. Keep Executive Dean of School of Science Professor of Biomolecular Science Crystallography, Institute for Structural and Molecular Biology, Department of Biological Sciences Birkbeck, University of London, Malet Street, Bloomsbury LONDON WC1E 7HX email n.k...@mail.cryst.bbk.ac.uk Telephone 020-7631-6852 (Room G54a Office) 020-7631-6800 (Department Office) Fax 020-7631-6803 If you want to access me in person you have to come to the crystallography entrance and ring me or the department office from the internal phone by the door
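For readers who want to try the reference-flag workflow Ian and Nick describe, here is a hedged sketch of the classic UNIQUE -> FREERFLAG -> merge recipe, driven from Python. The cell, symmetry, resolution and file names are placeholders for your own crystal form, and CAD is used for the merge step (Astex's actual scripts use MTZUTILS and are not public), so treat this as an outline rather than their procedure.

    # Sketch of a reference free-R flag workflow (CCP4 programs; all
    # values are placeholders). Generate flags once to high resolution,
    # then merge them into every new dataset for that crystal form.
    import subprocess

    def ccp4(args, keywords):
        """Run a CCP4 program, feeding keyword input on stdin."""
        subprocess.run(args, input=keywords, text=True, check=True)

    # 1. Complete list of unique hkls, well beyond any resolution expected.
    ccp4(["unique", "HKLOUT", "unique.mtz"],
         "CELL 78 78 37 90 90 90\nSYMMETRY P43212\nRESOLUTION 1.0\n"
         "LABOUT F=F SIGF=SIGF\nEND\n")

    # 2. Assign free-R flags (5%) to that complete set; keep this file
    #    with the apo data as the permanent reference.
    ccp4(["freerflag", "HKLIN", "unique.mtz", "HKLOUT", "reference_free.mtz"],
         "FREERFRAC 0.05\nEND\n")

    # 3. For each new dataset: merge in the reference flags, ignoring any
    #    flags the processing package generated.
    ccp4(["cad", "HKLIN1", "new_data.mtz", "HKLIN2", "reference_free.mtz",
          "HKLOUT", "new_data_free.mtz"],
         "LABIN FILE 1 E1=F E2=SIGF\nLABIN FILE 2 E1=FreeR_flag\nEND\n")

After step 3 the combined file would still be cut back to the actual resolution of the data, as Ian describes, while the reference flag file is kept unchanged for the next dataset.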
[ccp4bb] why are the bond lengths all different between D and L amino acids?
We have been trying to deposit our peptide structures with D amino acids in them. They are 15-mers, all D, with an L racemate, refined in Refmac 5.8.0107. When we run validation, all the D amino acids have bond length outliers, while none of the L do. Example from the validation server ("A bond length (or angle) with |Z| > 2 is considered an outlier worth inspection."):

Mol Type Chain Res Link | Bond lengths: Counts RMSZ #|Z|>2 | Bond angles: Counts RMSZ #|Z|>2
1 DLY A 10 - | 8,?,?  0.47 0       | 8,?,?  1.74 2 (25%)
1 DLY A 11 - | 4,?,?  1.38 1 (25%) | 4,?,?  1.51 1 (25%)
1 DPN A 12 - | 11,?,? 1.14 1 (9%)  | 13,?,? 1.29 2 (15%)
1 DAL A 13 - | 4,?,?  0.83 0       | 4,?,?  2.11 2 (50%)
1 DLY A 14 - | 4,?,?  1.19 1 (25%) | 4,?,?  2.43 1 (25%)
1 DAL A 15 - | 4,?,?  1.55 1 (25%) | 4,?,?  2.27 1 (25%)
1 DPN A 16 - | 11,?,? 1.97 1 (9%)  | 13,?,? 1.10 1 (7%)
1 DVA A 17 - | 6,?,?  1.03 1 (16%) | 7,?,?  1.03 1 (14%)
etc.

None of the L's have this. I also downloaded the lib files for D- and L-leucine, and the bond lengths are different.

LEU (_chem_comp_bond: atom_1 atom_2 type value_dist value_dist_esd):
LEU N   CA   single 1.491 0.021
LEU CA  HA   single 0.980 0.020
LEU CA  CB   single 1.530 0.020
LEU CB  HB3  single 0.970 0.020
LEU CB  HB2  single 0.970 0.020
LEU CB  CG   single 1.530 0.020
LEU CG  HG   single 0.970 0.020
LEU CG  CD1  single 1.521 0.020
LEU CD1 HD11 single 0.960 0.020
LEU CD1 HD12 single 0.960 0.020
LEU CD1 HD13 single 0.960 0.020
LEU CG  CD2  single 1.521 0.020
LEU CD2 HD21 single 0.960 0.020
LEU CD2 HD22 single 0.960 0.020
LEU CD2 HD23 single 0.960 0.020
LEU CA  C    single 1.525 0.021
LEU C   O    deloc  1.231 0.020
LEU N   H1   single 0.960 0.020
LEU N   H2   single 0.960 0.020
LEU N   H3   single 0.960 0.020
LEU C   OXT  deloc  1.231 0.020

DLE:
DLE N    CA  single 1.455 0.020
DLE CB   CA  single 1.524 0.020
DLE CA   C   single 1.500 0.020
DLE CG   CB  single 1.524 0.020
DLE CD1  CG  single 1.524 0.020
DLE CD2  CG  single 1.524 0.020
DLE O    C   deloc  1.250 0.020
DLE C    OXT deloc  1.250 0.020
DLE HN   N   single 0.954 0.020
DLE HA   CA  single 1.099 0.020
DLE HB1  CB  single 1.092 0.020
DLE HB2  CB  single 1.092 0.020
DLE HG   CG  single 1.099 0.020
DLE HD21 CD2 single 1.059 0.020
DLE HD22 CD2 single 1.059 0.020
DLE HD23 CD2 single 1.059 0.020
DLE HD11 CD1 single 1.059 0.020
DLE HD12 CD1 single 1.059 0.020
DLE HD13 CD1 single 1.059 0.020

Has anyone seen this? And how do I refine if the restraints are different for the enantiomorphs? I doubt ALL the bond lengths should be different. Thanks for your help. Kenneth A. Satyshur, M.S., Ph.D. Senior Scientist University of Wisconsin-Madison Madison, Wisconsin, 53706 608-215-5207
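One quick way to see exactly which restraints disagree is to diff the two dictionaries directly. The sketch below uses the gemmi CIF parser (one option among many; the monomer-library paths and the 0.005 A reporting threshold are my assumptions, not part of the original post) to list heavy-atom bond lengths that differ between LEU and DLE.

    # Sketch: diff heavy-atom bond restraints between the L and D
    # dictionaries in the CCP4 monomer library (gemmi CIF parser;
    # the 0.005 A threshold is arbitrary).
    import os
    from gemmi import cif

    def bonds(path, comp):
        block = cif.read(path).find_block(f"comp_{comp}")
        table = block.find("_chem_comp_bond.",
                           ["atom_id_1", "atom_id_2", "value_dist"])
        # key on an order-independent atom pair; skip hydrogens
        return {frozenset((row[0], row[1])): float(row[2])
                for row in table
                if not (row[0].startswith("H") or row[1].startswith("H"))}

    mon = os.environ["CLIBD_MON"]  # set by the CCP4 environment
    leu = bonds(os.path.join(mon, "l", "LEU.cif"), "LEU")
    dle = bonds(os.path.join(mon, "d", "DLE.cif"), "DLE")

    for pair in sorted(leu.keys() & dle.keys(), key=sorted):
        if abs(leu[pair] - dle[pair]) > 0.005:
            print("-".join(sorted(pair)),
                  f"LEU {leu[pair]:.3f}  DLE {dle[pair]:.3f}")

Since bond lengths are invariant under mirror reflection, the D values really should match the L ones; one practical workaround is to regenerate the D restraints from the corresponding L dictionary (or with a modern dictionary generator) rather than refining against stale DLE values.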
Re: [ccp4bb] Off-topic: Request for DNA
There is a group at Regensburg University in Germany that has worked with this organism. That is where we got our DNA from. Kim On Tue, May 26, 2015 at 2:54 PM, Mohamed Noor mohamed.n...@staffmail.ul.ie wrote: Dear all I am looking for a small amount of Aquifex aeolicus DNA or cell pellet for PCR. Unfortunately neither ATCC nor DSMZ holds this bacterium. Thanks. -- Kimberly Stanek Graduate Student Mura Lab Department of Chemistry University of Virginia (434) 924-7979
Re: [ccp4bb] How many is too many free reflections?
Dear Graeme and other contributors to this thread, It seems to me that the "how many is too many" aspect of this question, and the various culinary procedures that have been proposed as answers, may have obscured another, much more fundamental issue, namely: is it really the business of the data processing package to assign FreeR flags? I would argue that it isn't. From the statistical viewpoint that justifies the need for FreeR flags, these are pre-refinement entities rather than post-processing ones. If one considers a single instance of going from a dataset to a refined structure, then this distinction may seem artificial. Consider, instead, the case of high-throughput screening to detect fragment binding on a large number of crystals of complexes between a given target protein (the apo) and a multitude of small, weakly-binding fragments into solutions of which crystals of the apo have been soaked. The model for the apo crystal structure comes from a refinement against a dataset, using a certain set of FreeR flags. In order to guard the detection of putative bound fragments against the evils of model bias, it is very important to ensure that the refinement of each complex against data collected on it does not treat as free any reflections that were part of the working set in the refinement of the apo structure. In other words, the free set for each complex must be such that reflections that are also present in the apo dataset retain the FreeR flag they had in that dataset. Any mixup, in the FreeR flags for a complex, of the work vs. free status of the reflections also in the apo would push Rwork up and Rfree down, invalidating their role as indicators of quality of fit or of incipient overfitting. Great care must therefore be exercised, in the form of adequate book-keeping and procedures for generating the FreeR flags in the mtz file for each complex from that for the apo, to properly enforce this inheritance of work vs. free status. In such a context there is a clear and crucial difference between a post-processing entity and a pre-refinement one. FreeR flags belong to the latter category. In fact, the creation of FreeR flags at the end of the processing step can create a false perception, among people doing ligand screening under pressure, that they cannot re-use the FreeR flag information of the apo in refining their complexes, simply because a new set has been created for each of them. This is clearly to be avoided. Preserving the FreeR flags of the reflections that were used in the refinement of the apo structure is one of the explicit recommendations in the 2013 paper by Pozharski et al. (Acta Cryst. D69, 150-167) - see section 1.1.3, p.152. Best practice in this area may therefore not be only a question of numbers, but also of doing the appropriate thing in the appropriate place. There are of course corner cases where e.g. substantial unit-cell changes start to introduce some cross-talk between working and free reflections, but the possibility of such complications is no argument to justify giving up on doing the right thing when the right thing can be done. With best wishes, Gerard. -- On Thu, Jun 04, 2015 at 08:30:57AM +, Graeme Winter wrote: Hi Folks, Many thanks for all of your comments - in keeping with the spirit of the BB I have digested the responses below. Interestingly, I suspect that the responses to this question indicate the very wide range of resolution limits of the data people work with!
Best wishes Graeme === Proposal 1: 10% of reflections, max 2000. Proposal 2: from the wiki: http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set including Randy Read's recipe: So here's the recipe I would use, for what it's worth:
fewer than 10,000 reflections: set aside 10%
10,000-20,000 reflections: set aside 1,000 reflections
20,000-40,000 reflections: set aside 5%
more than 40,000 reflections: set aside 2,000 reflections
Proposal 3: 5%, maximum 2-5k. Proposal 4: 3%, minimum 1000. Proposal 5: 5-10% of reflections, minimum 1000. Proposal 6: 50 reflections per bin in order to get reliable ML parameter estimation, ideally around 150/bin. Proposal 7: if there are lots of reflections (i.e. 800K unique), around 1% selected - 5% would be 40k, i.e. rather a lot. Referees question the use of 5k reflections as a test set. Comment 1 in response to this: surely the absolute # of test reflections is not relevant; the percentage is. Approximate consensus (i.e. what I will look at doing in xia2): probably follow the Randy Read recipe from the ccp4wiki, as this seems to (probably) satisfy most of the criteria raised by everyone else. On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter graeme.win...@gmail.com wrote: Hi Folks Had a vague comment handed my way that xia2 assigns too many free reflections - I have a
Re: [ccp4bb] How many is too many free reflections?
I agree with Gerard. It would be much better in many ways to generate a separate file of Free R flags for each crystal form of a project, to some high resolution that is unlikely ever to be exceeded (e.g. 0.4 A), as a separate input file to refinement rather than in the mtz. The generation of this free set could ask some questions, like: is the data twinned, and do you want to extend the free set from a higher-symmetry free set - e.g. data in C2 rather than C2221, where the symmetry is close to the higher symmetry but not perfect (seems to happen not infrequently)? Could some judicious selection of sets of potentially related hkls work as a universal free set? (Not thought this through fully.) This would get around practical issues like the one I had yesterday when refining in another well-known package, where coot drew the map as if it were 0.5 A data even though there were only observed data to 2.1 A, the rest just being a hopelessly overoptimistic guess of the best dataset we might ever collect. I agree you CAN do this with current software - it is just not the path of least resistance, so you have to double-check your group are doing this. Best wishes Nick -- Prof Nicholas H. Keep Executive Dean of School of Science Professor of Biomolecular Science Crystallography, Institute for Structural and Molecular Biology, Department of Biological Sciences Birkbeck, University of London, Malet Street, Bloomsbury LONDON WC1E 7HX email n.k...@mail.cryst.bbk.ac.uk Telephone 020-7631-6852 (Room G54a Office) 020-7631-6800 (Department Office) Fax 020-7631-6803 If you want to access me in person you have to come to the crystallography entrance and ring me or the department office from the internal phone by the door
Re: [ccp4bb] How many is too many free reflections?
Hi Folks, Many thanks for all of your comments - in keeping with the spirit of the BB I have digested the responses below. Interestingly, I suspect that the responses to this question indicate the very wide range of resolution limits of the data people work with! Best wishes Graeme === Proposal 1: 10% of reflections, max 2000. Proposal 2: from the wiki: http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set including Randy Read's recipe: So here's the recipe I would use, for what it's worth:
fewer than 10,000 reflections: set aside 10%
10,000-20,000 reflections: set aside 1,000 reflections
20,000-40,000 reflections: set aside 5%
more than 40,000 reflections: set aside 2,000 reflections
Proposal 3: 5%, maximum 2-5k. Proposal 4: 3%, minimum 1000. Proposal 5: 5-10% of reflections, minimum 1000. Proposal 6: 50 reflections per bin in order to get reliable ML parameter estimation, ideally around 150/bin. Proposal 7: if there are lots of reflections (i.e. 800K unique), around 1% selected - 5% would be 40k, i.e. rather a lot. Referees question the use of 5k reflections as a test set. Comment 1 in response to this: surely the absolute # of test reflections is not relevant; the percentage is. Approximate consensus (i.e. what I will look at doing in xia2): probably follow the Randy Read recipe from the ccp4wiki, as this seems to (probably) satisfy most of the criteria raised by everyone else. On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter graeme.win...@gmail.com wrote: Hi Folks Had a vague comment handed my way that xia2 assigns too many free reflections - I have a feeling that by default it makes a free set of 5%, which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems excessive now. This was particularly in the case of high-resolution data where you have a lot of reflections, so 5% could be several thousand, which would be more than you need just to check that Rfree seems OK. Since I really don't know what the right # of reflections to assign to a free set is, I thought I would ask here - what do you think? Essentially I need to assign a minimum %age or minimum # - the lower of the two presumably? Any comments welcome! Thanks best wishes Graeme
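As a compact summary, the consensus recipe digests to a few lines of code. The sketch below simply expresses the reflection-count thresholds as reconstructed in the recipe above as a picker function; it is an illustration, not official xia2 behaviour, and the helper name is hypothetical.

    # The "Randy Read recipe" from the digest above as a picker: given the
    # number of unique reflections, return how many to flag as free.
    def test_set_size(n_unique: int) -> int:
        if n_unique < 10_000:
            return round(0.10 * n_unique)   # 10%
        if n_unique < 20_000:
            return 1000                     # fixed 1000 (5-10%)
        if n_unique < 40_000:
            return round(0.05 * n_unique)   # 5%
        return 2000                         # fixed 2000 (<5%)

    for n in (5_000, 15_000, 30_000, 800_000):
        print(n, test_set_size(n))

Note how the fixed counts and percentages dovetail at the boundaries (10% of 10,000 = 1,000; 5% of 40,000 = 2,000), which is what makes the scheme internally consistent.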