Re: [ccp4bb] x-ray diffraction data analysis (XDS)
The HKL2000 stats look quite atypical to me; I wonder what the completeness is. For the XDS processing, have you tried the hints given in the "Optimization" article in XDSwiki, in particular recycling GXPARM.XDS and using the *_E.S.D. parameters found in INTEGRATE.LP? According to your current CORRECT.LP, these data may be useful to 3.7 A at most.

Good luck,
Kay
Re: [ccp4bb] x-ray diffraction data analysis (XDS)
I forgot: you should be using FRIEDEL'S_LAW=TRUE.
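For anyone who wants to script these suggestions, here is a minimal sketch in Python of the XDSwiki "Optimization" recycling step. The E.S.D. numbers are placeholders (take the real ones from what INTEGRATE.LP reports for your data), and it assumes XDS honours the last occurrence of a repeated keyword in XDS.INP:

# Sketch of the recycling step described above, assuming a standard XDS
# working directory. The E.S.D. values are placeholders - substitute the
# numbers your own INTEGRATE.LP suggests.
import shutil

# 1. Recycle the refined geometry: the GXPARM.XDS written by CORRECT
#    becomes the XPARM.XDS read by INTEGRATE on the next pass.
shutil.copy("GXPARM.XDS", "XPARM.XDS")

# 2. Append the optimization keywords; appending overrides earlier values
#    if XDS takes the last occurrence of a keyword (edit in place if not).
with open("XDS.INP", "a") as f:
    f.write("\n! --- optimization pass (example values only) ---\n")
    f.write("JOB= DEFPIX INTEGRATE CORRECT\n")
    f.write("BEAM_DIVERGENCE_E.S.D.= 0.08    ! placeholder, see INTEGRATE.LP\n")
    f.write("REFLECTING_RANGE_E.S.D.= 0.15   ! placeholder, see INTEGRATE.LP\n")
    f.write("FRIEDEL'S_LAW= TRUE\n")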
Re: [ccp4bb] Resolution, R factors and data quality
A bit late to this thread.

1. Juergen: Jim was not actually adopting CC*, he was asking how to make practical use of it when faced with actual datasets fading into noise. If I understand correctly from later responses, paired refinement is what K&D suggest should be best practice?

2. I'm struck by how small the improvements in R/Rfree are in Diederichs & Karplus (Acta D 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/); the authors don't discuss it, but what's current thinking on how to estimate the expected variation in R/Rfree? Does the Tickle formalism (1998) still apply for ML with very weak data?

I'm puzzled by Table 4 (and discussion): do I read correctly that discarding negative unique reflections led to higher CCwork/CCfree? Wasn't the point of the paper that massaging data always shows up in worse refinement stats? Is this a corner case, and how would one know?

Cheers
phx

On 28/08/2013 01:48, Bosch, Juergen wrote:

Hi Jim,

all data is good data - the more data you have the better (that's what they say, anyhow). Not everybody is adopting the Karplus & Diederichs paper as quickly as you do. And not to be confused with the Diederichs & Karplus paper :-)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/
http://www.ncbi.nlm.nih.gov/pubmed/22628654

My models get better by including the data I had been omitting before; that's all that counts for me.

Jürgen

P.S. Reminds me somehow of those guys collecting more and more data - PRISM greetings

On Aug 27, 2013, at 8:29 PM, Jim Pflugrath wrote:

I have to ask flamingly: So what about CC1/2 and CC*? Did we not replace an arbitrary resolution cut-off based on a value of Rmerge with an arbitrary resolution cut-off based on a value of Rmeas already? And now we are going to replace that with an arbitrary resolution cut-off based on a value of CC* - or is it CC1/2?

I am asked often: What value of CC1/2 should I cut my resolution at? What should I tell my students? I've got a course coming up and I am sure they will ask me again.

Jim

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Arka Chakraborty [arko.chakrabort...@gmail.com]
Sent: Tuesday, August 27, 2013 7:45 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Resolution, R factors and data quality

Hi all,

does this not again bring up the still-prevailing adherence to R factors, rather than a shift to correlation coefficients (CC1/2 and CC*), as Dr. Phil Evans has indicated? The way we look at data quality (by "we" I mean the end users) needs to be altered, I guess.

best,
Arka Chakraborty

On Tue, Aug 27, 2013 at 9:50 AM, Phil Evans <p...@mrc-lmb.cam.ac.uk> wrote:

The question you should ask yourself is: why would omitting data improve my model?

Phil

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab: +1-410-614-4894
Fax: +1-410-955-2926
http://lupo.jhsph.edu
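For readers following along, the relationship between the two statistics being argued about is easy to compute. A minimal sketch (numpy; synthetic half-dataset intensities standing in for a real resolution shell) of CC1/2 and the CC* estimate from Karplus & Diederichs:

# Minimal sketch: CC1/2 from two half-dataset intensity estimates, and
# CC* = sqrt(2*CC1/2 / (1 + CC1/2)) as defined by Karplus & Diederichs
# (Science, 2012). Synthetic data stands in for a real, weak shell.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
true_I = rng.gamma(shape=2.0, scale=100.0, size=n)   # "true" intensities
noise = 80.0                                          # weak, noisy shell
half1 = true_I + rng.normal(0, noise, n)              # half-dataset 1
half2 = true_I + rng.normal(0, noise, n)              # half-dataset 2

cc_half = np.corrcoef(half1, half2)[0, 1]             # CC1/2
cc_star = np.sqrt(2 * cc_half / (1 + cc_half))        # CC*
print(f"CC1/2 = {cc_half:.3f}  ->  CC* = {cc_star:.3f}")

CC* estimates the correlation of the merged data with the (unmeasurable) true intensities, which is why it always comes out higher than CC1/2 itself.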
Re: [ccp4bb] Resolution, R factors and data quality
On 1 September 2013 11:31, Frank von Delft <frank.vonde...@sgc.ox.ac.uk> wrote:

2. I'm struck by how small the improvements in R/Rfree are in Diederichs & Karplus (Acta D 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/); the authors don't discuss it, but what's current thinking on how to estimate the expected variation in R/Rfree - does the Tickle formalism (1998) still apply for ML with very weak data?

Frank, our paper is still relevant, unfortunately just not to the question you're trying to answer! We were trying to answer two questions: 1) what value of Rfree would you expect to get if the structure were free of systematic error and only random errors were present, so that it could be used as a baseline (assuming a fixed cross-validation test set) to identify models with gross (e.g. chain-tracing) errors; and 2) how much would you expect Rfree to vary assuming a fixed starting model but with a different random sampling of the test set (i.e. the sampling standard deviation). The latter is relevant if, say, you want to compare the same structure (at the same resolution, obviously) done independently in two labs, since it tells you how big the difference in Rfree for an arbitrary choice of test set needs to be before you can claim that it's statistically significant.

In this case the questions are different, because you're certainly not comparing different models using the same test set; neither, I suspect, are you comparing the same model with different randomly selected test sets. I assume in this case that the test sets for different resolution cut-offs are highly correlated, which I suspect makes it quite difficult to say what is a significant difference in Rfree (I have not attempted to do the algebra!).

Rfree is one of a number of model selection criteria (see http://en.wikipedia.org/wiki/Model_selection#Criteria_for_model_selection) whose purpose is to provide a metric for comparing different models given specific data; i.e., as with the likelihood function, they all take the form f(model | data), so in all cases you're varying the model with fixed data. Its use in the form f(data | model), i.e. where you're varying the data with a fixed model, is somewhat questionable and certainly requires careful analysis to determine whether the results are statistically significant. Even assuming we can argue our way around the inappropriate application of model-selection methodology to a different problem, Rfree is unfortunately far from an ideal criterion in this respect; a better one would surely be the free log-likelihood as originally proposed by Gerard Bricogne.

Cheers

-- Ian
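To make the "sampling standard deviation" of Rfree concrete, here is a toy sketch (numpy; synthetic |Fobs|/|Fcalc| pairs, and no re-refinement per test set, so it isolates only the spread coming from the random choice of reflections):

# Toy illustration of the sampling spread of Rfree: fix a "model" (here,
# synthetic |Fobs|/|Fcalc| pairs with ~25% model error) and re-draw a 5%
# test set many times. A real experiment would re-refine for each test
# set; this sketch only shows the test-set sampling contribution.
import numpy as np

rng = np.random.default_rng(1)
n_refl = 20000
f_calc = rng.gamma(2.0, 50.0, n_refl)
f_obs = f_calc * (1 + rng.normal(0, 0.25, n_refl))

def r_factor(sel):
    return np.sum(np.abs(f_obs[sel] - f_calc[sel])) / np.sum(f_obs[sel])

r_free = [r_factor(rng.choice(n_refl, n_refl // 20, replace=False))
          for _ in range(500)]
print(f"Rfree = {np.mean(r_free):.4f} +/- {np.std(r_free):.4f} "
      f"over 500 random test sets")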
Re: [ccp4bb] Dependency of theta on n/d in Bragg's law
Perhaps some of the confusion here arises because Bragg's Law is not a Fourier transform. Remember, in the standard diagram of Bragg's Law, there are only two atoms, d apart. The full diffraction pattern from just two atoms actually looks like this:
http://bl831.als.lbl.gov/~jamesh/nearBragg/intimage_twoatom.img

This is an ADSC-format image, so you can look at it in your favorite diffraction image viewer, such as ADXV, imosflm, HKL2000, XDSviewer, ipdisp, fit2d, whatever you like. Or you can substitute "png" for "img" in the filename and look at it in your web browser.

Notice how there are 9 bands for only 2 atoms? If you look at the *.img file you can see that the d-spacing of the middle of each line is indeed 10 A, 5 A, 3.33 A and 2.5 A - just as Bragg's Law predicts for n = 1, 2, 3, 4, because the two atoms were 10 A apart (d = 10 A) and the wavelength was 1 A. But what about the corners? The 2.5 A band reads a d-spacing of 1.65 A at the corners of the detector! Also, the central band passes through the direct beam (d = infinity), but at the edge of the detector it reads 2.14 A! Does this mean that Bragg's Law is wrong!?

Of course not; it just means that Bragg's Law is one-dimensional. Strictly speaking, it is about planes of atoms, not individual atoms themselves. The Fourier transform of two dots is indeed a series of bands (an interference pattern), but the Fourier transform of two planes (edge-on to the beam) is this:
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/20A_disks.img

What? A caterpillar? How does that happen? Well, it helps to look at the diffraction pattern of a single plane:
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/20A_disk.img

I should point out here that I'm not modelling an infinite plane, but rather a disk 20 A in radius; this is why the edge of the caterpillar has a d-spacing of 40 A. If it were an infinite plane, its Fourier transform would be an infinitely thin line, visible at only one point: the origin. Which is not all that interesting. The halo around the main line is there because the plane has a hard edge, and so its Fourier transform has fringes (it's a sinc function). The reason it does not run from the top of the image to the bottom is that the Ewald sphere (a geometric representation of Bragg's Law) is curved, whereas the Fourier transform of a disk is a straight line in reciprocal space.

By giving the plane a finite size, you can more easily see that the diffraction pattern of a stack of two planes is nothing more than the diffraction pattern of one plane multiplied by that of two points. This is a fundamental property of Fourier transforms: convolution becomes a product in reciprocal space. Convolution here is nothing more than copying an object to different places in space - in this case, the two points in the Bragg diagram.

But, still, why the caterpillar? It is because the Ewald sphere is curved, so the reciprocal-space line only brushes against it for a few orders. We can, however, get more orders by tilting the planes by some angle theta, such as the 11.53 degrees that satisfies n*lambda = 2*d*sin(theta) for n = 4. That is this image:
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disks.png

Yes, you can still see the caterpillar, but clearly the 4th spot up is brighter than all but the 0th-order one.
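The two-atom pattern above is easy to check numerically. A minimal sketch (numpy; a 1D cut through the pattern rather than a simulated 2D detector image like the ones linked) showing that the maxima of the two-point interference function land exactly where Bragg's Law puts them, for d = 10 A and lambda = 1 A:

# Two point scatterers d = 10 A apart, lambda = 1 A. The two-point
# structure factor is |1 + exp(2*pi*i*q*d)|^2 = 4*cos^2(pi*q*d), with
# q = 2*sin(theta)/lambda, so maxima fall exactly where Bragg's Law
# 2*d*sin(theta) = n*lambda holds, for integer n.
import numpy as np

d, lam = 10.0, 1.0
theta = np.radians(np.linspace(0.1, 15.0, 20000))
q = 2.0 * np.sin(theta) / lam                 # scattering vector, 1/A
intensity = 4.0 * np.cos(np.pi * q * d) ** 2

# local maxima on the discrete grid (n comes out very near an integer)
peaks = np.where((intensity[1:-1] > intensity[:-2]) &
                 (intensity[1:-1] > intensity[2:]))[0] + 1
for i in peaks:
    print(f"theta = {np.degrees(theta[i]):5.2f} deg  ->  "
          f"n = 2*d*sin(theta)/lambda = {q[i] * d:.3f}")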
The only reason the 4th-order spot is not identical in intensity to the 0th-order one is the inverse square law: the pixels on the detector for the 4th-order reflection are a little further away from the sample than the 0th-order ones.

As the planes get wider:
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_40A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_80A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_160A_disks.png
the caterpillar gets thinner and you see less and less of the n = 1, 2, 3 orders. For an infinite pair of planes, there will be only two intersection points: the origin and the n = 4 spot. This is not because the intermediate orders are not there; they are just not satisfying the Bragg condition, and neither are their fringes.

Of course, with only two planes, even the infinite-plane spot will be much fatter in the vertical - formally, about half as fat as the distance between the spots - because the interference pattern for only two points is still there. But if you have one, two, three or four planes, you get these:
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disk.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_3disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_4disks.png
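The "fat spots from only two planes" point is the textbook N-slit interference result, and the sharpening with more planes is easy to see numerically. A minimal sketch (numpy, 1D, synthetic; the FWHM estimate is deliberately crude):

# N equally spaced planes (d = 10 A apart) give the N-slit interference
# pattern |sum_k exp(2*pi*i*q*d*k)|^2 = (sin(N*pi*q*d)/sin(pi*q*d))^2,
# whose main peaks (height N^2) narrow roughly as 1/N of the peak spacing.
import numpy as np

d = 10.0                                      # plane spacing, A
q = np.linspace(0.001, 0.25, 5000)            # scattering vector, 1/A

for N in (2, 3, 4, 5):
    x = np.pi * q * d
    intensity = (np.sin(N * x) / np.sin(x)) ** 2
    # crude FWHM of the n = 1 peak, in units of the peak spacing 1/d
    near = (q > 0.5 / d) & (q < 1.5 / d)
    above = intensity[near] > N ** 2 / 2
    width = q[near][above].ptp() * d if above.any() else 0.0
    print(f"N = {N}: n=1 peak FWHM ~ {width:.2f} x peak spacing")

For N = 2 the width comes out at about half the peak spacing, matching the "half as fat as the distance between the spots" statement above, and it falls off roughly as 1/N as planes are added.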