Re: [ccp4bb] x-ray diffraction data analysis (XDS)

2013-09-01 Thread Kay Diederichs
The HKL2000 stats look very untypical to me. I wonder what the completeness 
is.

For the XDS processing, have you tried the hints given in the Optimization 
article in XDSwiki, in particular recycling of GXPARM.XDS and using the 
*_E.S.D. parameters as found in INTEGRATE.LP ?
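
(A rough sketch of that recycling step, as I read the XDSwiki Optimization 
article, in Python; GXPARM.XDS, XPARM.XDS, INTEGRATE.LP and XDS.INP are the 
standard XDS files, but the script itself is only illustrative and assumes a 
completed first processing pass in the current directory.)

import re
import shutil

# recycle the refined geometry: GXPARM.XDS from the first pass becomes the
# starting XPARM.XDS for the next INTEGRATE run
shutil.copy("GXPARM.XDS", "XPARM.XDS")

# pick up the refined *_E.S.D. values reported in INTEGRATE.LP
esd = {}
with open("INTEGRATE.LP") as lp:
    for line in lp:
        for key in ("BEAM_DIVERGENCE_E.S.D.", "REFLECTING_RANGE_E.S.D."):
            m = re.search(re.escape(key) + r"=\s*([0-9.]+)", line)
            if m:
                esd[key] = float(m.group(1))   # keep the last value found

# append them to XDS.INP (later occurrences of a keyword should win; if in
# doubt, edit XDS.INP by hand) and rerun from DEFPIX onwards
with open("XDS.INP", "a") as inp:
    inp.write("\n! optimization pass: recycled geometry and refined E.S.D.s\n")
    inp.write("JOB= DEFPIX INTEGRATE CORRECT\n")
    for key, value in esd.items():
        inp.write("%s= %s\n" % (key, value))

(Then run xds, or xds_par, again in the same directory.)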

According to your current CORRECT.LP, these data may be useful to 3.7A at most.

Good luck,

Kay


Re: [ccp4bb] x-ray diffraction data analysis (XDS)

2013-09-01 Thread Kay Diederichs
I forgot: you should be using FRIEDEL'S_LAW=TRUE


Re: [ccp4bb] Resolution, R factors and data quality

2013-09-01 Thread Frank von Delft

A bit late to this thread.

1.
Juergen: Jim was not actually adopting CC*; he was asking how to make 
practical use of it when faced with actual datasets fading into noise.  
If I understand correctly from later responses, paired refinement is 
what KD suggests should be best practice?


2.
I'm struck by how small the improvements in R/Rfree are in Diederichs & 
Karplus (Acta D 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/); 
the authors don't discuss it, but what's current thinking on how to estimate 
the expected variation in R/Rfree - does the Tickle formalism (1998) still 
apply for ML with very weak data?


I'm puzzled by Table 4 (and discussion):  do I read correctly that 
discarding negative unique reflections led to higher CCwork/CCfree?  
Wasn't the point of the paper that massaging data always shows up in 
worse refinement stats?  Is this a corner case, and how would one know?


Cheers
phx

On 28/08/2013 01:48, Bosch, Juergen wrote:

Hi Jim,

all data is good data - the more data you have the better (that's what 
they say anyhow)


Not everybody is adopting the Karplus & Diederichs paper as quickly as 
you do. And not to be confused with the Diederichs & Karplus paper :-)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/
http://www.ncbi.nlm.nih.gov/pubmed/22628654

My models get better by including the data I had been omitting before, 
that's all that counts for me.


Jürgen

P.S. reminds me somehow of those guys collecting more and more data - 
PRISM greetings


On Aug 27, 2013, at 8:29 PM, Jim Pflugrath wrote:


I have to ask flamingly: So what about CC1/2 and CC*?

Did we not replace an arbitrary resolution cut-off based on a value 
of Rmerge with an arbitrary resolution cut-off based on a value of 
Rmeas already?  And now we are going to replace that with an 
arbitrary resolution cut-off based on a value of CC* - or is it CC1/2?


I am asked often:  What value of CC1/2 should I cut my resolution at? 
 What should I tell my students?  I've got a course coming up and I 
am sure they will ask me again.


Jim


*From:* CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Arka 
Chakraborty [arko.chakrabort...@gmail.com]

*Sent:* Tuesday, August 27, 2013 7:45 AM
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] Resolution, R factors and data quality

Hi all,
does this not again bring up the still-prevailing adherence to R 
factors rather than a shift to correlation coefficients (CC1/2 and CC*), 
as Dr. Phil Evans has indicated?
The way we look at data quality (by we I mean the end users) 
needs to be altered, I guess.


best,

Arka Chakraborty

On Tue, Aug 27, 2013 at 9:50 AM, Phil Evans p...@mrc-lmb.cam.ac.uk wrote:


The question you should ask yourself is: why would omitting data
improve my model?

Phil



..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://lupo.jhsph.edu

Re: [ccp4bb] Resolution, R factors and data quality

2013-09-01 Thread Ian Tickle
On 1 September 2013 11:31, Frank von Delft frank.vonde...@sgc.ox.ac.uk wrote:


 2.
 I'm struck by how small the improvements in R/Rfree are in Diederichs &
 Karplus (Acta D 2013, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3689524/);
 the authors don't discuss it, but what's current thinking on how to
 estimate the expected variation in R/Rfree - does the Tickle formalism
 (1998) still apply for ML with very weak data?


Frank, our paper is still relevant, unfortunately just not to the question
you're trying to answer!  We were trying to answer 2 questions: 1) what
value of Rfree would you expect to get if the structure were free of
systematic error and only random errors were present, so that could be used
as a baseline (assuming a fixed cross-validation test set) to identify
models with gross (e.g. chain-tracing) errors; and 2) how much would you
expect Rfree to vary assuming a fixed starting model but with a different
random sampling of the test set (i.e. the sampling standard deviation).
The latter is relevant if say you want to compare the same structure (at
the same resolution obviously) done independently in 2 labs, since it tells
you how big the difference in Rfree for an arbitrary choice of test set
needs to be before you can claim that it's statistically significant.
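
(Purely as an illustration of that sampling standard deviation - not of the 
analytic treatment in our 1998 paper - one can brute-force it by recomputing 
Rfree over many random draws of the test set for a fixed model; fobs and 
fcalc below are made-up amplitude arrays standing in for real data.)

import numpy as np

rng = np.random.default_rng(0)
n_refl = 20000
# stand-in data: "observed" amplitudes and amplitudes from a fixed, imperfect
# model (both hypothetical, just to make the sketch run)
fobs = rng.gamma(2.0, 100.0, n_refl)
fcalc = fobs * (1.0 + 0.25 * rng.standard_normal(n_refl))

def r_factor(fo, fc):
    return np.sum(np.abs(fo - fc)) / np.sum(fo)

test_fraction = 0.05                 # a typical free-set size
rfree = []
for _ in range(500):
    test = rng.random(n_refl) < test_fraction    # a fresh random test set
    rfree.append(r_factor(fobs[test], fcalc[test]))

print("mean Rfree = %.4f, sampling s.d. = %.4f" % (np.mean(rfree), np.std(rfree)))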

In this case the questions are different because you're certainly not
comparing different models using the same test set, neither I suspect are
you comparing the same model with different randomly selected test sets.  I
assume in this case that the test sets for different resolution cut-offs
are highly correlated, which I suspect makes it quite difficult to say what
is a significant difference in Rfree (I have not attempted to do the
algebra!).

Rfree is one of a number of model selection criteria (see
http://en.wikipedia.org/wiki/Model_selection#Criteria_for_model_selection)
whose purpose is to provide a metric for comparison of different models
given specific data, i.e. as for the likelihood function they all take the
form f(model | data), so in all cases you're varying the model with fixed
data.  Its use in the form f(data | model), i.e. where you're varying the
data with a fixed model I would say is somewhat questionable and certainly
requires careful analysis to determine whether the results are
statistically significant.  Even assuming we can argue our way around the
inappropriate application of model selection methodology to a different
problem, unfortunately Rfree is far from an ideal criterion in this
respect; a better one would surely be the free log-likelihood as originally
proposed by Gerard Bricogne.

Cheers

-- Ian


Re: [ccp4bb] Dependency of theta on n/d in Bragg's law

2013-09-01 Thread James Holton
Perhaps some of the confusion here arises because Bragg's Law is not a 
Fourier transform.


Remember, in the standard diagram of Bragg's Law, there are only two 
atoms that are d apart.  The full diffraction pattern from just two 
atoms actually looks like this:

http://bl831.als.lbl.gov/~jamesh/nearBragg/intimage_twoatom.img

This is an ADSC format image, so you can look at it in your favorite 
diffraction image viewer, such as ADXV, imosflm, HKL2000, XDSviewer, 
ipdisp, fit2d, whatever you like.  Or, you can substitute "png" for 
"img" in the filename and look at it in your web browser.  Notice how 
there are 9 bands for only 2 atoms?  If you look at the *.img file you 
can see that the d-spacing at the middle of each line is indeed 10 A, 
5 A, 3.33 A, and 2.5 A, just as Bragg's Law predicts for n=1,2,3,4, because 
the two atoms were 10 A apart (d = 10 A) and the wavelength was 1 A.  
But what about the corners?  The 2.5 A band reads a d-spacing of 1.65 
A at the corners of the detector!  Also, if you look at the central 
band, it passes through the direct beam (d=infinity), but at the edge 
of the detector it reads 2.14 A!   Does this mean that Bragg's Law is 
wrong!?
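
(To see where those numbers come from, here is a toy version of the same 
two-point interference calculation in Python; the values probed below are 
generic, not the exact detector geometry behind the nearBragg image, and the 
mapping onto the detector - which is where the 1 A wavelength enters - is 
skipped, but the two-point physics is the same.)

import numpy as np

d = 10.0     # A, separation of the two atoms (along one axis)

def intensity(q_par):
    # two point scatterers: |1 + exp(2*pi*i * q.r)|^2 depends only on the
    # component of the scattering vector q along the interatomic vector (1/A)
    return abs(1 + np.exp(2j * np.pi * q_par * d)) ** 2

def d_read(q_par, q_perp):
    # the d-spacing read off at a pixel is 1/|q|, which shrinks as the
    # perpendicular component (e.g. toward a detector corner) grows
    return 1.0 / np.hypot(q_par, q_perp)

for n in range(1, 5):
    q_par = n / d                               # centre of the n-th band
    print("n=%d: band centre d = %.2f A, relative intensity %.1f"
          % (n, d_read(q_par, 0.0), intensity(q_par)))

# the same n=4 band, but out where the perpendicular component of q is large:
print("n=4 band at q_perp = 0.3/A reads d = %.2f A" % d_read(4 / d, 0.3))

So (ignoring detector geometry) every band keeps its full two-atom intensity, 
but the d-spacing read off a pixel drops as you move out along the band, which 
is the corner effect above.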


Of course not, it just means that Bragg's Law is one dimensional. 
Strictly speaking, it is about planes of atoms, not individual atoms 
themselves.  The Fourier transform of two dots is indeed a series of 
bands (an interference pattern), but the Fourier transform of two 
planes (edge-on to the beam) is this:

http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/20A_disks.img

What?  A caterpillar?  How does that happen?  Well, it helps to look at 
the diffraction pattern of a single plane:

http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/20A_disk.img

 I should point out here that I'm not modelling an infinite plane, but 
rather a disk 20A in radius.  This is why the edge of the caterpillar 
has a d-spacing of 40 A.  If it were an infinite plane, its Fourier 
transform would be an infinitely thin line, visible at only one point: 
the origin.  Which is not all that interesting. The halo around the 
main line is because the plane has a hard edge, and so its Fourier 
transform has fringes (it's a sinc function).  The reason why it does 
not run from the top of the image to the bottom is because the Ewald 
sphere (a geometric representation of Bragg's law) is curved, but the 
Fourier transform of a disk is a straight line in reciprocal space.
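
(The hard-edge-makes-fringes point is easy to check in one dimension: the 
transform of a sharply bounded slab is the familiar sinc-like pattern with 
side lobes. A generic numpy sketch, not derived from the disk images above.)

import numpy as np

n, w = 4096, 80                              # grid size and slab width
slab = (np.arange(n) < w).astype(float)      # hard-edged 1D "plane"

F = np.abs(np.fft.fft(slab))                 # magnitude of its transform
k = np.arange(1, n)                          # skip k=0 (the 0/0 point)
sinc_like = np.abs(np.sin(np.pi * k * w / n) / np.sin(np.pi * k / n))

# the numerical transform matches the analytic fringed (sinc-like) form
print("max deviation:", np.max(np.abs(F[1:] - sinc_like)))

Soften that edge (taper the slab smoothly to zero) and the side lobes - the 
fringes - largely disappear.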


  By giving the plane a finite size you can more easily see that the 
diffraction pattern of a stack of two planes is nothing more than the 
diffraction pattern of one plane, multiplied by that of two points.  
This is a fundamental property of Fourier transforms: convolution 
becomes a product in reciprocal space.  Where convolution is nothing 
more than copying an object to different places in space, and in this 
case these places are the two points in the Bragg diagram.
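
(That convolution-becomes-a-product statement can be checked directly with a 
one-dimensional FFT; this sketch is generic and not tied to the nearBragg 
images.)

import numpy as np

n = 1024
plane = (np.arange(n) < 64).astype(float)   # one finite "plane" (a 1D top-hat)

two_points = np.zeros(n)
two_points[100] = two_points[300] = 1.0     # the two positions in the Bragg diagram

# real space: physically copy the plane to the two positions
stack = np.roll(plane, 100) + np.roll(plane, 300)

# reciprocal space: the transform of the stack equals the transform of one
# plane multiplied by the two-point interference pattern
lhs = np.fft.fft(stack)
rhs = np.fft.fft(plane) * np.fft.fft(two_points)
print("max |difference|:", np.max(np.abs(lhs - rhs)))   # ~1e-12, i.e. identical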


But, still, why the caterpillar?  It is because the Ewald sphere is 
curved, so the reciprocal-space line only brushes against it for a few 
orders.  We can, however, get more orders by tilting the planes by some 
angle theta, such as the 11.53 degrees that satisfies n*lambda = 
2*d*sin(theta) for n = 4.  That is this image:

http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disks.png

Yes, you can still see the caterpillar, but clearly the 4th spot up is 
brighter than all but the 0th-order one.  The only reason why it is not 
identical in intensity is because of the inverse square law: the pixels 
on the detector for the 4th-order reflection are a little further away 
from the sample than the zeroth-order ones.
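
(For the record, that 11.53-degree figure is just Bragg's law solved for 
theta with d = 10 A, lambda = 1 A and n = 4.)

import math

d, lam = 10.0, 1.0                           # A
for n in range(1, 5):
    theta = math.degrees(math.asin(n * lam / (2 * d)))
    print("n=%d: theta = %.2f degrees" % (n, theta))   # n=4 gives ~11.5 degrees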


 As the planes get wider:
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_40A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_80A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_160A_disks.png
the caterpillar gets thinner and you see less and less of the n=1,2,3 
orders.  For an infinite pair of planes, there will be only two 
intersection points: the origin and the n=4 spot.  This is not because 
the intermediate orders are not there; they are just not satisfying the 
Bragg condition, and neither are their fringes.


Of course, with only two planes, even the infinite-plane spot will be 
much fatter in the vertical.  Formally, about half as fat as the 
distance between the spots.  This is because the interference pattern 
for only two points is still there.  But if you have three, four or five 
planes, you get these:

http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disk.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_3disks.png
http://bl831.als.lbl.gov/~jamesh/nearBragg/BraggsLaw/tilted_20A_4disks.png