Re: [ccp4bb] Against Method (R)

2010-10-29 Thread Bart Hazes

On 10-10-29 12:03 AM, Robbie Joosten wrote:

> Hi Bart,
>
> I agree with the building strategy you propose, but at some point it
> stops helping and a bit more attention to detail is needed. Reciprocal
> space refinement doesn't seem to do the fine details. It always
> surprises me how much atoms still move when you real-space refine a
> refined model, especially the waters. I admit this is not a fair
> comparison.

Does the water move back to its old position if you follow up the
real-space refinement with more reciprocal-space refinement? If so, the
map may not have been a true representation of reality. Basically, what
I was implying is that if the required model changes ("details") are
such that they fall within the radius of convergence, then the atoms
should move to their correct positions, unless something is keeping them
from moving, such as an incorrectly placed side chain that causes a
steric conflict. Fix the incorrect side chain and your "details" will
take care of themselves. I don't mean that I can always spot an easy
error to fix; sometimes I end up rebuilding several different ways in
the hope that one will resolve whatever the problem was. If that doesn't
happen, at some point you need to give up, especially if it does not
affect a functionally important region. I do think it is good practice
to point out regions in the model that are problematic, and I have never
had reviewers complain about that if it is clear you made the effort to
get the model as good as possible given the data.

> High resolution data helps, but better data makes it tempting to put
> too little effort into optimising the model. I've seen some horribly
> obvious errors in hi-res models (more than 10 sigma difference density
> peaks for misplaced side chains). At the same time there are quite a
> lot of low-res models that are exceptionally good.

Can't blame the data for that; in the end, each person (and supervisor)
needs to take responsibility for the models they produce and deposit.
The same applies to sequence databases, which are full of lazy errors.
If humans are involved, both greatness and stupidity are likely outcomes.

Bart

> Cheers,
> Robbie

Re: [ccp4bb] Against Method (R)

2010-10-29 Thread Dirk Kostrewa

Dear George,

thanks a lot! I see the point that in reciprocal space refinement one
can refine directly against the observed intensities and sigmas. But in
principle, one could iterate: real space refinement, then structure
factor and intensity calculation for the refinement statistics and
weights, then calculation of an improved electron density map (but that
requires Fs again ...), and so forth until some convergence criterion is
met. I wonder which refinement scheme is more efficient.
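A schematic of the alternating scheme being sketched here (pseudocode
made concrete; every helper passed in is a hypothetical placeholder, not
a real refinement API):

    def iterative_real_space_refinement(model, data, refine_rs, calc_fc,
                                        calc_map, r_stat,
                                        max_cycles=20, tol=1e-4):
        """Alternate real-space refinement against the current map with
        structure-factor recalculation (which yields the statistics,
        weights, and an improved map) until R stops improving."""
        r_prev = float("inf")
        for _ in range(max_cycles):
            f_calc = calc_fc(model)           # Fs are needed again here ...
            r = r_stat(data, f_calc)          # refinement statistic
            if r_prev - r < tol:              # convergence criterion met
                break
            density_map = calc_map(data, f_calc)
            model = refine_rs(model, density_map)
            r_prev = r
        return model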


The missing reflections in map calculation are something that we have to
live with, and unless the data are severely incomplete, I must admit
that I don't worry too much.


The twinning problem is really severe! Here, I don't see how this could 
be done in a clever way in real space.


Interesting discussion ...

Best wishes,

Dirk.


Re: [ccp4bb] Against Method (R)

2010-10-29 Thread George M. Sheldrick

Dear Dirk,

There are good reasons why real space refinement has never become popular.
With reciprocal space refinement, you refine directly against what you
measured, taking the standard uncertainty of each individual intensity
into account. In this context I was pleased to read in CCP4bb that REFMAC
will soon be refining against intensities (like SHELXL). Then the
assumptions made (e.g. no distortion of the expected intensity distribution
by e.g. NCS or twinning) and even 'bugs' in (c)truncate will no longer
matter. If for some reason a reflection wasn't measured, then simply
leaving it out does not invalidate a reciprocal space refinement.
The same applies to reflections that are reserved for Rfree.

In contrast, the electron density is only theoretically correct if all
reflections between 0,0,0 and infinity are included in the Fourier
summation. For a twin it is even worse, because we don't know how to
partition the difference between Fo^2 and Fc^2 between the twin
components. None of the attempts to work around these problems are
entirely convincing. Maps and real space refinement are invaluable in
the intermediate stages of model building and correction, but the
final refinement should be performed in reciprocal space.
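For reference, the Fourier summation in question, written out in its
standard textbook form (added here for clarity, not part of the original
message):

    \rho(x,y,z) = \frac{1}{V} \sum_{hkl} F_{hkl}\,
                  e^{-2\pi i (hx + ky + lz)}

Every missing reflection hkl leaves a term out of this sum, which is why
the density is only exact when the summation runs over all reflections
out to infinite resolution.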

Best wishes, George  

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582



Re: [ccp4bb] Against Method (R)

2010-10-29 Thread Dirk Kostrewa

Hi Robbie,

yes, the apparently larger radius of convergence in real space 
refinement impresses me, too. Therefore, I usually do local real space 
refinement after manually correcting errors, either with Moloc at lower 
resolution or with Coot at higher resolution, prior to reciprocal space 
refinement.


If I recall correctly, real space refinement was introduced by Robert 
Diamond in the 60s long before reciprocal space refinement. In the 90s 
Michael Chapman tried to revive it, but without much success, as far as 
I know. With the fast computers today, maybe the time has come again for 
real space refinement ...


Best regards,

Dirk.


--

***
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:    +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:    www.genzentrum.lmu.de
***



Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Robbie Joosten
Hi Bart,

I agree with the building strategy you propose, but at some point it stops 
helping and a bit more attention to detail is needed. Reciprocal space 
refinement doesn't seem to do the fine details. It always surprises me how much 
atoms still move when you real-space refine a refined model, especially the 
waters. I admit this is not a fair comparison.

High resolution data helps, but better data makes it tempting to put too little 
effort in optimising the model. I've seen some horribly obvious errors in 
hi-res models (more than 10 sigma difference density peaks for misplaced side 
chains). At the same time there are quite a lot of low-res models that are 
exceptionally good. 

Cheers,
Robbie


Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Bart Hazes

On 10-10-28 04:09 PM, Ethan Merritt wrote:

> This I can answer based on experience. One can take the coordinates
> from a structure refined at near atomic resolution (~1.0A), including
> multiple conformations, partial occupancy waters, etc, and use it to
> calculate R factors against a lower resolution (say 2.5A) data set
> collected from an isomorphous crystal. The R factors from this
> total-rigid-body replacement will be better than anything you could
> get from refinement against the lower resolution data. In fact,
> refinement from this starting point will just make the R factors
> worse.
>
> What this tells us is that the crystallographic residuals can
> recognize a better model when they see one. But our refinement
> programs are not good enough to produce such a better model in the
> first place. Worse, they are not even good enough to avoid degrading
> the model.
>
> That's essentially the same thing Bart said, perhaps a little more
> pessimistic :-)
>
> cheers,
>
> Ethan


Not pessimistic at all, just realistic and perhaps even optimistic for
methods developers, as apparently there is still quite a bit of progress
that can be made by improving the "search strategy" during refinement.


During manual refinement I normally tell students not to bother about
translating/rotating/torsioning atoms by just a tiny bit to make it fit
better. Likewise there is no point in moving atoms a little bit to
correct a distorted bond angle or bond length. If it needed to move that
little bit, the refinement program would have done it for you. Look for
discrete errors in the problematic residue or its neighbors: peptide
flips, 120 degree changes in side chain dihedrals, etc. If you can find
and fix one of those errors, a lot of the stereochemical distortions and
non-ideal fit to density surrounding that residue will suddenly
disappear as well.


The benefit of high resolution is that it is much easier to pick up and
fix such errors (or not make them in the first place).


Bart

--

Bart Hazes (Associate Professor)
Dept. of Medical Microbiology & Immunology
University of Alberta
1-15 Medical Sciences Building
Edmonton, Alberta
Canada, T6G 2H7
phone:  1-780-492-0042
fax:    1-780-492-7521




Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Bart Hazes

Your second suggestion would be a good test because you are dealing
with data from the same crystal and can thus assume the structures are
identical (radiation damage excluded).

So, take a highly diffracting crystal and collect a short-exposure low
resolution data set and a long-exposure high resolution data set; let's
say with I/sigma = 2 in the highest resolution shell at 2.0 and 1.2 A,
respectively. Give the data to two equally capable students to determine
the structure by molecular replacement from a, let's say, 30% sequence
identity starting model. You could also use automated model building to
be more objective and avoid becoming unpopular with your students.

Proceed until each model is fully refined against its own data. Now run
some more refinement, without manual rebuilding, of the lowres model
against the highres data (and perhaps some rigid body or other minimal
refinement of the highres model against the lowres data; make sure R
& Rfree go down). I predict that the highres model will fit the lowres
data noticeably better than the lowres model did, and that the lowres
model, even after refinement against the highres data, will not reach
the same quality as the highres model. Looking at Fo-Fc maps in the
latter case may give some hints as to which model errors were not
recognized at 2A resolution. You'll probably find peptide flips,
mis-modeled leucine and other side chains, dual conformations not
recognized at 2A resolution, more realistic B values, more waters ...

Bart

On 10-10-28 03:49 PM, Jacob Keller wrote:

> So let's say I take a 0.6 Ang structure, artificially introduce noise
> into corresponding Fobs to make the resolution go down to 2 Ang, and
> refine using the 0.6 Ang model--do I actually get R's better than the
> artificially-inflated sigmas? Or let's say I experimentally decrease
> I/sigma by attenuating the beam and collect another data set--same
> situation?
>
> JPK
 

Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Ethan Merritt
Bart Hazes wrote:

> There are many cases where people use a structure refined at high
> resolution as a starting molecular replacement structure for a closely
> related/same protein with a lower resolution data set and get
> substantially better R statistics than you would expect for that
> resolution. So one factor in the "R factor gap" is many small errors
> that are introduced during model building and not recognized and fixed
> later due to limited resolution. In a perfect world, refinement would
> find the global minimum but in practice all these little errors get
> stuck in local minima with distortions in neighboring atoms
> compensating for the initial error and thereby hiding their existence.

Excellent point.

On Thursday, October 28, 2010 02:49:11 pm Jacob Keller wrote:
> So let's say I take a 0.6 Ang structure, artificially introduce noise into 
> corresponding Fobs to make the resolution go down to 2 Ang, and refine using 
> the 0.6 Ang model--do I actually get R's better than the 
> artificially-inflated sigmas?
> Or let's say I experimentally decrease I/sigma by attenuating the beam and 
> collect another data set--same situation?

This I can answer based on experience. One can take the coordinates from
a structure refined at near atomic resolution (~1.0A), including multiple
conformations, partial occupancy waters, etc, and use it to calculate R
factors against a lower resolution (say 2.5A) data set collected from an
isomorphous crystal. The R factors from this total-rigid-body replacement
will be better than anything you could get from refinement against the
lower resolution data. In fact, refinement from this starting point will
just make the R factors worse.

What this tells us is that the crystallographic residuals can recognize a
better model when they see one. But our refinement programs are not good
enough to produce such a better model in the first place. Worse, they are
not even good enough to avoid degrading the model.

That's essentially the same thing Bart said, perhaps a little more pessimistic 
:-)

cheers,

Ethan
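
A minimal sketch of the cross-R comparison described above, assuming the
Fobs of the low resolution data set and the Fcalc of the high resolution
model are already at hand as arrays (function and variable names are
illustrative, not from any particular package):

    import numpy as np

    def r_factor(f_obs, f_calc):
        """Conventional R = sum(|Fo - k*Fc|) / sum(Fo), with k a simple
        least-squares overall scale between the two amplitude sets."""
        k = np.sum(f_obs * f_calc) / np.sum(f_calc ** 2)
        return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

    # The observation above: r_factor(f_obs_lowres, f_calc_highres_model)
    # comes out lower than the R reached by refining a model directly
    # against the low resolution data.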



Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Jacob Keller
So let's say I take a 0.6 Ang structure, artificially introduce noise into 
corresponding Fobs to make the resolution go down to 2 Ang, and refine using 
the 0.6 Ang model--do I actually get R's better than the artificially-inflated 
sigmas? Or let's say I experimentally decrease I/sigma by attenuating the beam 
and collect another data set--same situation?

JPK

  - Original Message - 
  From: Bart Hazes 
  To: CCP4BB@JISCMAIL.AC.UK 
  Sent: Thursday, October 28, 2010 4:13 PM
  Subject: Re: [ccp4bb] Against Method (R)


  There are many cases where people use a structure refined at high resolution 
as a starting molecular replacement structure for a closely related/same 
protein with a lower resolution data set and get substantially better R 
statistics than you would expect for that resolution. So one factor in the "R 
factor gap" is many small errors that are introduced during model building and 
not recognized and fixed later due to limited resolution. In a perfect world, 
refinement would find the global minimum but in practice all these little 
errors get stuck in local minima with distortions in neighboring atoms 
compensating for the initial error and thereby hiding their existence.

  Bart



Re: [ccp4bb] Against Method (R)

2010-10-28 Thread James Holton
It is important to remember that if you have Gaussian-distributed errors and
you plot error bars between +1 sigma and -1 sigma (where "sigma" is the rms
error), then you expect the "right" curve to miss the error bars about 30%
of the time.  This is just a property of the Gaussian distribution: you
expect a certain small number of the errors to be large.  If the curve
passes within the bounds of every single one of your error bars, then your
error estimates are either too big, or the errors have a non-Gaussian
distribution.
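
A quick numerical check of that 30% figure (a sketch added for
illustration; the ~32% value follows from the normal distribution, since
P(|x| < sigma) is about 0.68):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 1.0                                 # rms measurement error
    noise = rng.normal(0.0, sigma, size=1_000_000)

    # Fraction of +/- 1 sigma error bars that fail to bracket the truth:
    print(np.mean(np.abs(noise) > sigma))       # prints ~0.317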

For example, if the noise in the data somehow had a uniform distribution
(always between +1 and -1), then no data point will ever be "kicked" further
than "1" away from the "right" curve.  In this case, a data point more than
"1" away from the curve is evidence that you either have the wrong model
(curve), or there is some other kind of noise around (wrong "error model").

As someone who has spent a lot of time looking into how we measure
intensities, I think I can say with some considerable amount of confidence
that we are doing a pretty good job of estimating the errors.  At least,
they are certainly not off by an average of 40% (20% in F).  You could do
better than that estimating the intensities by eye!

Everybody seems to have their own favorite explanation for what I call the
"R factor gap": solvent, multi-confomer structures, absorption effects,
etc.  However, if you go through the literature (old and new) you will find
countless attempts to include more sophisticated versions of each of these
hypothetically "important" systematic errors, and in none of these cases has
anyone ever presented a physically reasonable model that explained the
observed spot intensities from a protein crystal to within experimental
error.  Or at least, if there is such a paper, I haven't seen it.

Since there are so many possible things to "correct", what I would like to
find is a structure that represents the transition between the "small
molecule" and the "macromolecule" world.  Lysozyme does not qualify!  Even
the famous 0.6 A structure of lysozyme (2vb1) still has a "mean absolute
chi": <|Iobs-Icalc|/sig(I)> = 4.5.  Also, the 1.4 A structure of the
tetrapeptide QQNN (2olx) is only a little better at <|chi|> = 3.5.  I
realize that the "chi" I describe here is not a "standard" crystallographic
statistic, and perhaps I need a statistics lesson, but it seems to me there
ought to be a case where it is close to 1.
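
The statistic being described, as a one-liner (a sketch with
illustrative array names; for purely Gaussian errors its expected value
is sqrt(2/pi) ~ 0.80, so "close to 1" is the right order of magnitude):

    import numpy as np

    def mean_abs_chi(i_obs, i_calc, sig_i):
        """Mean absolute chi, <|Iobs - Icalc| / sig(I)>, over all
        reflections; values of 3-5 signal unmodelled systematic error."""
        return np.mean(np.abs(i_obs - i_calc) / sig_i)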

-James Holton
MAD Scientist


Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Jacob Keller
So I guess there is never a case in crystallography in which our
models predict the data to within the errors of data collection? I
guess the situation might be similar to fitting a Michaelis-Menten
curve, in which the fitted line often misses the error bars of the
individual points, but gets the overall pattern right. In that case,
though, I don't think we say that we are inadequately modelling the
data. I guess there the error bars are actually too small (are
underestimated). Maybe our intensity errors are also underestimated?

JPK
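
The thought experiment in numerical form (a sketch; the array names are
illustrative, and real noise injection would be done on unmerged data):

    import numpy as np

    rng = np.random.default_rng(1)

    def degrade_intensities(i_obs, sig_i, factor=5.0):
        """Simulate a noisier measurement of the same crystal: inflate
        each sigma by `factor` and kick the intensity by the matching
        extra Gaussian error, as in the attenuated-beam scenario."""
        sig_new = factor * sig_i
        extra = np.sqrt(sig_new**2 - sig_i**2)   # added noise component
        return i_obs + rng.normal(0.0, extra), sig_new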



Re: [ccp4bb] Against Method (R)

2010-10-28 Thread George M. Sheldrick
Not quite. I was trying to say that for good small molecule data, R1 is
usually significantly less than Rmerge, but never less than the precision
of the experimental data measured by 0.5*<sigma(I)>/<I> = 0.5*Rsigma
(or the very similar 0.5*Rpim).
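
In LaTeX form, the quantities involved and the origin of the factor 0.5
(standard definitions, added here for reference):

    R_1 = \frac{\sum \big| |F_o| - |F_c| \big|}{\sum |F_o|}, \qquad
    R_\sigma = \frac{\sum \sigma(I)}{\sum I}

    Since $I \propto F^2$, error propagation gives
    $\sigma(I) = 2F\,\sigma(F)$, hence
    $\frac{\sigma(F)}{F} = \frac{1}{2}\,\frac{\sigma(I)}{I}$,
    which is why $0.5\,R_\sigma$ estimates the attainable lower limit
    for an $F$-based residual such as $R_1$.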

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


On Thu, 28 Oct 2010, Jacob Keller wrote:

> So I guess a consequence of what you say is that since in cases where there is
> no solvent the R values are often better than the precision of the actual
> measurements (never true with macromolecular crystals involving solvent),
> perhaps our real problem might be modelling solvent?
> Alternatively/additionally, I wonder whether there also might be more
> variability molecule-to-molecule in proteins, which we may not model well
> either.
> 
> JPK
> 
> - Original Message - From: "George M. Sheldrick"
> <gshe...@shelx.uni-ac.gwdg.de>
> To: <CCP4BB@JISCMAIL.AC.UK>
> Sent: Thursday, October 28, 2010 4:05 AM
> Subject: Re: [ccp4bb] Against Method (R)
> 
> 
> > It is instructive to look at what happens for small molecules where
> > there is often no solvent to worry about. They are often refined
> > using SHELXL, which does indeed print out the weighted R-value based
> > on intensities (wR2), the conventional unweighted R-value R1 (based
> > on F) and <sigma(I)>/<I>, which it calls R(sigma). For well-behaved
> > crystals R1 is in the range 1-5% and R(merge) (based on intensities)
> > is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded
> > as the lower attainable limit for R1 and this is indeed the case in
> > practice (the factor 0.5 approximately converts from I to F). Rpim
> > gives similar results to R(sigma), both attempt to measure the
> > precision of the MERGED data, which are what one is refining against.
> >
> > George
> >
> > Prof. George M. Sheldrick FRS
> > Dept. Structural Chemistry,
> > University of Goettingen,
> > Tammannstr. 4,
> > D37077 Goettingen, Germany
> > Tel. +49-551-39-3021 or -3068
> > Fax. +49-551-39-22582
> >
> >
> > On Wed, 27 Oct 2010, Ed Pozharski wrote:
> >
> > > On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> > > > the errors in our measurements apparently have no
> > > > bearing whatsoever on the errors in our models
> > >
> > > This would mean there is no point trying to get better crystals, right?
> > > Or am I also wrong to assume that the dataset with higher I/sigma in the
> > > highest resolution shell will give me a better model?
> > >
> > > On a related point - why is Rmerge considered to be the limiting value
> > > for the R?  Isn't Rmerge a poorly defined measure itself that
> > > deteriorates at least in some circumstances (e.g. increased redundancy)?
> > > Specifically, shouldn't "ideal" R approximate 0.5*<sigma(I)>/<I>?
> > >
> > > Cheers,
> > >
> > > Ed.
> > >
> > >
> > >
> > > -- 
> > > "I'd jump in myself, if I weren't so good at whistling."
> > >Julian, King of Lemurs
> > >
> > >
> 
> 
> ***
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> Dallos Laboratory
> F. Searle 1-240
> 2240 Campus Drive
> Evanston IL 60208
> lab: 847.491.2438
> cel: 773.608.9185
> email: j-kell...@northwestern.edu
> ***
> 
> 


Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Ed Pozharski
In addition to bulk solvent, the other well recognized problem with
macromolecular structures is the inadequate description of disorder.
With small molecules, the Debye-Waller factor works much better because the
harmonic oscillator is indeed a good model there.  Note that the problem
is not anisotropy (which we can model if resolution is sufficiently
high), but rather anharmonic motion and multiple conformations that go
undetected.
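
For context, a sketch of the isotropic harmonic Debye-Waller factor in
question (standard textbook form):

    T(B) = \exp\!\left(-B\,\frac{\sin^2\theta}{\lambda^2}\right),
    \qquad B = 8\pi^2\langle u^2\rangle

It is exact only for harmonic displacements, which is precisely the
assumption that anharmonic motion and undetected alternate conformations
break.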

On Thu, 2010-10-28 at 08:00 -0500, Jacob Keller wrote:
> So I guess a consequence of what you say is that since in cases where there 
> is no solvent the R values are often better than the precision of the actual 
> measurements (never true with macromolecular crystals involving solvent), 
> perhaps our real problem might be modelling solvent? 
> Alternatively/additionally, I wonder whether there also might be more 
> variability molecule-to-molecule in proteins, which we may not model well 
> either.
> 
> JPK
> 
> - Original Message - 
> From: "George M. Sheldrick" 
> To: 
> Sent: Thursday, October 28, 2010 4:05 AM
> Subject: Re: [ccp4bb] Against Method (R)
> 
> 
> > It is instructive to look at what happens for small molecules where
> > there is often no solvent to worry about. They are often refined
> > using SHELXL, which does indeed print out the weighted R-value based
> > on intensities (wR2), the conventional unweighted R-value R1 (based
> > on F) and <sigma(I)>/<I>, which it calls R(sigma). For well-behaved
> > crystals R1 is in the range 1-5% and R(merge) (based on intensities)
> > is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded
> > as the lower attainable limit for R1 and this is indeed the case in
> > practice (the factor 0.5 approximately converts from I to F). Rpim
> > gives similar results to R(sigma), both attempt to measure the
> > precision of the MERGED data, which are what one is refining against.
> >
> > George
> >
> > Prof. George M. Sheldrick FRS
> > Dept. Structural Chemistry,
> > University of Goettingen,
> > Tammannstr. 4,
> > D37077 Goettingen, Germany
> > Tel. +49-551-39-3021 or -3068
> > Fax. +49-551-39-22582
> >
> >
> > On Wed, 27 Oct 2010, Ed Pozharski wrote:
> >
> >> On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> >> > the errors in our measurements apparently have no
> >> > bearing whatsoever on the errors in our models
> >>
> >> This would mean there is no point trying to get better crystals, right?
> >> Or am I also wrong to assume that the dataset with higher I/sigma in the
> >> highest resolution shell will give me a better model?
> >>
> >> On a related point - why is Rmerge considered to be the limiting value
> >> for the R?  Isn't Rmerge a poorly defined measure itself that
> >> deteriorates at least in some circumstances (e.g. increased redundancy)?
> >> Specifically, shouldn't "ideal" R approximate 0.5*<sigma(I)>/<I>?
> >>
> >> Cheers,
> >>
> >> Ed.
> >>
> >>
> >>
> >> -- 
> >> "I'd jump in myself, if I weren't so good at whistling."
> >>Julian, King of Lemurs
> >>
> >>
> 
> 
> ***
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> Dallos Laboratory
> F. Searle 1-240
> 2240 Campus Drive
> Evanston IL 60208
> lab: 847.491.2438
> cel: 773.608.9185
> email: j-kell...@northwestern.edu
> ***

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs


Re: [ccp4bb] Against Method (R)

2010-10-28 Thread Jacob Keller
So I guess a consequence of what you say is that since in cases where there 
is no solvent the R values are often better than the precision of the actual 
measurements (never true with macromolecular crystals involving solvent), 
perhaps our real problem might be modelling solvent? 
Alternatively/additionally, I wonder whether there also might be more 
variability molecule-to-molecule in proteins, which we may not model well 
either.


JPK

- Original Message - 
From: "George M. Sheldrick" 

To: 
Sent: Thursday, October 28, 2010 4:05 AM
Subject: Re: [ccp4bb] Against Method (R)



It is instructive to look at what happens for small molecules where
there is often no solvent to worry about. They are often refined
using SHELXL, which does indeed print out the weighted R-value based
on intensities (wR2), the conventional unweighted R-value R1 (based
on F) and <sigma(I)>/<I>, which it calls R(sigma). For well-behaved
crystals R1 is in the range 1-5% and R(merge) (based on intensities)
is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded
as the lower attainable limit for R1 and this is indeed the case in
practice (the factor 0.5 approximately converts from I to F). Rpim
gives similar results to R(sigma), both attempt to measure the
precision of the MERGED data, which are what one is refining against.

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


On Wed, 27 Oct 2010, Ed Pozharski wrote:


On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> the errors in our measurements apparently have no
> bearing whatsoever on the errors in our models

This would mean there is no point trying to get better crystals, right?
Or am I also wrong to assume that the dataset with higher I/sigma in the
highest resolution shell will give me a better model?

On a related point - why is Rmerge considered to be the limiting value
for the R?  Isn't Rmerge a poorly defined measure itself that
deteriorates at least in some circumstances (e.g. increased redundancy)?
Specifically, shouldn't "ideal" R approximate 0.5*<sigma(I)>/<I>?

Cheers,

Ed.



--
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs





***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
Dallos Laboratory
F. Searle 1-240
2240 Campus Drive
Evanston IL 60208
lab: 847.491.2438
cel: 773.608.9185
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] Against Method (R)

2010-10-28 Thread George M. Sheldrick
It is instructive to look at what happens for small molecules where
there is often no solvent to worry about. They are often refined 
using SHELXL, which does indeed print out the weighted R-value based 
on intensities (wR2), the conventional unweighted R-value R1 (based 
on F) and <sigma(I)>/<I>, which it calls R(sigma). For well-behaved
crystals R1 is in the range 1-5% and R(merge) (based on intensities)
is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded
as the lower attainable limit for R1 and this is indeed the case in 
practice (the factor 0.5 approximately converts from I to F). Rpim
gives similar results to R(sigma), both attempt to measure the
precision of the MERGED data, which are what one is refining against.

George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


On Wed, 27 Oct 2010, Ed Pozharski wrote:

> On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> > the errors in our measurements apparently have no 
> > bearing whatsoever on the errors in our models 
> 
> This would mean there is no point trying to get better crystals, right?
> Or am I also wrong to assume that the dataset with higher I/sigma in the
> highest resolution shell will give me a better model?
> 
> On a related point - why is Rmerge considered to be the limiting value
> for the R?  Isn't Rmerge a poorly defined measure itself that
> deteriorates at least in some circumstances (e.g. increased redundancy)?
> Specifically, shouldn't "ideal" R approximate 0.5*<sigma(I)>/<I>?
> 
> Cheers,
> 
> Ed.
> 
> 
> 
> -- 
> "I'd jump in myself, if I weren't so good at whistling."
>Julian, King of Lemurs
> 
> 


Re: [ccp4bb] Against Method (R)

2010-10-27 Thread Ethan A Merritt

On Wed, 27 Oct 2010, Frank von Delft wrote:

> > > So, since the experimental error is only a minor contribution to the total
> > > error, it is arguably inappropriate to use it as a weight for each hkl.
> >
> > I think your logic has run off the track.  The experimental error is an
> > appropriate weight for the Fobs(hkl) because that is indeed the error
> > for that observation.  This is true independent of errors in the model.
> > If you improve the model, that does not magically change the accuracy
> > of the data.
>
> Sorry, still missing something:
>
> In the weighted Rfactor, we're weighting by the 1/sig**2 (right?)  And the
> reason for that is, presumably, that when we add a term (Fo-Fc) but the Fo is
> crap (huge sigma), we need to ensure we don't add very much of it -- so we
> divide the term by the huge sigma.

Correct.

> But what if Fc also is crap?  Which it patently is:  it's not even within 20%
> of Fo, never mind vaguely within sig(Fo).  Why should we not be
> down-weighting those terms as well?

Because here we want the exact opposite.  If Fc is hugely different from a
well-measured Fobs then it is a sensitive indicator of a problem with the model.
Why would we want to down-weight it?
Consider the extreme case:  if you down-weight all reflections for which Fc does
not already agree with Fo, then you will always conclude that the current model,
no matter what random drawer you pulled it from, is in fine shape.

> Or can we ignore that because, since all terms are crap, we'd simply be
> down-weighting the entire Rw by a lot, and we'd be doing it for the Rw of
> both models we're comparing, so they'd cancel out when we take the ratio
> Rw1/Rw2?

Not sure I follow this.

> But if we're so happy to fudge away the huge gorilla in the room, why would
> we need to be religious about the little gnats on the floor (the sig(Fo))?
> Is there then really a difference between R1/R2 and Rw1/Rw2, for all
> practical purposes?

Yes.  That was the message of the 1970 Ford & Rollet paper that Ian provided a
link for.

 Ethan

> (Of course, this is all for the ongoing case we don't know how to model the
> R-factor gap.  And no, I haven't played with actual numbers...)
>
> phx.



Re: [ccp4bb] Against Method (R)

2010-10-27 Thread Ed Pozharski
On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> the errors in our measurements apparently have no 
> bearing whatsoever on the errors in our models 

This would mean there is no point trying to get better crystals, right?
Or am I also wrong to assume that the dataset with higher I/sigma in the
highest resolution shell will give me a better model?

On a related point - why is Rmerge considered to be the limiting value
for the R?  Isn't Rmerge a poorly defined measure itself that
deteriorates at least in some circumstances (e.g. increased redundancy)?
Specifically, shouldn't "ideal" R approximate 0.5*<sigma(I)>/<I>?

Cheers,

Ed.



-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs
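
On the redundancy point, the conventional definitions make the difference
explicit (standard merging statistics, quoted here for reference):

    R_{merge} = \frac{\sum_{hkl}\sum_i \left|I_i(hkl) - \langle I(hkl)\rangle\right|}{\sum_{hkl}\sum_i I_i(hkl)},
    \qquad
    R_{pim} = \frac{\sum_{hkl}\sqrt{\tfrac{1}{N(hkl)-1}}\,\sum_i \left|I_i(hkl) - \langle I(hkl)\rangle\right|}{\sum_{hkl}\sum_i I_i(hkl)}

Rmerge carries no 1/sqrt(N-1) factor, so it does not improve as the
multiplicity N grows even though the merged intensities do; Rpim shrinks
with multiplicity and tracks the precision of the merged data, which is why
0.5*Rpim (or 0.5*Rsigma) is the more defensible floor for R1.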


Re: [ccp4bb] Against Method (R)

2010-10-27 Thread Frank von Delft

> > Yes, but what I think Frank is trying to point out is that the difference
> > between Fobs and Fcalc in any given PDB entry is generally about 4-5 times
> > larger than sigma(Fobs).  In such situations, pretty much any standard
> > statistical test will tell you that the model is "highly unlikely to be
> > correct".
>
> But that's not the question we are normally asking.
> It is highly unlikely that any model in biology is correct, if by "correct"
> you mean "cannot be improved". Normally we ask the more modest question
> "have I improved my model today over what it was yesterday?".
>
> > I am not saying that everything in the PDB is "wrong", just that the
> > dominant source of error is a shortcoming of the models we use.  Whatever
> > this "source of error" is, it vastly overpowers the measurement error.  That
> > is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~
> > 20%^2 .
> >
> > So, since the experimental error is only a minor contribution to the total
> > error, it is arguably inappropriate to use it as a weight for each hkl.
>
> I think your logic has run off the track.  The experimental error is an
> appropriate weight for the Fobs(hkl) because that is indeed the error
> for that observation.  This is true independent of errors in the model.
> If you improve the model, that does not magically change the accuracy
> of the data.

Sorry, still missing something:

In the weighted Rfactor, we're weighting by the 1/sig**2 (right?)  And 
the reason for that is, presumably, that when we add a term (Fo-Fc) but 
the Fo is crap (huge sigma), we need to ensure we don't add very much of 
it -- so we divide the term by the huge sigma.

But what if Fc also is crap?  Which it patently is:  it's not even 
within 20% of Fo, never mind vaguely within sig(Fo).  Why should we not 
be down-weighting those terms as well?

Or can we ignore that because, since all terms are crap, we'd simply be 
down-weighting the entire Rw by a lot, and we'd be doing it for the Rw 
of both models we're comparing, so they'd cancel out when we take the 
ratio Rw1/Rw2?

But if we're so happy to fudge away the huge gorilla in the room, why 
would we need to be religious about the little gnats on the floor (the 
sig(Fo))?  Is there then really a difference between R1/R2 and Rw1/Rw2, 
for all practical purposes?

(Of course, this is all for the ongoing case we don't know how to model 
the R-factor gap.  And no, I haven't played with actual numbers...)

phx.


Re: [ccp4bb] Against Method (R)

2010-10-26 Thread James Holton
Some time ago, I computed the mean value of Rcryst(F) / Rmerge(F) across the
whole PDB.  This average was 4.5, and I take this as a rough estimate of
|Fcalc - Fobs| / sigma(Fobs).  More recently, I have been looking in more
detail at deposited data, but so far the few cases where this ratio is close
to 1 are all cases where sigma(Fobs) is unusually high!

I think the "answer" is that we can believe structures in the PDB to "within
20% error".  This is "close enough" for a few things (such as government
work), but not for traditional statistics like "confidence tests".  For me,
it is just really bothersome that we can measure structure factors to better
than 5% accuracy, but still don't know how to model them.

Ethan does make a good point that sig(Fobs) is the error in the measurement,
and that the model-data error is not the weight one should use in
refinement, etc.  However, when you are comparing one PDB entry (yours) to
others (published), I still don't think that sigma(Fobs) plays any
significant role.

-James Holton
MAD Scientist

On Tue, Oct 26, 2010 at 4:45 PM, Jacob Keller <
j-kell...@fsm.northwestern.edu> wrote:

>  - Original Message -
> *From:* James Holton 
> *To:* CCP4BB@JISCMAIL.AC.UK
> *Sent:* Tuesday, October 26, 2010 6:31 PM
> *Subject:* Re: [ccp4bb] Against Method (R)
>
> Yes, but what I think Frank is trying to point out is that the difference
> between Fobs and Fcalc in any given PDB entry is generally about 4-5 times
> larger than sigma(Fobs).  In such situations, pretty much any standard
> statistical test will tell you that the model is "highly unlikely to be
> correct".
>
> Wow, so what is the answer to this? Is that figure "|Fcalc - Fobs| = 4-5x
> sigma" really true? How, then, do we believe structures? Are there really
> good structures where this discrepancy is not there, to "stake our
> claim," so to speak?
>


Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Ethan Merritt
On Tuesday, October 26, 2010 04:31:24 pm James Holton wrote:
> Yes, but what I think Frank is trying to point out is that the difference
> between Fobs and Fcalc in any given PDB entry is generally about 4-5 times
> larger than sigma(Fobs).  In such situations, pretty much any standard
> statistical test will tell you that the model is "highly unlikely to be
> correct".

But that's not the question we are normally asking.
It is highly unlikely that any model in biology is correct, if by "correct" 
you mean "cannot be improved". Normally we ask the more modest question
"have I improved my model today over what it was yesterday?".

> I am not saying that everything in the PDB is "wrong", just that the
> dominant source of error is a shortcoming of the models we use.  Whatever
> this "source of error" is, it vastly overpowers the measurement error.  That
> is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~
> 20%^2 .
> 
> So, since the experimental error is only a minor contribution to the total
> error, it is arguably inappropriate to use it as a weight for each hkl.

I think your logic has run off the track.  The experimental error is an
appropriate weight for the Fobs(hkl) because that is indeed the error
for that observation.  This is true independent of errors in the model.
If you improve the model, that does not magically change the accuracy
of the data.

Ethan


 
> Yes, refinement does seem to work better when you use experimental sigmas,
> and weighted statistics are probably "better" than no weights at all, but
> the problem is that until we do have a model that can explain Fobs to within
> experimental error, we will be severely limited in the kinds of conclusions
> we can derive from our data.
> 
> -James Holton
> MAD Scientist
> 
> On Tue, Oct 26, 2010 at 1:59 PM, Ethan Merritt 
> wrote:
> 
> > On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote:
> > >Um...
> > >
> > > * Given that the weighted Rfactor is weighted by the measurement errors
> > > (1/sig^2)
> > >
> > > * and given that the errors in our measurements apparently have no
> > > bearing whatsoever on the errors in our models (for macromolecular
> > > crystals, certainly - the "R-factor gap")
> >
> > You are overlooking causality :-)
> >
> > Yes, the errors in state-of-the-art models are only weakly limited by the
> > errors in our measurements.  But that is exactly _because_ we can now
> > weight
> > properly by the measurement errors (1/sig^2).  In my salad days,
> > weighting by 1/sig^2 was a mug's game.   Refinement only produced
> > a reasonable model if you applied empirical corrections rather than
> > statistical weights.  Things have improved a bit since then,
> > both on the equipment side (detectors, cryo, ...) and on the processing
> > side (Maximum Likelihood, error propagation, ...).
> > Now the sigmas actually mean something!
> >
> > > is the weighted Rfactor even vaguely relevant for anything at all?
> >
> > Yes, it is.  It is the thing you are minimizing during refinement,
> > at least to first approximation.  Also, as just mentioned, it is a
> > well-defined value that you can use for statistical significance
> > tests.
> >
> >Ethan
> >
> >
> > >
> > > phx.
> > >
> > >
> > >
> > > On 26/10/2010 20:44, Ian Tickle wrote:
> > > > Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 .
> > > >
> > > > The Rfree/Rwork ratio that I referred to does strictly use the
> > > > weighted ('Hamilton') R-factors, but because only the unweighted
> > > > values are given in the PDB we were forced to approximate (against our
> > > > better judgment!).
> > > >
> > > > The problem of course is that all refinement software AFAIK writes the
> > > > unweighted Rwork&  Rfree to the PDB header; there are no slots for the
> > > > weighted values, which does indeed make doing serious statistics on
> > > > the PDB entries difficult if not impossible!
> > > >
> > > > The unweighted crystallographic R-factor was only ever intended as a
> > > > "rule of thumb", i.e. to give a rough idea of the relative quality of
> > > > related structures; I hardly think the crystallographers of yesteryear
> > > > ever imagined that we would be taking it so seriously now!
> > > >
> > > > In particular IMO it should never be used for something as critical as
> > > > validation (either global or local), or for guiding refinement
> > > > strategy: use the likelihood instead.
> > > >
> > > > Cheers
> > > >
> > > > -- Ian
> > > >
> > > > PS I've always known it as an 'R-factor', e.g. see paper referenced
> > > > above, but then during my crystallographic training I used extensively
> > > > software developed by both authors of the paper (i.e. Geoff Ford&  the
> > > > late John Rollett) in Oxford (which eventually became the 'Crystals'
> > > > small-molecule package).  Maybe it's a transatlantic thing ...
> > > >
> > > > Cheers
> > > >
> > > > -- Ian
> > > >
> > > > On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt<

Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Jacob Keller
  - Original Message - 
  From: James Holton 
  To: CCP4BB@JISCMAIL.AC.UK 
  Sent: Tuesday, October 26, 2010 6:31 PM
  Subject: Re: [ccp4bb] Against Method (R)


  Yes, but what I think Frank is trying to point out is that the difference 
between Fobs and Fcalc in any given PDB entry is generally about 4-5 times 
larger than sigma(Fobs).  In such situations, pretty much any standard 
statistical test will tell you that the model is "highly unlikely to be 
correct".

Wow, so what is the answer to this? Is that figure "|Fcalc - Fobs| = 4-5x 
sigma" really true? How, then, do we believe structures? Are there really good 
structures where this discrepancy is not there, to "stake our claim," so to 
speak?

Re: [ccp4bb] Against Method (R)

2010-10-26 Thread James Holton
Yes, but what I think Frank is trying to point out is that the difference
between Fobs and Fcalc in any given PDB entry is generally about 4-5 times
larger than sigma(Fobs).  In such situations, pretty much any standard
statistical test will tell you that the model is "highly unlikely to be
correct".

I am not saying that everything in the PDB is "wrong", just that the
dominant source of error is a shortcoming of the models we use.  Whatever
this "source of error" is, it vastly overpowers the measurement error.  That
is, errors do not add linearly, but rather as squares, and 20%^2+5%^2 ~
20%^2 .

So, since the experimental error is only a minor contribution to the total
error, it is arguably inappropriate to use it as a weight for each hkl.

Yes, refinement does seem to work better when you use experimental sigmas,
and weighted statistics are probably "better" than no weights at all, but
the problem is that until we do have a model that can explain Fobs to within
experimental error, we will be severely limited in the kinds of conclusions
we can derive from our data.

-James Holton
MAD Scientist
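
To put numbers on the "errors add as squares" remark above, the arithmetic
is simply:

    \sqrt{0.20^2 + 0.05^2} = \sqrt{0.0400 + 0.0025} \approx 0.206

so next to a 20% model error, a 5% measurement error shifts the total error
by well under one percentage point.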

On Tue, Oct 26, 2010 at 1:59 PM, Ethan Merritt wrote:

> On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote:
> >Um...
> >
> > * Given that the weighted Rfactor is weighted by the measurement errors
> > (1/sig^2)
> >
> > * and given that the errors in our measurements apparently have no
> > bearing whatsoever on the errors in our models (for macromolecular
> > crystals, certainly - the "R-factor gap")
>
> You are overlooking causality :-)
>
> Yes, the errors in state-of-the-art models are only weakly limited by the
> errors in our measurements.  But that is exactly _because_ we can now
> weight
> properly by the measurement errors (1/sig^2).  In my salad days,
> weighting by 1/sig^2 was a mug's game.   Refinement only produced
> a reasonable model if you applied empirical corrections rather than
> statistical weights.  Things have improved a bit since then,
> both on the equipment side (detectors, cryo, ...) and on the processing
> side (Maximum Likelihood, error propagation, ...).
> Now the sigmas actually mean something!
>
> > is the weighted Rfactor even vaguely relevant for anything at all?
>
> Yes, it is.  It is the thing you are minimizing during refinement,
> at least to first approximation.  Also, as just mentioned, it is a
> well-defined value that you can use for statistical significance
> tests.
>
>Ethan
>
>
> >
> > phx.
> >
> >
> >
> > On 26/10/2010 20:44, Ian Tickle wrote:
> > > Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 .
> > >
> > > The Rfree/Rwork ratio that I referred to does strictly use the
> > > weighted ('Hamilton') R-factors, but because only the unweighted
> > > values are given in the PDB we were forced to approximate (against our
> > > better judgment!).
> > >
> > > The problem of course is that all refinement software AFAIK writes the
> > > unweighted Rwork&  Rfree to the PDB header; there are no slots for the
> > > weighted values, which does indeed make doing serious statistics on
> > > the PDB entries difficult if not impossible!
> > >
> > > The unweighted crystallographic R-factor was only ever intended as a
> > > "rule of thumb", i.e. to give a rough idea of the relative quality of
> > > related structures; I hardly think the crystallographers of yesteryear
> > > ever imagined that we would be taking it so seriously now!
> > >
> > > In particular IMO it should never be used for something as critical as
> > > validation (either global or local), or for guiding refinement
> > > strategy: use the likelihood instead.
> > >
> > > Cheers
> > >
> > > -- Ian
> > >
> > > PS I've always known it as an 'R-factor', e.g. see paper referenced
> > > above, but then during my crystallographic training I used extensively
> > > software developed by both authors of the paper (i.e. Geoff Ford&  the
> > > late John Rollett) in Oxford (which eventually became the 'Crystals'
> > > small-molecule package).  Maybe it's a transatlantic thing ...
> > >
> > > Cheers
> > >
> > > -- Ian
> > >
> > > On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt<
> merr...@u.washington.edu>  wrote:
> > >> On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) wrote:
> > >>> Hi Folks,
> > >>>
> > >>> Please allow me a few biased reflections/opinions on the numeRology of the
> > >>> R-value (not R-factor, because it is neither a factor itself nor does it
> > >>> factor in anything but ill-posed reviewer's critique. Historically the term
> > >>> originated from small molecule crystallography, but it is only a
> > >>> 'Residual-value')
> > >>>
> > >>> a) The R-value itself - based on the linear residuals and of apparent
> > >>> intuitive meaning - is statistically peculiar to say the least. I could not
> > >>> find it in any common statistics text. So doing proper statistics with R
> > >>> becomes difficult.
> > >> As WC Hamilton pointed out originally, two [properly weighted] R factors can
> > >> be compared by taking their ratio.

Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Ethan Merritt
On Tuesday, October 26, 2010 01:16:58 pm Frank von Delft wrote:
>Um...
> 
> * Given that the weighted Rfactor is weighted by the measurement errors 
> (1/sig^2)
> 
> * and given that the errors in our measurements apparently have no 
> bearing whatsoever on the errors in our models (for macromolecular 
> crystals, certainly - the "R-factor gap")

You are overlooking causality :-)

Yes, the errors in state-of-the-art models are only weakly limited by the
errors in our measurements.  But that is exactly _because_ we can now weight
properly by the measurement errors (1/sig^2).  In my salad days,
weighting by 1/sig^2 was a mug's game.   Refinement only produced
a reasonable model if you applied empirical corrections rather than
statistical weights.  Things have improved a bit since then,
both on the equipment side (detectors, cryo, ...) and on the processing
side (Maximum Likelihood, error propagation, ...).  
Now the sigmas actually mean something!

> is the weighted Rfactor even vaguely relevant for anything at all?

Yes, it is.  It is the thing you are minimizing during refinement,
at least to first approximation.  Also, as just mentioned, it is a
well-defined value that you can use for statistical significance
tests. 

Ethan


> 
> phx.
> 
> 
> 
> On 26/10/2010 20:44, Ian Tickle wrote:
> > Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 .
> >
> > The Rfree/Rwork ratio that I referred to does strictly use the
> > weighted ('Hamilton') R-factors, but because only the unweighted
> > values are given in the PDB we were forced to approximate (against our
> > better judgment!).
> >
> > The problem of course is that all refinement software AFAIK writes the
> > unweighted Rwork&  Rfree to the PDB header; there are no slots for the
> > weighted values, which does indeed make doing serious statistics on
> > the PDB entries difficult if not impossible!
> >
> > The unweighted crystallographic R-factor was only ever intended as a
> > "rule of thumb", i.e. to give a rough idea of the relative quality of
> > related structures; I hardly think the crystallographers of yesteryear
> > ever imagined that we would be taking it so seriously now!
> >
> > In particular IMO it should never be used for something as critical as
> > validation (either global or local), or for guiding refinement
> > strategy: use the likelihood instead.
> >
> > Cheers
> >
> > -- Ian
> >
> > PS I've always known it as an 'R-factor', e.g. see paper referenced
> > above, but then during my crystallographic training I used extensively
> > software developed by both authors of the paper (i.e. Geoff Ford&  the
> > late John Rollett) in Oxford (which eventually became the 'Crystals'
> > small-molecule package).  Maybe it's a transatlantic thing ...
> >
> > Cheers
> >
> > -- Ian
> >
> > On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt  
> > wrote:
> >> On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat 
> >> a.D.) wrote:
> >>> Hi Folks,
> >>>
> >>> Please allow me a few biased reflections/opinions on the numeRology of the
> >>> R-value (not R-factor, because it is neither a factor itself nor does it
> >>> factor in anything but ill-posed reviewer's critique. Historically the 
> >>> term
> >>> originated from small molecule crystallography, but it is only a
> >>> 'Residual-value')
> >>>
> >>> a) The R-value itself - based on the linear residuals and of apparent
> >>> intuitive meaning - is statistically peculiar to say the least. I could 
> >>> not
> >>> find it in any common statistics text. So doing proper statistics with R
> >>> becomes difficult.
> >> As WC Hamilton pointed out originally, two [properly weighted] R factors 
> >> can
> >> be compared by taking their ratio.  Significance levels can then be 
> >> evaluated
> >> using the standard F distribution.  A concise summary is given in chapter 9
> >> of Prince's book, which I highly recommend to all crystallographers.
> >>
> >> W C Hamilton "Significance tests on the crystallographic R factor"
> >> Acta Cryst. (1965). 18, 502-510
> >>
> >> Edward Prince "Mathematical Techniques in Crystallography and Materials
> >> Science". Springer-Verlag, 1982.
> >>
> >> It is true that we normally indulge in the sloppy habit of paying attention
> >> only to the unweighted R factor even though refinement programs report
> >> both the weighted and unweighted versions.  (shelx users excepted :-)
> >> But the weighted form is there also if you want to do statistical tests.
> >>
> >> You are of course correct that this remains a global test, and as such
> >> is of limited use in evaluating local properties of the model.
> >>
> >> cheers,
> >>
> >> Ethan
> >>
> >>
> >>
> >>
> >>> b) rules of thumb (as much as they conveniently obviate the need for
> >>> detailed explanations, satisfy student's desire for quick answers,  and
> >>> allow superficial review of manuscripts) become less valuable if they 
> >>> have a
> >>> case-dependent large variance, topped with an unknown parent distribution.

Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Frank von Delft

  Um...

* Given that the weighted Rfactor is weighted by the measurement errors 
(1/sig^2)


* and given that the errors in our measurements apparently have no 
bearing whatsoever on the errors in our models (for macromolecular 
crystals, certainly - the "R-factor gap")


is the weighted Rfactor even vaguely relevant for anything at all?

phx.
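
For reference, the weighted residual under discussion has the generic form
(one common convention, written on F; SHELX's wR2 is the analogous
expression on intensities):

    wR = \left[\frac{\sum_{hkl} w\,\left(|F_o| - |F_c|\right)^2}{\sum_{hkl} w\,F_o^2}\right]^{1/2},
    \qquad w = \frac{1}{\sigma^2(F_o)}

so each term enters in units of its own measurement uncertainty.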



On 26/10/2010 20:44, Ian Tickle wrote:

Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 .

The Rfree/Rwork ratio that I referred to does strictly use the
weighted ('Hamilton') R-factors, but because only the unweighted
values are given in the PDB we were forced to approximate (against our
better judgment!).

The problem of course is that all refinement software AFAIK writes the
unweighted Rwork&  Rfree to the PDB header; there are no slots for the
weighted values, which does indeed make doing serious statistics on
the PDB entries difficult if not impossible!

The unweighted crystallographic R-factor was only ever intended as a
"rule of thumb", i.e. to give a rough idea of the relative quality of
related structures; I hardly think the crystallographers of yesteryear
ever imagined that we would be taking it so seriously now!

In particular IMO it should never be used for something as critical as
validation (either global or local), or for guiding refinement
strategy: use the likelihood instead.

Cheers

-- Ian

PS I've always known it as an 'R-factor', e.g. see paper referenced
above, but then during my crystallographic training I used extensively
software developed by both authors of the paper (i.e. Geoff Ford&  the
late John Rollett) in Oxford (which eventually became the 'Crystals'
small-molecule package).  Maybe it's a transatlantic thing ...

Cheers

-- Ian

On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt  wrote:

On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) 
wrote:

Hi Folks,

Please allow me a few biased reflections/opinions on the numeRology of the
R-value (not R-factor, because it is neither a factor itself nor does it
factor in anything but ill-posed reviewer's critique. Historically the term
originated from small molecule crystallography, but it is only a
'Residual-value')

a) The R-value itself - based on the linear residuals and of apparent
intuitive meaning - is statistically peculiar to say the least. I could not
find it in any common statistics text. So doing proper statistics with R
becomes difficult.

As WC Hamilton pointed out originally, two [properly weighted] R factors can
be compared by taking their ratio.  Significance levels can then be evaluated
using the standard F distribution.  A concise summary is given in chapter 9
of Prince's book, which I highly recommend to all crystallographers.

W C Hamilton "Significance tests on the crystallographic R factor"
Acta Cryst. (1965). 18, 502-510

Edward Prince "Mathematical Techniques in Crystallography and Materials
Science". Springer-Verlag, 1982.

It is true that we normally indulge in the sloppy habit of paying attention
only to the unweighted R factor even though refinement programs report
both the weighted and unweighted versions.  (shelx users excepted :-)
But the weighted form is there also if you want to do statistical tests.

You are of course correct that this remains a global test, and as such
is of limited use in evaluating local properties of the model.

cheers,

Ethan





b) rules of thumb (as much as they conveniently obviate the need for
detailed explanations, satisfy student's desire for quick answers,  and
allow superficial review of manuscripts) become less valuable if they have a
case-dependent large variance, topped with an unknown parent distribution.
Combined with an odd statistic, that has great potential for misguidance and
unnecessarily lost sleep.

c) Ian has (once again) explained that for example the Rf-R depends on the
exact knowledge of the restraints and their individual weighting, which we
generally do not have. Caution is advised.

d) The answer which model is better - which is actually what you want to
know - becomes a question of model selection or hypothesis testing, which,
given the obscurity of R cannot be derived with some nice plug-in method. As
Ian said the models to be compared must also be based on the same and
identical data.

e) One measure available that is statistically at least defensible is the
log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
factor (there is the darn factor again, it’s a ratio)) and see where this
falls - and the answers are pretty soft and, probably because of that,
correspondingly realistic. This also makes - based on statistics alone -
deciding between different overall parameterizations difficult.

http://en.wikipedia.org/wiki/Bayes_factor

f) so having said that, what really remains is that the model that fits the
primary evidence (minimally biased electron density) best and is at the same
time physically meaningful, is the best model.

Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Ian Tickle
Indeed, see: http://scripts.iucr.org/cgi-bin/paper?a07175 .

The Rfree/Rwork ratio that I referred to does strictly use the
weighted ('Hamilton') R-factors, but because only the unweighted
values are given in the PDB we were forced to approximate (against our
better judgment!).

The problem of course is that all refinement software AFAIK writes the
unweighted Rwork & Rfree to the PDB header; there are no slots for the
weighted values, which does indeed make doing serious statistics on
the PDB entries difficult if not impossible!

The unweighted crystallographic R-factor was only ever intended as a
"rule of thumb", i.e. to give a rough idea of the relative quality of
related structures; I hardly think the crystallographers of yesteryear
ever imagined that we would be taking it so seriously now!

In particular IMO it should never be used for something as critical as
validation (either global or local), or for guiding refinement
strategy: use the likelihood instead.

Cheers

-- Ian

PS I've always known it as an 'R-factor', e.g. see paper referenced
above, but then during my crystallographic training I used extensively
software developed by both authors of the paper (i.e. Geoff Ford & the
late John Rollett) in Oxford (which eventually became the 'Crystals'
small-molecule package).  Maybe it's a transatlantic thing ...

Cheers

-- Ian

On Tue, Oct 26, 2010 at 7:28 PM, Ethan Merritt  wrote:
> On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) 
> wrote:
>> Hi Folks,
>>
>> Please allow me a few biased reflections/opinions on the numeRology of the
>> R-value (not R-factor, because it is neither a factor itself nor does it
>> factor in anything but ill-posed reviewer's critique. Historically the term
>> originated from small molecule crystallography, but it is only a
>> 'Residual-value')
>>
>> a) The R-value itself - based on the linear residuals and of apparent
>> intuitive meaning - is statistically peculiar to say the least. I could not
>> find it in any common statistics text. So doing proper statistics with R
>> becomes difficult.
>
> As WC Hamilton pointed out originally, two [properly weighted] R factors can
> be compared by taking their ratio.  Significance levels can then be evaluated
> using the standard F distribution.  A concise summary is given in chapter 9
> of Prince's book, which I highly recommend to all crystallographers.
>
> W C Hamilton "Significance tests on the crystallographic R factor"
> Acta Cryst. (1965). 18, 502-510
>
> Edward Prince "Mathematical Techniques in Crystallography and Materials
> Science". Springer-Verlag, 1982.
>
> It is true that we normally indulge in the sloppy habit of paying attention
> only to the unweighted R factor even though refinement programs report
> both the weighted and unweighted versions.  (shelx users excepted :-)
> But the weighted form is there also if you want to do statistical tests.
>
> You are of course correct that this remains a global test, and as such
> is of limited use in evaluating local properties of the model.
>
>        cheers,
>
>                Ethan
>
>
>
>
>> b) rules of thumb (as much as they conveniently obviate the need for
>> detailed explanations, satisfy student's desire for quick answers,  and
>> allow superficial review of manuscripts) become less valuable if they have a
>> case-dependent large variance, topped with an unknown parent distribution.
>> Combined with an odd statistic, that has great potential for misguidance and
>> unnecessarily lost sleep.
>>
>> c) Ian has (once again) explained that for example the Rf-R depends on the
>> exact knowledge of the restraints and their individual weighting, which we
>> generally do not have. Caution is advised.
>>
>> d) The answer which model is better - which is actually what you want to
>> know - becomes a question of model selection or hypothesis testing, which,
>> given the obscurity of R cannot be derived with some nice plug-in method. As
>> Ian said the models to be compared must also be based on the same and
>> identical data.
>>
>> e) One measure available that is statistically at least defensible is the
>> log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
>> factor (there is the darn factor again, it’s a ratio)) and see where this
>> falls - and the answers are pretty soft and, probably because of that,
>> correspondingly realistic. This also makes - based on statistics alone -
>> deciding between different overall parameterizations difficult.
>>
>> http://en.wikipedia.org/wiki/Bayes_factor
>>
>> f) so having said that, what really remains is that the model that fits the
>> primary evidence (minimally biased electron density) best and is at the same
>> time physically meaningful, is the best model, i. e., all plausibly
>> accountable electron density (and not more) is modeled. You can convince
>> yourself of this by taking the most interesting part of the model out (say a
>> ligand or a binding pocket) and look at the R-values or do a model selection
>> test - the result will be indecisive.

Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Bernhard Rupp (Hofkristallrat a.D.)
W C Hamilton "Significance tests on the crystallographic R factor"
Acta Cryst. (1965). 18, 502-510

Interestingly enough, I have used the Hamilton tests in Rietveld powder
refinements of small molecules/intermetallics before. One problem
was partial occupancies vs split conformations in HT superconductors.
Alas, you cannot cheat there either - most of the time the results showed
that numerically the differences were not significant, and once again one
had to resort to non-statistical plausibility arguments of references.

Has anyone done Hamiltons on different protein models/parameterizations and
can report?  I think for global parameterization changes like NCS,TLS, etc
that may in fact be interesting.

BR  


Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Gerard Bricogne
Dear all,

 Augustine, "Confessions", Book 11 Chap. XIV, has it:
 
 "If no one ask of me, I know; if I wish to explain to him who asks, I
know not."


 With best wishes,
 
  Gerard.

--
On Tue, Oct 26, 2010 at 01:30:11PM -0500, Phoebe Rice wrote:
> Another issue with these statistics is that the PDB insists on a single value 
> of "resolution" no matter how anisotropic the data.  Especially in the 
> outermost bins, Rmerge could be ridiculously high simply because the data 
> only exist in one out of 3 directions.
>Phoebe
> 
> =
> Phoebe A. Rice
> Dept. of Biochemistry & Molecular Biology
> The University of Chicago
> phone 773 834 1723
> http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
> http://www.rsc.org/shop/books/2008/9780854042722.asp
> 
> 
>  Original message 
> >Date: Tue, 26 Oct 2010 09:46:46 -0700
> >From: CCP4 bulletin board  (on behalf of "Bernhard 
> >Rupp (Hofkristallrat a.D.)" )
> >Subject: [ccp4bb] Against Method (R)  
> >To: CCP4BB@JISCMAIL.AC.UK
> >
> >Hi Folks,
> >
> >Please allow me a few biased reflections/opinions on the numeRology of the
> >R-value (not R-factor, because it is neither a factor itself nor does it
> >factor in anything but ill-posed reviewer's critique. Historically the term
> >originated from small molecule crystallography, but it is only a
> >'Residual-value')
> >
> >a) The R-value itself - based on the linear residuals and of apparent
> >intuitive meaning - is statistically peculiar to say the least. I could not
> >find it in any common statistics text. So doing proper statistics with R
> >becomes difficult.
> >
> >b) rules of thumb (as much as they conveniently obviate the need for
> >detailed explanations, satisfy student's desire for quick answers,  and
> >allow superficial review of manuscripts) become less valuable if they have a
> >case-dependent large variance, topped with an unknown parent distribution.
> >Combined with an odd statistic, that has great potential for misguidance and
> >unnecessarily lost sleep. 
> >
> >c) Ian has (once again) explained that for example the Rf-R depends on the
> >exact knowledge of the restraints and their individual weighting, which we
> >generally do not have. Caution is advised.
> >
> >d) The answer which model is better - which is actually what you want to
> >know - becomes a question of model selection or hypothesis testing, which,
> >given the obscurity of R cannot be derived with some nice plug-in method. As
> >Ian said the models to be compared must also be based on the same and
> >identical data.  
> >
> >e) One measure available that is statistically at least defensible is the
> >log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
> >factor (there is the darn factor again, it’s a ratio)) and see where this
> >falls - and the answers are pretty soft and, probably because of that,
> >correspondingly realistic. This also makes - based on statistics alone -
> >deciding between different overall parameterizations difficult. 
> >
> >http://en.wikipedia.org/wiki/Bayes_factor
> >
> >f) so having said that, what really remains is that the model that fits the
> >primary evidence (minimally biased electron density) best and is at the same
> >time physically meaningful, is the best model, i. e., all plausibly
> >accountable electron density (and not more) is modeled. You can convince
> >yourself of this by taking the most interesting part of the model out (say a
> >ligand or a binding pocket) and look at the R-values or do a model selection
> >test - the result will be indecisive.  Poof goes the global rule of thumb.
> >
> >g) in other words: global measures in general are entirely inadequate to
> >judge local model quality (noted many times over already by Jones, Kleywegt,
> >others, in the dark ages of crystallography when poorly restrained
> >crystallographers used to passionately whack each other over the head with
> >unfree R-values).   
> >
> >Best, BR
> >-
> >Bernhard Rupp, Hofkristallrat a.D.
> >001 (925) 209-7429
> >+43 (676) 571-0536
> >b...@ruppweb.org
> >hofkristall...@gmail.com
> >http://www.ruppweb.org/
> >--
> >And once more a chillout mix from the Hofkristall lounge
> >--

-- 

 ===
 * *
 * Gerard Bricogne g...@globalphasing.com  *
 * *
 * Global Phasing Ltd. *
 * Sheraton House, Castle Park Tel: +44-(0)1223-353033 *
 * Cambridge CB3 0AX, UK   Fax: +44-(0)1223-366889 *
 * *
 ===


Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Ethan Merritt
On Tuesday, October 26, 2010 09:46:46 am Bernhard Rupp (Hofkristallrat a.D.) 
wrote:
> Hi Folks,
> 
> Please allow me a few biased reflections/opinions on the numeRology of the
> R-value (not R-factor, because it is neither a factor itself nor does it
> factor in anything but ill-posed reviewer's critique. Historically the term
> originated from small molecule crystallography, but it is only a
> 'Residual-value')
> 
> a) The R-value itself - based on the linear residuals and of apparent
> intuitive meaning - is statistically peculiar to say the least. I could not
> find it in any common statistics text. So doing proper statistics with R
> becomes difficult.

As WC Hamilton pointed out originally, two [properly weighted] R factors can
be compared by taking their ratio.  Significance levels can then be evaluated
using the standard F distribution.  A concise summary is given in chapter 9
of Prince's book, which I highly recommend to all crystallographers.

W C Hamilton "Significance tests on the crystallographic R factor"
Acta Cryst. (1965). 18, 502-510

Edward Prince "Mathematical Techniques in Crystallography and Materials
Science". Springer-Verlag, 1982.

It is true that we normally indulge in the sloppy habit of paying attention
only to the unweighted R factor even though refinement programs report
both the weighted and unweighted versions.  (shelx users excepted :-)
But the weighted form is there also if you want to do statistical tests.

You are of course correct that this remains a global test, and as such
is of limited use in evaluating local properties of the model.

cheers,

Ethan




> b) rules of thumb (as much as they conveniently obviate the need for
> detailed explanations, satisfy student's desire for quick answers,  and
> allow superficial review of manuscripts) become less valuable if they have a
> case-dependent large variance, topped with an unknown parent distribution.
> Combined with an odd statistic, that has great potential for misguidance and
> unnecessarily lost sleep. 
> 
> c) Ian has (once again) explained that for example the Rf-R depends on the
> exact knowledge of the restraints and their individual weighting, which we
> generally do not have. Caution is advised.
> 
> d) The answer which model is better - which is actually what you want to
> know - becomes a question of model selection or hypothesis testing, which,
> given the obscurity of R cannot be derived with some nice plug-in method. As
> Ian said the models to be compared must also be based on the same and
> identical data.  
> 
> e) One measure available that is statistically at least defensible is the
> log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
> factor (there is the darn factor again, it’s a ratio)) and see where this
> falls - and the answers are pretty soft and, probably because of that,
> correspondingly realistic. This also makes - based on statistics alone -
> deciding between different overall parameterizations difficult. 
> 
> http://en.wikipedia.org/wiki/Bayes_factor
> 
> f) so having said that, what really remains is that the model that fits the
> primary evidence (minimally biased electron density) best and is at the same
> time physically meaningful, is the best model, i. e., all plausibly
> accountable electron density (and not more) is modeled. You can convince
> yourself of this by taking the most interesting part of the model out (say a
> ligand or a binding pocket) and look at the R-values or do a model selection
> test - the result will be indecisive.  Poof goes the global rule of thumb.
> 
> g) in other words: global measures in general are entirely inadequate to
> judge local model quality (noted many times over already by Jones, Kleywegt,
> others, in the dark ages of crystallography when poorly restrained
> crystallographers used to passionately whack each other over the head with
> unfree R-values).   
> 
> Best, BR
> -
> Bernhard Rupp, Hofkristallrat a.D.
> 001 (925) 209-7429
> +43 (676) 571-0536
> b...@ruppweb.org
> hofkristall...@gmail.com
> http://www.ruppweb.org/
> --
> And once more a chillout mix from the Hofkristall lounge
> --
> 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
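
For anyone who wants to try the Hamilton ratio test on their own
parameterizations (cf. Bernhard's TLS/NCS question above), here is a minimal
Python sketch. It is an illustration, not a published tool: the function
name and the example numbers are hypothetical, and it assumes two nested
parameterizations refined against identical data, with weighted R-values in
hand and the critical ratio taken as R = [1 + b*F/(n-m)]^(1/2) per Hamilton
(1965).

from scipy.stats import f

def hamilton_test(wr_restricted, wr_general, n_obs, m_params, b_extra, alpha=0.05):
    """Hamilton (1965) R-ratio test: are the b_extra additional parameters
    of the general model statistically justified?"""
    dof = n_obs - m_params                     # residual degrees of freedom, general model
    ratio = wr_restricted / wr_general         # observed ratio, >= 1 by construction
    f_crit = f.ppf(1.0 - alpha, b_extra, dof)  # F(b, n-m) critical value at level alpha
    r_crit = (1.0 + b_extra * f_crit / dof) ** 0.5
    return ratio, r_crit, ratio > r_crit       # True: extra parameters are significant

# hypothetical example: does a 20-parameter TLS description pay for itself?
print(hamilton_test(0.260, 0.250, n_obs=20000, m_params=4000, b_extra=20))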


Re: [ccp4bb] Against Method (R)

2010-10-26 Thread Phoebe Rice
Another issue with these statistics is that the PDB insists on a single value 
of "resolution" no matter how anisotropic the data.  Especially in the 
outermost bins, Rmerge could be ridiculously high simply because the data only 
exist in one out of 3 directions.
   Phoebe

=
Phoebe A. Rice
Dept. of Biochemistry & Molecular Biology
The University of Chicago
phone 773 834 1723
http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123
http://www.rsc.org/shop/books/2008/9780854042722.asp


 Original message 
>Date: Tue, 26 Oct 2010 09:46:46 -0700
>From: CCP4 bulletin board  (on behalf of "Bernhard Rupp 
>(Hofkristallrat a.D.)" )
>Subject: [ccp4bb] Against Method (R)  
>To: CCP4BB@JISCMAIL.AC.UK
>
>Hi Folks,
>
>Please allow me a few biased reflections/opinions on the numeRology of the
>R-value (not R-factor, because it is neither a factor itself nor does it
>factor in anything but ill-posed reviewer's critique. Historically the term
>originated from small molecule crystallography, but it is only a
>'Residual-value')
>
>a) The R-value itself - based on the linear residuals and of apparent
>intuitive meaning - is statistically peculiar to say the least. I could not
>find it in any common statistics text. So doing proper statistics with R
>becomes difficult.
>
>b) rules of thumb (as much as they conveniently obviate the need for
>detailed explanations, satisfy student's desire for quick answers,  and
>allow superficial review of manuscripts) become less valuable if they have a
>case-dependent large variance, topped with an unknown parent distribution.
>Combined with an odd statistic, that has great potential for misguidance and
>unnecessarily lost sleep. 
>
>c) Ian has (once again) explained that for example the Rf-R depends on the
>exact knowledge of the restraints and their individual weighting, which we
>generally do not have. Caution is advised.
>
>d) The answer which model is better - which is actually what you want to
>know - becomes a question of model selection or hypothesis testing, which,
>given the obscurity of R cannot be derived with some nice plug-in method. As
>Ian said the models to be compared must also be based on the same and
>identical data.  
>
>e) One measure available that is statistically at least defensible is the
>log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
>factor (there is the darn factor again, it’s a ratio)) and see where this
>falls - and the answers are pretty soft and, probably because of that,
>correspondingly realistic. This also makes - based on statistics alone -
>deciding between different overall parameterizations difficult. 
>
>http://en.wikipedia.org/wiki/Bayes_factor
>
>f) so having said that, what really remains is that the model that fits the
>primary evidence (minimally biased electron density) best and is at the same
>time physically meaningful, is the best model, i. e., all plausibly
>accountable electron density (and not more) is modeled. You can convince
>yourself of this by taking the most interesting part of the model out (say a
>ligand or a binding pocket) and look at the R-values or do a model selection
>test - the result will be indecisive.  Poof goes the global rule of thumb.
>
>g) in other words: global measures in general are entirely inadequate to
>judge local model quality (noted many times over already by Jones, Kleywegt,
>others, in the dark ages of crystallography when poorly restrained
>crystallographers used to passionately whack each other over the head with
>unfree R-values).   
>
>Best, BR
>-
>Bernhard Rupp, Hofkristallrat a.D.
>001 (925) 209-7429
>+43 (676) 571-0536
>b...@ruppweb.org
>hofkristall...@gmail.com
>http://www.ruppweb.org/
>--
>And once more a chillout mix from the Hofkristall lounge
>--


[ccp4bb] Against Method (R)

2010-10-26 Thread Bernhard Rupp (Hofkristallrat a.D.)
Hi Folks,

Please allow me a few biased reflections/opinions on the numeRology of the
R-value (not R-factor, because it is neither a factor itself nor does it
factor in anything but ill-posed reviewer's critique. Historically the term
originated from small molecule crystallography, but it is only a
'Residual-value')

a) The R-value itself - based on the linear residuals and of apparent
intuitive meaning - is statistically peculiar to say the least. I could not
find it in any common statistics text. So doing proper statistics with R
becomes difficult.

b) rules of thumb (as much as they conveniently obviate the need for
detailed explanations, satisfy student's desire for quick answers,  and
allow superficial review of manuscripts) become less valuable if they have a
case-dependent large variance, topped with an unknown parent distribution.
Combined with an odd statistic, that has great potential for misguidance and
unnecessarily lost sleep. 

c) Ian has (once again) explained that for example the Rf-R depends on the
exact knowledge of the restraints and their individual weighting, which we
generally do not have. Caution is advised.

d) The answer which model is better - which is actually what you want to
know - becomes a question of model selection or hypothesis testing, which,
given the obscurity of R cannot be derived with some nice plug-in method. As
Ian said the models to be compared must also be based on the same and
identical data.  

e) One measure available that is statistically at least defensible is the
log-likelihood. So what you can do is form a log-likelihood ratio (or Bayes
factor (there is the darn factor again, it’s a ratio)) and see where this
falls - and the answers are pretty soft and, probably because of that,
correspondingly realistic. This also makes - based on statistics alone -
deciding between different overall parameterizations difficult. 

http://en.wikipedia.org/wiki/Bayes_factor

f) so having said that, what really remains is that the model that fits the
primary evidence (minimally biased electron density) best and is at the same
time physically meaningful, is the best model, i. e., all plausibly
accountable electron density (and not more) is modeled. You can convince
yourself of this by taking the most interesting part of the model out (say a
ligand or a binding pocket) and look at the R-values or do a model selection
test - the result will be indecisive.  Poof goes the global rule of thumb.

g) in other words: global measures in general are entirely inadequate to
judge local model quality (noted many times over already by Jones, Kleywegt,
others, in the dark ages of crystallography when poorly restrained
crystallographers used to passionately whack each other over the head with
unfree R-values).   

Best, BR
-
Bernhard Rupp, Hofkristallrat a.D.
001 (925) 209-7429
+43 (676) 571-0536
b...@ruppweb.org
hofkristall...@gmail.com
http://www.ruppweb.org/
--
And once more a chillout mix from the Hofkristall lounge
--
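
A frequentist cousin of the Bayes factor in point (e) is the
log-likelihood-ratio test for nested parameterizations. A minimal Python
sketch (hypothetical numbers; it assumes the refinement program reports
comparable log-likelihoods for both models against identical data):

from scipy.stats import chi2

def llr_test(ll_simple, ll_complex, extra_params, alpha=0.05):
    """Likelihood-ratio test: 2*(LL_complex - LL_simple) is asymptotically
    chi^2-distributed with df = number of extra parameters."""
    stat = 2.0 * (ll_complex - ll_simple)
    p_value = chi2.sf(stat, extra_params)  # survival function = 1 - CDF
    return stat, p_value, p_value < alpha  # True: the richer model is justified

print(llr_test(ll_simple=-12500.0, ll_complex=-12480.0, extra_params=20))

As Ian and Bernhard both stress, this only makes sense when the two models
are refined against the same and identical data.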