Let's say you have two isomorphous crystals of two different protein-ligand complexes: same protein, different ligand, same xtal form. Conventionally you'd keep the same free-set reflections (hkl values) between the two datasets to reduce bias. However, if the first model had been refined against all reflections, there is no longer a free set for that model: all hkl's have seen the atoms during refinement, and so your R-free in the second complex is initially biased toward the model from the first complex. [*]
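
For concreteness, here is a rough sketch of what "keeping the same free set" amounts to. It is plain Python rather than any particular package's API; the function name, the dict layout, and the 5% default are illustrative assumptions only, and it presumes both datasets are already indexed consistently and reduced to the same asymmetric unit.

```python
import random

def transfer_free_flags(reference_flags, new_hkls, free_fraction=0.05, seed=0):
    """Copy free-set flags from a reference dataset onto an isomorphous one.

    reference_flags: dict mapping (h, k, l) -> True if the reflection is free
    new_hkls: iterable of (h, k, l) tuples measured in the new dataset
    free_fraction: assumed fraction of free reflections for hkl's that the
                   reference dataset did not contain (hypothetical default)
    """
    rng = random.Random(seed)
    flags = {}
    for hkl in new_hkls:
        if hkl in reference_flags:
            # Same hkl stays in the same set, so a reflection that was free
            # for the first model stays free for the second.
            flags[hkl] = reference_flags[hkl]
        else:
            # Reflections only present in the new data get a fresh random flag.
            flags[hkl] = rng.random() < free_fraction
    return flags
```

If the first model was refined against every hkl, the flags can still be copied this way, but the "free" reflections no longer have the property that makes them useful, which is the point above.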

The tendency is to do less refinement in these sorts of isomorphous cases than in molecular replacement solutions, because the structural changes are usually far smaller (it is isomorphous, after all), so there's a risk that the R-free will not be allowed to fully float free of that initial bias. That makes your R-free look better than it actually is.
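
As a reminder of what is being compared, the sketch below computes the conventional R-factor, sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), separately over the working and free reflections. The function names and dict layout are made up for illustration, and Fcalc is assumed to be already on the same scale as Fobs (real programs also fit a scale factor).

```python
def r_factor(f_obs, f_calc, hkls):
    """Conventional R over the given reflections; amplitudes assumed pre-scaled."""
    num = sum(abs(abs(f_obs[hkl]) - abs(f_calc[hkl])) for hkl in hkls)
    den = sum(abs(f_obs[hkl]) for hkl in hkls)
    return num / den

def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (R-work, R-free) given a dict of free-set flags keyed by hkl."""
    free = [hkl for hkl in f_obs if free_flags.get(hkl, False)]
    work = [hkl for hkl in f_obs if not free_flags.get(hkl, False)]
    return r_factor(f_obs, f_calc, work), r_factor(f_obs, f_calc, free)
```

The arithmetic is identical for both sets; R-free only means something different because the free hkl's were kept out of refinement. If the starting model has already been fitted to them, the second number starts out as optimistic as the first.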

This is rather strongly analogous to using different free sets in the two datasets.

However, I'm not sure that this is as big a deal as it is being made to sound; it can be dealt with straightforwardly. That said, refining against all the data weakens the use of R-free as a validation tool for that particular model, so the people who like to judge structures based on a single number (i.e. R-free) are going to be quite put out.

It's also the case that the best model probably *is* the one based on a careful last round of refinement against all data, as long as nothing much changes. That would need to be quantified in some way(s).

Phil Jeffrey
Princeton

[* Your R-free is also initially model-biased, to varying extents, in cases where the data are significantly non-isomorphous or you're using two different xtal forms]



I still don't understand how a structure model refined with all data
would negatively affect the determination and/or refinement of an
isomorphous structure using a different data set (even without doing SA
first).

Quyen

On Oct 14, 2011, at 4:35 PM, Nat Echols wrote:

On Fri, Oct 14, 2011 at 1:20 PM, Quyen Hoang <qqho...@gmail.com> wrote:

    Sorry, I don't quite understand your reasoning for how the
    structure is rendered useless if one refined it with all data.


"Useless" was too strong a word (it's Friday, sorry). I guess
simulated annealing can address the model-bias issue, but I'm not
totally convinced that this solves the problem. And not every
crystallographer will run SA every time he/she solves an isomorphous
structure, so there's a real danger of misleading future users of the
PDB file. The reported R-free, of course, is still meaningless in the
context of the deposited model.

    Would your argument also apply to all the structures that were
    refined before R-free existed?


Technically, yes - but how many proteins are there whose only
representatives in the PDB were refined this way? I suspect very few;
in most cases, a more recent model should be available.

-Nat
