Re: [ccp4bb] The importance of USING our validation tools
ZO has a good point - it is a pain trying to get decent simulated material - maybe there is an employment opportunity here? Eleanor

Zbyszek Otwinowski wrote:
James Holton wrote: How MUCH do you want to bet? ;)
Any amount, as long as we are talking about real diffraction images corresponding to the deposited file with observed structure factors. I doubt that simulated diffraction images will be shown, because they are easily recognized as such. Independently, I value the possibility of data simulation in methods development and for teaching purposes. Zbyszek Otwinowski UT Southwestern Medical Center 5323 Harry Hines Blvd., Dallas, TX 75390-8816 (214) 645 6385 (phone) (214) 645 6353 (fax) [EMAIL PROTECTED]
Re: [ccp4bb] The importance of USING our validation tools
Think this bounced last time I tried to mail it in; a simulator exists at: http://fable.sourceforge.net/index.php/Farfield_Simulation Jon
Re: [ccp4bb] The importance of USING our validation tools
I'm going to agree with Raji's observations, and fan the flames of the point a little. I count myself as lucky that I have had access, during my crystallographic training, to certain people who had a good understanding of the theory behind crystallography (hopefully I have exploited this luck sufficiently). Despite their tutelage, I will hold my hands up and admit that certain technical discussions on the bb occasionally leave me a little confused... However, I have seen what Raji described going on around me, and it is pretty prevalent. Structures are sometimes pushed through without the PhD student really knowing quite what's happened. I cut my teeth on a few structures that didn't have the sort of pressure on them that others had, and this allowed me to get to grips with what was going on. I also had a few real pigs of projects - you tend to learn a lot more when stuff goes wrong. If you bang your data through program X and get textbook maps and stats, you haven't really learnt anything; if you've struggled with molecular replacement, your SeMet won't crystallise and your heavy atoms won't stick, then you tend to learn how to make Phaser run the last half yard, etc. - and that half yard often comes about from thinking about your problems in the right way. Having an old-school crystallographer to bang ideas off can be invaluable at this point. Learning the theory is not always encouraged, and, given that doing it properly takes some time and application, it is often towards the bottom of the priority list. It would take a ballsy student to say to their boss, "No, I can't do experiments X, Y and Z until I have read and understood this paper on maximum likelihood!" However, in a system in which there exists a fair amount of pressure and competition (exacerbated in the US system, I think), the temptation to hand off ALL data to a structure-solver can be great.
However, if this practice continues, as Raji suggests, there will be a lack of properly trained crystallographers - and mistakes will be more likely to occur. The suggestion of explicitly stating in a paper that X crystallised, Y collected data, and Z phased and refined is a good one, and some journals (e.g. Nature) like an author contributions section. However, if a group is willing to 'overlook' problems in their data, as recently seen, maybe they cannot be trusted to make these statements accurately. I think that the only water-tight way of preventing such mistakes again is to have every paper that contains a structure reviewed by at least one properly trained crystallographer, and to have the data (PDB files and structure factors) made available to them. Just my lunchtime ramble... Dave

On 29/08/2007, Raji Edayathumangalam [EMAIL PROTECTED] wrote: I would like to mention some other issues now that Ajees et al. has stirred all sorts of discussions. I hope I haven't opened Pandora's box. From what I have learned around here, very often there seems to be little time allowed or allocated to actually learn - a bit beyond the surface - some of the crystallography, or what the crystallographic software is doing during the structure solution process. A good deal of the postdocs and students here are under incredible pressure to get the structure DONE asap. For some of them, it is their first time solving a crystal structure. Yes, the same heapful of reasons: because it's hot, competitive, grant deadline, PI tenure pressure, etc. Learning takes the backseat, and this is total rubbish and very scary, in my biased personal opinion. Although I think it is the person's responsibility to take the time and initiative to learn, I also see that the pressure is often insurmountable.
Often, the PI and/or assigned structure solver in the lab pretty much takes charge at some early stage of structure determination and solves the structure with much less contribution from the scientist in training (student/postdoc). All that slog to clone, purify, crystallize and optimize diffraction, only to realize someone else will come along, process the data and finish up the structure for you. Such 'training' (or lack thereof) is a recipe for generating 'bad' structures in the future, and part of the reason for this endless thread. I think it is NOT as common for someone else to, say, run all the Western blots for you, maintain your tissue cell lines for you, or do your protein preps for you. Is it because it is much easier to upload someone else's crystallographic data on one's machine and solve the structure (since this does not demand the same kind of physical labor and effort, and is also a lot of fun) that this happens? I understand when the PI or structure solver does the above as part of teamwork and allows the person in question to learn. But often, I see the person is somewhat left overwhelmed and clueless in the end. I bring this issue to the forum since I do not know if this phenomenon is ubiquitous. If this practice is a rampant weed, can we as a
Re: [ccp4bb] The importance of USING our validation tools
Wow! Those are two pretty amazing structures. For those of you who haven't had a look, the ordered molecules are in layers with *huge* gaps in between, much greater than in 2hr0. And yet both of these structures were solved with experimental phasing (SIRAS), unlike 2hr0, and the data extend to higher resolution. Mark J. van Raaij wrote: With regards to our structures 1H6W (1.9A) and 1OCY (1.5A), rather than faith, I think the structure is held together by a real mechanism, which however I can't explain. Like in the structure Axel Brunger mentioned, there is appreciable diffuse scatter, which imo deserves to be analysed by someone expert in the matter (to whom, or anyone else, I would gladly supply the images which I should still have on a tape or CD in the cupboard...). For a low-res version of one image see http://web.usc.es/~vanraaij/diff45kd.png and http://web.usc.es/~vanraaij/diff45kdzoom.png two possibilities I have been thinking about:
Re: [ccp4bb] The importance of USING our validation tools
In general, I think we should be careful about too-strong statements: while structures with high solvent content generally diffract to low resolution, there are a few examples where they diffract to high resolution. Obviously, high solvent content means fewer crystal contacts - but what if those few are very stable? Similarly, there are probably a few structures with a high percentage of Ramachandran outliers which are real, and similarly for all other structural quality indicators. However, combinations of several of these probably do not exist, and in any case every unusual feature like this should be described and an attempt made to explain/analyse it - which in the case of the Nature paper that started this thread was apparently not done, apart from the rebuttal later (and perhaps in unpublished replies to the referees?). With regards to our structures 1H6W (1.9A) and 1OCY (1.5A), rather than faith, I think the structure is held together by a real mechanism, which however I can't explain. Like in the structure Axel Brunger mentioned, there is appreciable diffuse scatter, which imo deserves to be analysed by someone expert in the matter (to whom, or to anyone else, I would gladly supply the images, which I should still have on a tape or CD in the cupboard...). For a low-res version of one image see http://web.usc.es/~vanraaij/diff45kd.png and http://web.usc.es/~vanraaij/diff45kdzoom.png Two possibilities I have been thinking about: 1. Only a few of the tails are ordered - rather like a stack of identical tables in which four legs hold the table surfaces stably together - but the few ordered tails/legs do not contribute much to the diffraction. This raises the question why some tails should be stiff and others not; perhaps traces of a metal or other small molecule stabilise some tails (although crystal optimisation trials did not show up such a molecule)? 2. Three-fold disorder, either individually or in microdomains too small to have been resolved by the beam used.
For this I have been told to expect better density than observed, but maybe this is not true. We did try integrating in lower space groups (P3, P2 instead of P321) with no improvement of the density; we tried a RT dataset to see if freezing caused the disorder; and we tried improving the phases by MAD on the mercury derivative, but with no improvement in the density for the tail. Mark J. van Raaij, Unidad de Bioquímica Estructural, Dpto de Bioquímica, Facultad de Farmacia and Unidad de Rayos X, Edificio CACTUS, Universidad de Santiago, 15782 Santiago de Compostela, Spain http://web.usc.es/~vanraaij/

On 24 Aug 2007, at 03:01, Petr Leiman wrote: - Original Message - From: Jenny Martin [EMAIL PROTECTED] To: CCP4BB@JISCMAIL.AC.UK Sent: Thursday, August 23, 2007 5:46 PM Subject: Re: [ccp4bb] The importance of USING our validation tools My question is, how could crystals with 80% or more solvent diffract so well? The best of the three is 1.9A resolution with I/sigI 48 (top shell 2.5). My experience is that such crystals diffract very weakly. You must be thinking about Mark van Raaij's T4 short tail fibre structures. Yes, the disorder in those crystals is extreme. There are ~100-150 A thick disordered layers between the ~200 A thick layers of ordered structure. The diffraction pattern does not show any anomalies (as far as I can remember from 6 years ago). The spots are round, there are virtually no spots not covered by predictions, and the crystals diffract to 1.5A resolution. The disordered layers are perpendicular to the threefold axis of the crystal. The molecule is a trimer and sits on the threefold axis. It appears that the ordered layers somehow know how to position themselves across the disordered layers. I agree here with Michael Rossmann that in these crystals the ordered layers are held together by faith. Mark integrated the dataset in lower space groups, but the disordered stuff was not visible anyway. He will probably add more to the discussion. Petr Any thoughts?
Cheers, Jenny
Re: [ccp4bb] The importance of USING our validation tools
Mischa, I don't think that the field of nanotechnology crumbled when the allegations against Jan Hendrik Schon (21 papers withdrawn, 15 in Science/Nature) turned out to be true. I don't think that nobody trusts biologists anymore because of Eric Poehlman (17 falsified grants, 10 papers with fabricated data, 12 months in prison). We are still excited to hear about stem cell research despite what Hwang Woo-suk did or didn't do. What recent events demonstrate is that in macromolecular crystallography (and in science in general) mistakes, deliberate or not, will be discovered. Ed.

Mischa Machius wrote: Due to these recent, highly publicized irregularities and the ample (snide) remarks I hear about them from non-crystallographers, I am wondering if trust in macromolecular crystallography is beginning to erode. It is often very difficult even for experts to distinguish fakery or wishful thinking from reality. Non-crystallographers will have no chance at all, and will consequently not rely on our results as much as we are convinced they could and should. If that is indeed the case, something needs to be done, and rather sooner than later. Best - MM Mischa Machius, PhD Associate Professor UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd.; ND10.214A Dallas, TX 75390-8816; U.S.A. Tel: +1 214 645 6381 Fax: +1 214 645 6353 -- Edwin Pozharski, PhD, Assistant Professor University of Maryland, Baltimore -- When the Way is forgotten, duty and justice appear; then knowledge and wisdom are born along with hypocrisy. When harmonious relationships dissolve, then respect and devotion arise; when a nation falls to chaos, then loyalty and patriotism are born. - Lao Tse
Re: [ccp4bb] The importance of USING our validation tools
Dear colleagues,

1) I think Ajees et al. should make available the raw diffraction images of the structure in the paper that has caused so much literary commotion, unless they have already done so. Perhaps simply put them on an open ftp server? As I imagine, unless I have missed something, these diffraction images were obtained with grant money, so shouldn't they be available to the community? This would allow other scientists to evaluate them as much as they wanted and publish many more papers about the validity or falsehood of the conclusions drawn in the original and (now) infamous Ajees et al. paper. That's how science - in my opinion - ought to work.

2) I agree that depositing raw images in the PDB or elsewhere would be a great thing for everybody - I usually and happily deposit all the structure factors that I've used to obtain and refine a structure. However, raw images are becoming larger and larger with the newer and fancier detectors, and this trend might not stabilize for quite a while. Although disk space is also getting cheaper as time goes by, I think the ratio between these two factors still makes huge storage projects impractical, unless a major development in data storage comes along. As an anecdote: during a trip to a synchrotron in the American Midwest, a single dataset (1 degree x 360) came to something like 27 GB of raw images!!! We managed to collect 1.5 TB of data in about 2 days (having to run - of course always in a hurry - to the nearest computer store to get a few more external hard drives to back up and take with us all our data). So, although it would be a great option for many of us, I insist: I cannot imagine the burden that storing so much data would be for the PDB or any public database. Not only for the amount of disk space or storage support required; as people have mentioned here, taking care of the data (curating it - since disks do crash, as we know, and optical media get irremediably scratched) would be a tremendous and likely expensive endeavor.

3) Perhaps we should responsibly store the data ourselves, in media that allow us to retrieve it after many years (quite a task by itself already; forget the clay tablets, though), as probably many of us have done for quite some time, and, when asked, send the data to anyone who is interested. But... don't they already have problems accessing the tapes from the first lunar landing?

4) In any case, we should not forget the subject of storing crystallographic raw images in a public database and making them accessible. Perhaps more journals should accept open letters about this subject, which is as important as it is complicated, and create a much larger discussion than this one.

All the best, Jordi __ Jordi Benach, PhD MX Beamline Scientist ALBA Synchrotron Light Facility Edifici Ciències. Mòdul C-3 Central Campus Universitat Autònoma de Barcelona 08193 Bellaterra, Barcelona, SPAIN Phone: +34 93 592 4333 FAX: +34 93 592 4302 E-mail: [EMAIL PROTECTED] __
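As a back-of-the-envelope check on the scale Jordi describes, the arithmetic can be sketched as follows. The deposition rate and datasets-per-structure figures below are illustrative guesses, not sourced numbers; only the 27 GB per dataset comes from the anecdote above.

```python
# Rough estimate of raw-image archive growth.
gb_per_dataset = 27.0         # from the synchrotron anecdote (1 degree x 360 frames)
datasets_per_structure = 3    # native + derivatives/MAD: an assumption
structures_per_year = 7000    # very rough PDB deposition rate circa 2007: a guess

tb_per_year = gb_per_dataset * datasets_per_structure * structures_per_year / 1024.0
print(f"~{tb_per_year:.0f} TB of raw images per year")  # -> ~554 TB per year
```

Even with generous compression, that is the kind of sustained growth a public archive would have to curate indefinitely, which is Jordi's point.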
Re: [ccp4bb] The importance of USING our validation tools
I've been reading the contributions on this topic with much interest. It's been very timely in that I've been giving 3rd year u/g lectures on protein X-ray structures and their validation over the past week. As part of the preparation for the lectures, I searched the PDB for structures with high solvent content. To my surprise, I found 376 crystal structures with solvent content >75% (about 1% of all crystal structures) and 120 structures with solvent content >80% (about 0.3% of all crystal structures). However, there were only 3 other structures that (like 2HR0) had >80% solvent AND Rcryst and Rfree less than 20%. All three structures are solved to better than 3A resolution. One is from a weak data set from a virus crystal; the other two PDB files report very strong crystallographic data. The Rmerge values are more typical than for 2HR0, and none of the three appear to have the geometry or crystal contact problems of 2HR0. My question is, how could crystals with 80% or more solvent diffract so well? The best of the three is 1.9A resolution with I/sigI of 48 (top shell 2.5). My experience is that such crystals diffract very weakly. There are another 15 structures with solvent content 75-80% and Rcryst/Rfree <20%. I didn't check them in any detail, just to see that the structure was consistent with a high solvent content. Any thoughts? Cheers, Jenny
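For readers less familiar with where those solvent percentages come from: they follow from the Matthews coefficient, Vm = V_cell / (Z x MW). A minimal sketch (the function name and the example numbers are ours; 1.230 A^3/Da is the commonly used constant derived from the typical partial specific volume of protein, ~0.74 cm^3/g):

```python
def solvent_fraction(cell_volume_A3, mw_da, z):
    """Solvent content via the Matthews coefficient Vm = V / (Z * MW).

    cell_volume_A3: unit-cell volume in cubic Angstrom
    mw_da:          molecular weight of the asymmetric-unit molecule, Daltons
    z:              number of molecules in the unit cell
    """
    vm = cell_volume_A3 / (z * mw_da)   # A^3 per Dalton
    return 1.0 - 1.230 / vm

# Hypothetical example: a 40 kDa protein, 6 copies in a 1.2e6 A^3 cell
print(f"solvent ~ {solvent_fraction(1.2e6, 40_000, 6):.0%}")  # -> solvent ~ 75%
```

A Vm around 5 A^3/Da or above corresponds to the very-high-solvent cases Jenny found in the PDB.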
Re: [ccp4bb] The importance of USING our validation tools
In the cases you list, it is clearly recognized that the fault lies with the investigator and not the method. In most of the cases where serious problems have been identified in published models, the authors have stonewalled by saying that the method failed them: "The methods of crystallography are so weak that we could not detect (for years) that our program was swapping F+ and F-." "The scattering of X-rays by bulk solvent is a contentious topic." "We should have pointed out that the B factors of the peptide are higher than those of the protein." It appears that the problems occurred because these authors were not following established procedures in this field, yet they are, as near as I can tell, somehow immune from the consequences of their errors. Usually the paper isn't even retracted when the model is clearly wrong. They can dump blame on the technique and escape personal responsibility. This is what upsets so many of us. It would be so refreshing to read in one of these responses: "We were under a great deal of pressure to get our results out before our competitors and cut corners that we shouldn't have, and that choice resulted in our failure to detect the obvious errors in our model." If we did see papers retracted, if we did see nonrenewal of grants, if we did see people get fired, if we did see prison time (when the line between carelessness and fraud is crossed), then we could be comforted that there is practical incentive to perform quality work. Dale Tronrud
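One of the sanity checks mentioned above - comparing the refined B factors of a bound peptide against those of the protein - is mechanical enough to script. A minimal sketch over PDB-format coordinate lines (the helper name and the toy records are ours; column positions follow the fixed-width PDB format, with the B factor in columns 61-66 and the chain ID in column 22):

```python
from collections import defaultdict

def mean_b_by_chain(pdb_lines):
    """Mean isotropic B factor per chain from PDB ATOM/HETATM records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            chain = line[21]              # chain identifier, column 22
            b = float(line[60:66])        # temperature factor, columns 61-66
            sums[chain] += b
            counts[chain] += 1
    return {ch: sums[ch] / counts[ch] for ch in sums}

# Toy records: protein in chain A, a 'peptide' in chain P
pdb = [
    "ATOM      1  CA  ALA A   1      11.104  13.207   2.100  1.00 20.00           C",
    "ATOM      2  CA  GLY A   2      12.560  14.100   3.330  1.00 30.00           C",
    "ATOM      3  CA  SER P   1       8.250   9.400   5.120  1.00 60.00           C",
]
print(mean_b_by_chain(pdb))  # -> {'A': 25.0, 'P': 60.0}
```

A ligand or peptide chain whose mean B is far above the protein's is exactly the kind of red flag that should be reported, not explained away.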
Re: [ccp4bb] The importance of USING our validation tools
Another example of a structure with intervening layers of weak electron density at 1.75 A resolution is Pb2+-bound calmodulin, which Mark Wilson solved in my laboratory: M.A. Wilson and A.T. Brunger, Acta Cryst. D59, 1782-1792 (2003), PDB ID 1NOY. The intervening layers are not entirely disordered, since Pb2+ positions show up in difference maps in these layers, so this could indicate motion around these positions rather than complete disorder. However, apart from the Pb2+ positions, the electron density in these layers is weak and uninterpretable. Apart from the weak layers, the structure behaves completely normally, i.e., we observe the expected bulk solvent contribution at low resolution, and the B-factor distributions are as expected. Axel
-- Axel T. Brunger, Investigator, Howard Hughes Medical Institute; Professor of Molecular and Cellular Physiology, Stanford University. Web: http://atb.slac.stanford.edu Email: [EMAIL PROTECTED] Phone: +1 650-736-1031 Fax: +1 650-745-1463
Re: [ccp4bb] The importance of USING our validation tools
Dear Alex, Of course a simplified one-page summary would not be the last word, but I think that it would be a big step in the right direction. For example, a value of Rfree that is 'too good' because the reflection set for it has been chosen wrongly can be detected statistically (Tickle et al., Acta D56 (2000) 443-450). And it would not be too difficult to distinguish between three possible causes of incomplete data: (a) there is a dead cone of data because it was a single scan of a low-symmetry crystal, (b) a large number of 'overloads' were rejected (they would all have fairly low resolution and high Fc values), or (c) the missing reflections are fairly randomly distributed because they have been removed by hand to improve the R-values. I think that there is a very good case for making this information available to referees in an easily comprehensible form. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-2582

On Sun, 19 Aug 2007, Alexander Aleshin wrote: I do not think the small-molecule approach proposed by George Sheldrick is sufficient for validation of protein structures, as misrepresentation of experimental statistics/resolution is hard to detect with it, and these factors appear to play a crucial role in deciding the fate of many hot structures. Bad statistics hurt publication more than mistakes in a model, and improving the experiment is often too hard. "I know my structure is right. Why should I spend another year growing better crystals only to make the statistics look right?" sounds like a strong argument for a desperate researcher. Making up an artificial data set is overkill for the task; there are easier and less amoral ways, such as rejection of outliers and incorrect assignment of the Rfree test set.
Ironically, an undereducated crystallographer may not recognize wrongdoing in such data treatment, which makes it even more likely to occur. Do I sound paranoid? And please do not suggest that I have shared personal experiences. Alex Aleshin

On Sat, 18 Aug 2007, George M. Sheldrick wrote: There are good reasons for preserving frames, but most of all for the crystals that appeared to diffract but did not lead to a successful structure solution, publication, and PDB deposition. Maybe in the future there will be improved data processing software (for example, to integrate non-merohedral twins) that will enable good structures to be obtained from such data. At the moment most such data are thrown away. However, forcing everyone to deposit their frames each time they deposit a structure with the PDB would be a thorough nuisance and a major logistic hassle. It is also a complete illusion to believe that the reviewers for Nature etc. would process or even look at frames, even if they could download them with the manuscript. For small molecules, many journals require an 'ORTEP plot' to be submitted with the paper. As older readers who have experienced Dick Harlow's 'ORTEP of the year' competition at ACA Meetings will remember, even a viewer with little experience of small-molecule crystallography can see from the ORTEP plot within seconds if something is seriously wrong, and many non-crystallographic referees for e.g. the journal Inorganic Chemistry can even make a good guess as to what is wrong (e.g. the wrong element assigned to an atom). It would be nice if we could find something similar for macromolecules that the author would have to submit with the paper. One immediate bonus is that the authors would look at it carefully themselves before submitting, which could lead to an improvement in the quality of structures being submitted. My suggestion is that the wwPDB might provide, say, a one-page diagnostic summary when they allocate each PDB ID that could be used for this purpose.
A good first pass at this would be the output that the MolProbity server http://molprobity.biochem.duke.edu/ sends when it is given a PDB file. It starts with a few lines of summary in which bad things are marked red and the structure is assigned to a percentile: a percentile of 6% means that 94% of the structures in the PDB with a similar resolution are 'better' and 6% are 'worse'. This summary can be understood with very little crystallographic background, and a similar summary can of course be produced for NMR structures. The summary is followed by diagnostics for each residue; normally, if the summary looks good, it would not be necessary for the editor or referee to look at the rest. Although this server was intended to help us improve our structures rather than detect manipulated or fabricated data, I asked it for a report on 2HR0 to see what it would do (probably many other people were trying to do exactly the
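The incompleteness fingerprinting George suggests - overload losses clustered at low resolution versus hand-pruned reflections scattered everywhere - could be surfaced with a per-shell completeness table. A hedged sketch (the function and the binning choices are ours, not the actual diagnostic of any existing program):

```python
def shell_completeness(observed_d, expected_d, n_shells=10):
    """Completeness per resolution shell.

    observed_d / expected_d: d-spacings in Angstrom for the measured and the
    theoretically possible unique reflections. Binning on 1/d^3 gives shells
    containing roughly equal numbers of reflections.
    """
    s3_max = max(1.0 / d**3 for d in expected_d)

    def shell(d):
        # Map a d-spacing to a shell index, clamped to the last shell.
        return min(int((1.0 / d**3) / s3_max * n_shells), n_shells - 1)

    expected = [0] * n_shells
    observed = [0] * n_shells
    for d in expected_d:
        expected[shell(d)] += 1
    for d in observed_d:
        observed[shell(d)] += 1
    return [o / e if e else 0.0 for o, e in zip(observed, expected)]
```

A deficit confined to the lowest-resolution shells would point at rejected overloads (George's case b); a small, uniform deficit across all shells is the pattern that hand-editing to improve R-values would leave (case c).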
Re: [ccp4bb] The importance of USING our validation tools
I worry a bit about some of this discussion, in that I wouldn't like the free-R-factor police to get too powerful. I imagine that many of us have struggled with datasets which are sub-optimal for all sorts of reasons (all crystals are multiple/split/twinned; substantial disordered regions; low resolution, etc) - and it is not possible to get better data. I have certainly fought hard to get free-R below (the magic) 30%, when I know the structure is _essentially_ right, but the details are a little blurred in places, even when I have done the best I can. Anyway the important things are not the statistics, but the maps. Does this make the structure unpublishable? No, provided that we remember a basic tenet of science, that the conclusions drawn should be supported by the evidence available. With limited data, the conclusions may be more limited, but still often illuminate the biology, which is the reason for solving the structure in the first place. The evidence should be available to readers referees, so deposition at least structure factors should be compulsory (why isn't it already?). Unmerged data or images would be nice, but I doubt that many people would use them (great for developers though) Phil On 20 Aug 2007, at 08:24, George M. Sheldrick wrote: Dear Alex, Of course a simplified one page summary would not be the last word, but I think that it would be a big step in the right direction. For example a value of Rfree that is 'too good' because the reflection set for it has been chosen wrongly can be detected statistically (Tickle et al., Acta D56 (2000) 443-450). 
And it would not be too difficult to distinguish between three possible causes of incomplete data: (a) there is a dead cone of data because only a single scan was collected from a low-symmetry crystal, (b) a large number of 'overloads' were rejected (they would all have fairly low resolution and high Fc values), or (c) the missing reflections are fairly randomly distributed because they have been removed by hand to improve the R-values. I think that there is a very good case for making this information available to referees in an easily comprehensible form. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-2582 On Sun, 19 Aug 2007, Alexander Aleshin wrote: [quoted message snipped]
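The patterns of incomplete data described above could even be screened for automatically from the set of unobserved reflections. A rough sketch of the idea for cases (b) and (c) only (the function name and thresholds are invented for illustration, not part of any existing CCP4 tool; detecting a dead cone, case (a), needs the reflection indices and goniostat geometry and is omitted here):

```python
def classify_missing(reflections):
    """reflections: list of (d_spacing_A, fcalc, observed) tuples covering
    the complete sphere to the stated resolution limit."""
    missing = [(d, fc) for d, fc, obs in reflections if not obs]
    if not missing:
        return "complete"
    all_fc = sorted(fc for _, fc, _ in reflections)
    median_fc = all_fc[len(all_fc) // 2]
    strong = [m for m in missing if m[1] > 2.0 * median_fc]   # high Fc
    low_res = [m for m in missing if m[0] > 4.0]              # d > 4 A
    # Case (b): rejected overloads cluster at low resolution with high Fc.
    if len(strong) > 0.8 * len(missing) and len(low_res) > 0.8 * len(missing):
        return "likely overloads"
    # Case (c): reflections culled by hand look random in both d and Fc.
    return "random gaps -- inspect by hand"
```

The cutoffs (4 A, 2x the median Fc, 80%) are placeholders; a real implementation would compare the resolution and Fc distributions of the missing set against the observed set statistically.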
Re: [ccp4bb] The importance of USING our validation tools
PS. A completely unimportant correction to my comment on the MolProbity output for 2HR0: every residue is indeed an outlier in at least one test, but in three cases it is only the CB-deviation test, not the other three tests that I mentioned. George On Sat, 18 Aug 2007, George M. Sheldrick wrote: There are good reasons for preserving frames, but most of all for the crystals that appeared to diffract but did not lead to a successful structure solution, publication, and PDB deposition. Maybe in the future there will be improved data-processing software (for example, to integrate non-merohedral twins) that will enable good structures to be obtained from such data. At the moment most such data are thrown away. However, forcing everyone to deposit their frames each time they deposit a structure with the PDB would be a thorough nuisance and a major logistic hassle. It is also a complete illusion to believe that the reviewers for Nature etc. would process or even look at frames, even if they could download them with the manuscript. For small molecules, many journals require an 'ORTEP plot' to be submitted with the paper. As older readers who experienced Dick Harlow's 'ORTEP of the year' competition at ACA Meetings will remember, even a viewer with little experience of small-molecule crystallography can see from the ORTEP plot within seconds if something is seriously wrong, and many non-crystallographic referees for e.g. the journal Inorganic Chemistry can even make a good guess as to what is wrong (e.g. the wrong element assigned to an atom). It would be nice if we could find something similar for macromolecules that the author would have to submit with the paper. 
One immediate bonus is that the authors would look at it carefully themselves before submitting, which could lead to an improvement in the quality of structures being submitted. My suggestion is that the wwPDB might provide, say, a one-page diagnostic summary when they allocate each PDB ID that could be used for this purpose. A good first pass at this would be the output that the MolProbity server (http://molprobity.biochem.duke.edu/) sends when it is given a PDB file. It starts with a few lines of summary in which bad things are marked red and the structure is assigned a percentile: a percentile of 6% means that roughly 94% of the structures in the PDB with a similar resolution are 'better' and 6% are 'worse'. This summary can be understood with very little crystallographic background, and a similar summary can of course be produced for NMR structures. The summary is followed by diagnostics for each residue; normally, if the summary looks good, it would not be necessary for the editor or referee to look at the rest. Although this server was intended to help us to improve our structures rather than to detect manipulated or fabricated data, I asked it for a report on 2HR0 to see what it would do (probably many other people were trying to do exactly the same, as the server was slower than usual). Although the structure got poor marks on most tests, MolProbity generously assigned it overall to the 6th percentile; I suppose that this is about par for structures submitted to Nature (!). However, there was one feature that was unlike anything I have ever seen before, although I have fed the MolProbity server some pretty ropey PDB files in the past: EVERY residue, including EVERY WATER molecule, either made at least one bad contact, was a Ramachandran outlier, or was a rotamer outlier (or more than one of these). This surely would ring all the alarm bells! 
So I would suggest that the wwPDB could coordinate, with the help of the validation experts, software to produce a short summary report that would be automatically provided in the same email that allocates the PDB ID. This email could make the strong recommendation that the report file be submitted with the publication, and maybe in the fullness of time even the Editors of high-profile journals would require this report for the referees (or even read it themselves!). To gain acceptance for such a procedure the report would have to be short and comprehensible to non-crystallographers; the MolProbity summary is an excellent first pass in this respect, but (partly with a view to detecting manipulation of the data) a couple of tests could be added based on the data statistics as reported in the PDB file (or, even better, the reflection data if submitted). Most of the necessary software already exists, much of it produced by regular readers of this bb; it just needs to be adapted so that the results can be digested by referees and editors with little or no crystallographic experience. And most important, a PDB ID should
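One cheap test "based on the data statistics" of the kind suggested above is the second moment of the intensities: for acentric reflections from a real, untwinned crystal, <I^2>/<I>^2 should be close to 2.0 in each resolution shell (about 1.5 for a perfect twin), and data derived from calculated intensities with no realistic experimental noise often drift away from this. A minimal sketch, with the resolution-shell bookkeeping omitted (the function name is mine):

```python
def second_moment(intensities):
    """<I^2>/<I>^2 for a set of acentric intensities: ~2.0 for real
    untwinned data, ~1.5 for a perfect twin; values well away from
    these are a warning sign worth a closer look."""
    n = float(len(intensities))
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)
```

In practice this would be computed per resolution shell, as standard intensity-statistics analyses already do; the point here is only that such a check is a few lines of code and easy to put on a one-page report.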
Re: [ccp4bb] The importance of USING our validation tools
Curiously enough, when I recently submitted a coordinates file to the RCSB with this MolProbity summary (as REMARK 42; it is appended to the analyzed file by the MolProbity program), it was deleted by the RCSB team. Boaz - Original Message - From: George M. Sheldrick [EMAIL PROTECTED] Date: Saturday, August 18, 2007 15:27 Subject: Re: [ccp4bb] The importance of USING our validation tools To: CCP4BB@JISCMAIL.AC.UK [quoted message snipped]
Re: [ccp4bb] The importance of USING our validation tools
I do not think the small-molecule approach proposed by George Sheldrick is sufficient for validation of protein structures, as misrepresentation of experimental statistics/resolution is hard to detect with it, and these factors appear to play a crucial role in defining the fate of many hot structures. Bad statistics hurt publication more than mistakes in a model, and improving the experiment is often too hard. 'I know my structure is right. Why should I spend another year growing better crystals only to make the statistics look right?' sounds like a strong argument for a desperate researcher. Making up an artificial data set is overkill; there are easier and less amoral ways, such as rejection of outliers and incorrect assignment of the Rfree test set. Ironically, an undereducated crystallographer may not recognize the wrongdoing in such data treatment, which makes it even more likely to occur. Do I sound paranoid? And please do not suggest that I have shared personal experiences. Alex Aleshin On Sat, 18 Aug 2007, George M. Sheldrick wrote: [quoted message snipped]
Re: [ccp4bb] The importance of USING our validation tools
Hi Mischa, I think you are right about ligand structures: it would be very difficult, if not impossible, to distinguish between real measured data and faked data. You just need to run a docking program, dock the ligand, calculate new structure factors, add some noise, and combine that with your real data of the unliganded structure. I'm not an expert, but how would one be able to detect, for say a molecule on the order of 300-600 Da within an average protein of perhaps 40 kDa, whether it's true data or faked + noise? In Germany we have to keep data (data meaning everything, from clones, scans of gels and sizing profiles to X-ray diffraction images etc.) for 10 years. Not sure how this is in the US. Juergen Mischa Machius wrote: I agree. However, I am personally not so much worried about entire protein structures being wrong or fabricated. I am much more worried about co-crystal structures. Capturing a binding partner, a reaction intermediate or a substrate in an active site is often as spectacular an achievement as determining a novel membrane-protein structure. The threshold for over-interpreting densities for ligands is rather low, and wishful thinking can turn into model bias much more easily than for a protein structure alone; not to mention making honest mistakes. Just for plain and basic scientific purposes, it would be helpful every now and then to have access to the original images. As to the matter of fabricating ligand densities, I surmise that it is much easier than fabricating entire protein structures. The potential rewards (in terms of high-profile publications and obtaining grants) are just as high. There is enough incentive to apply lax scientific standards. If a simple means exists, beyond what is available today, that can help tremendously in identifying honest mistakes, and perhaps a rare fabrication, I think it should seriously be considered. Best - MM On Sat, 18 Aug 2007, George M. 
Sheldrick wrote: [quoted message snipped]
Re: [ccp4bb] The importance of USING our validation tools
To complete your analogy to the ORTEP of the year, the summary page could be accompanied by a backbone ribbon drawing of the macromolecule, with a red sphere at each residue that has an error. You could get fancy and scale the sphere according to the severity of the error. -Tom -Original Message- From: CCP4 bulletin board on behalf of George M. Sheldrick Sent: Sat 8/18/2007 6:26 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools [quoted message snipped]
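Tom's red-sphere summary is straightforward to prototype by generating a PyMOL command file from a per-residue error table; a sketch (the input format and the severity-to-radius scaling are my own invention, not an existing tool):

```python
def pymol_error_script(errors, obj="model"):
    """errors: {(chain, resi): severity in (0, 1]}.
    Returns PyMOL commands that draw the backbone as a cartoon and put
    a red CA sphere, scaled by severity, on each flagged residue."""
    lines = ["hide everything, " + obj, "show cartoon, " + obj]
    for (chain, resi), severity in sorted(errors.items()):
        sel = "%s and chain %s and resi %d and name CA" % (obj, chain, resi)
        lines.append("show spheres, (%s)" % sel)
        lines.append("color red, (%s)" % sel)
        # Map severity onto sphere radius: 0.5 A (minor) to 2.0 A (severe).
        lines.append("set sphere_scale, %.2f, (%s)" % (0.5 + 1.5 * severity, sel))
    return "\n".join(lines)
```

The output would be saved as a .pml file and run inside PyMOL; the per-residue severities themselves would come from whatever validation report is adopted.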
Re: [ccp4bb] The importance of USING our validation tools
The literature already contains quite a few papers discussing ligand-protein interactions derived from low-resolution data, noisy data, etc. It's relatively easy to take a low-quality map, dock the molecule willy-nilly into the poorly defined 'blobule' of density, and derive spectacular conclusions. However, in order for such conclusions to be credible one needs to support them with orthogonal data such as biological assay results, mutagenesis, etc. This is not limited to crystallography as such, and it's the referee's job to be thorough in such cases. To the authors' credit, in *most* cases the questionable crystallographic data are supported by biological data of high quality. So, even with the images, etc., it's still quite possible to be honestly misled, which is why we value biological data. Consequently, if one's conclusions are wrong, this will inevitably show up later in the results of other experiments (such as SAR inconsistencies, for example). Science tends to be self-correcting: our errors (whether honest or malicious) are not going to withstand the test of time. Assuming that the proportion of deliberate faking in the scientific literature is quite small (and we really have no reason to think otherwise!), I really see no reason to worry too much about the ligand-protein interactions. Any referee evaluating ligand-based structural papers can ask to see an omit map (or a difference density map before any ligand was built) and a decent biological data set supporting the structural conclusions. In the case of *sophisticated deliberate faking*, there is not much a reviewer can do except trying to actually reproduce the claimed results. On the other hand, the 'wholesale' errors can be harder to catch, since the dataset and the resulting structure are typically the *only* evidence available. If both are suspect, the reviewer needs to rely on something else to make a judgement, which is where a one-page summary would come in handy. 
Artem -Original Message- From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Juergen Bosch Sent: Saturday, August 18, 2007 12:20 PM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools [quoted message snipped]
Re: [ccp4bb] The importance of USING our validation tools
Dear all, I agree with MM about the ligand and complex structures. Even in the most honest circumstances, it is easy to get carried away with hopes and excitement. My personal embarrassing experience was some years ago. It involved a protein that I had crystallized in a different space group in the presence of an inhibitor (2.5 Å data). The MR model had some gaps a moderate distance from the binding pocket. Lo and behold, some new, very rough density appeared very, very close to a binding site - close enough to get my hopes up. I communicated my elation to the PI, handed over pictures of the rough blobs of density, and started trying to build the ligand in. I should have moderated my emotions in light of the early state of the refinement. After finding a somewhat plausible fit in the density, I ran several rounds of the Wonderful Amazing Revealer of Proteindensity program. By the end I was almost in tears. The difference density began to take on a helical shape, and then the connections started growing, leading all the way up to one of the gaps. Side chains too, so I had no trouble with the register. The R-factors didn't change too much, but the geometries and maps in the area started looking really nice. Or should I say, proper. Very nice silver platter (that my head was on when it was handed back to me). Lisa
Re: [ccp4bb] The importance of USING our validation tools
Dominika is entirely correct, the F and (especially) sigma(F) values are clearly inconsistent with my naive suggestion that columns could have been swapped accidentally in an mtz file. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-2582 On Thu, 16 Aug 2007, Dominika Borek wrote: There are several issues under current discussion. We outline a few of these below, in order of importance. The structure 2hr0 is unambiguously fake. Valid arguments have already been published in a Brief Communication by Janssen et al. (Nature, 448:E1-E2, 9 August 2007). However, to an unfamiliar reader the published response from the authors of the questioned deposit may make this sound like a genuine scientific controversy. There are many additional independent signs of intentional data fabrication in this case, above and beyond those already mentioned. One diagnostic is related to the fact that fabricated data will not show the proper features of proteins with respect to disorder. The reported case has a very high ratio of “Fobs” to atomic parameters, thus the phase uncertainty is small. In real structures, fully solvent-exposed chains without stabilizing interactions display intrinsically high disorder, yet in this structure these residues (e.g., Arg932B, Met1325B, Glu1138B, Arg459A, etc.) are impossibly well ordered. The second set of diagnostics is the observation of perfect electron density around impossible geometries. For example, the electron density is perfect (visible even at the 4 sigma level in a 2Fo-Fc map) with no significant negative peaks in an Fo-Fc map around the guanidinium group of Arg1112B, which is in outrageously close contact with carbon atoms of Lys1117B. This observation appears in many other places in the map as well. 
The issue is not the presence of bad contacts, but the lack of disorder (high B-factors) or of negative peaks in an Fo-Fc map in this region that could explain why the bad contacts remain in the model. The third set of diagnostics comprises statistics that do not occur in real structures. The ones mentioned previously are already very convincing (moments, B-factor plots, bulk solvent issues, etc.). We can add more evidence from a round of Refmac refinement of the deposited model versus the deposited structure factors. The anisotropic scaling tensor obtained is unreasonable for a structure in a low-symmetry space group such as C2, which inherently lacks constraints from packing symmetry (particularly in view of the problems with lattice contacts already mentioned). The values from a Refmac refinement for a typical structure in space group C2 are: B11 = 0.72, B22 = 1.15, B33 = -2.12, B12 = 0.00, B13 = -1.40, B23 = 0.00 (B12 and B23 are zero due to C2 space group symmetry). For structure 2hr0: B11 = -0.02, B22 = 0.00, B33 = 0.02, B12 = 0.00, B13 = 0.01, B23 = 0.00. Statistical reasoning leads to P-values on the order of 10^-6 for such values being produced by chance in a real structure, but they are highly likely in a fabricated case. The fourth set of diagnostics is the significant inconsistencies in the published methods, e.g. the authors claim that they collected data from four crystals, yet their data merging statistics show an R-merge = 0.11 in the last resolution shell. It is simply impossible to get such values, particularly when I/sigma(I) for the last resolution shell was stated as 1.32. Moreover, the overall I/sigma(I) for all data is 5.36 and the overall R-merge is 0.07 – values highly inconsistent with the reported data resolution, quality of map and high data completeness (97.3%). Overall, this is just a short list of problems; the indicators of data fabrication/falsification are plentiful and if needed can easily be provided to interested parties. 
We fully support Randy Read's excellent comments with our view of retraction and public discussion of this problem: “Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly it hasn't. One reason is probably that we couldn't be as forthright as we wished to be. For its own good reasons, Nature did not allow us to use the word fabricated. Nor were we allowed to discuss other structures from the same group, if they weren't published in Nature.” One needs to address this policy with publishers in cases of intentional fraud that can be proven simply by an analysis of the published results. At this point the article needs to be retracted by Nature after Nature's internal investigation with input from the crystallographic community, rather than after obtaining the results of any potential administrative investigation of fraud. “Another reason is an understandable reluctance to make allegations in public, and the CCP4 bulletin board probably isn't the
Re: [ccp4bb] The importance of USING our validation tools
While the topic of fabrication is still hot, I thought I too could add a few thoughts. Our Mathematician friends always make fun of us (Biologists/ Biochemists/ crystallographers!) that our papers are accepted within 4-8 weeks of submission. This is not to talk of Science/ Nature/ Cell, where even more rapid reviews are the norm. In the Mathematics world it is customary to have a one-year review of manuscripts, and prior announcements of the work on respective web sites. The one-year review, and the prior announcements on web sites, allow others to review the results independently. That perhaps brings in the required rigor in the results. Consequently, there are not as many retractions in Mathematics as what we see in our area. It is perhaps not possible in our (crystallographic) world to have every structure checked independently by others. Yet, a longer review along with access to raw data might allow reviewers to check the finer details of the structures. I would strongly suggest that raw data be made available to reviewers, and that reviewers should check the structures before the papers are accepted. For any error in the final published structures, blame should also lie partially with the reviewer. The back-to-back controversies are bound to hurt the crystallographic community as a whole, and the IUCr should ponder better checks for the future. Shekhar Mande Hyderabad, INDIA -REPLY TO- Date:Thu Aug 16 21:22:20 GMT+08:00 2007 FROM: Randy J. Read [EMAIL PROTECTED] To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools On Aug 16 2007, Eleanor Dodson wrote: The weighting in REFMAC is a function of SigmA ( plotted in log file). For this example it will be nearly 1 for all resolutions ranges so the weights are pretty constant. 
There is also a contribution from the experimental sigma, which in this case seems to be proportional to |F| Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly it hasn't. One reason is probably that we couldn't be as forthright as we wished to be. For its own good reasons, Nature did not allow us to use the word fabricated. Nor were we allowed to discuss other structures from the same group, if they weren't published in Nature. Another reason is an understandable reluctance to make allegations in public, and the CCP4 bulletin board probably isn't the best place to do that. But I think the case raises essential topics for the community to discuss, and this is a good forum for those discussions. We need to consider how to ensure the integrity of the structural databases and the associated publications. So here are some questions to start a discussion, with some suggestions of partial answers. 1. How many structures in the PDB are fabricated? I don't know, but I think (or at least hope) that the number is very small. 2. How easy is it to fabricate a structure? It's very easy, if no-one will be examining it with a suspicious mind, but it's extremely difficult to do well. No matter how well a structure is fabricated, it will violate something that is known now or learned later about the properties of real macromolecules and their diffraction data. If you're clever enough to do this really well, then you should be clever enough to determine the real structure of an interesting protein. 3. How can we tell whether structures in the PDB are fabricated, or just poorly refined? The current standard validation tools are aimed at detecting errors in structure determination or the effects of poor refinement practice. None of them are aimed at detecting specific signs of fabrication because we assume (almost always correctly) that others are acting in good faith. 
The more information that is available, the easier it will be to detect fabrication (because it is harder to make up more information convincingly). For instance, if the diffraction data are deposited, we can check for consistency with the known properties of real macromolecular crystals, e.g. that they contain disordered solvent and not vacuum. As Tassos Perrakis has discovered, there are characteristic ways in which the standard deviations depend on the intensities and the resolution. If unmerged data are deposited, there will probably be evidence of radiation damage, weak effects from intrinsic anomalous scatterers, etc. Raw images are probably even harder to simulate convincingly. If a structure is fabricated by making up a new crystal form, perhaps a complex of previously-known components, then the crystal packing interactions should look like the interactions seen in real crystals. If it's fabricated by homology modelling, then the internal packing is likely to be suboptimal. I'm told by David Baker (who knows a thing or two about this) that it is extremely difficult to make a homology model that both obeys what
Re: [ccp4bb] The importance of USING our validation tools
Storing all the images *is* expensive but it can be done - the JCSG do this and make available a good chunk of their raw diffraction data. The cost is, however, in preparing this to make the data useful for the person who downloads it. If we are going to store and publish the raw experimental measurements (e.g. the images), which I think would be spectacular, we will also need to define a minimum amount of metadata which should be supplied with this to allow a reasonable chance of reproduction of the results. This is clearly not trivial, but there is probably enough information in the harvest and log files from e.g. CCP4, HKL2000, Phenix to allow this. The real problem will be in getting people to dig out that tape / dvd with the images on, prepare the required metadata and deposit this information somewhere. Actually storing it is a smaller challenge, though this is a long way from being trivial. On an aside - FireWire disks are indeed a very cheap way of storing the data. There is a good reason why they are much cheaper than the equivalent RAID array. They fail. Ever lost 500GB of data in one go? Ouch. ;o) Just MHO. Cheers, Graeme -Original Message- From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Phil Evans Sent: 16 August 2007 15:13 To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools What do you count as raw data? Rawest are the images - everything beyond that is modelling - but archiving images is _expensive_! Unmerged intensities are probably more manageable Phil On 16 Aug 2007, at 15:05, Ashley Buckle wrote: Dear Randy These are very valid points, and I'm so glad you've taken the important step of initiating this. For now I'd like to respond to one of them, as it concerns something I and colleagues in Australia are doing: The more information that is available, the easier it will be to detect fabrication (because it is harder to make up more information convincingly). 
For instance, if the diffraction data are deposited, we can check for consistency with the known properties of real macromolecular crystals, e.g. that they contain disordered solvent and not vacuum. As Tassos Perrakis has discovered, there are characteristic ways in which the standard deviations depend on the intensities and the resolution. If unmerged data are deposited, there will probably be evidence of radiation damage, weak effects from intrinsic anomalous scatterers, etc. Raw images are probably even harder to simulate convincingly. After the recent Science retractions we realised that it's about time raw data was made available. So, we have set about creating the necessary IT and software to do this for our diffraction data, and are encouraging Australian colleagues to do the same. We are about a week away from launching a web-accessible repository for our recently published (e.g. deposited in PDB) data, and this should coincide with an upcoming publication describing a new structure from our labs. The aim is that publication occurs simultaneously with release in the PDB as well as of raw diffraction data on our website. We hope to house as much of our data as possible, as well as data from other Australian labs, but obviously the potential dataset will be huge, so we are trying to develop, and make freely available to the community, software tools that allow others to easily set up their own repositories. After brief discussion with the PDB, the plan is that the PDB include links from coordinates/SFs to the raw data using a simple handle that can be incorporated into a URL. We would hope that we can convince the journals that raw data must be made available at the time of publication, in the same way as coordinates and structure factors. Of course, we realise that there will be many hurdles along the way, but we are convinced that simply making the raw data available ASAP is a 'good thing'. 
We are happy to share more details of our IT plans with the CCP4BB, such that they can be improved, and look forward to hearing feedback cheers
Re: [ccp4bb] The importance of USING our validation tools
Dear colleagues, the recent discussion on the necessity and feasibility of storing raw data for all our structures raises a second point, I think. For the current discussion it is only a matter of storage space that has to be assigned somehow to make Fobs, unmerged data, or raw images available to everybody who wants to download them, but there are other science fields out there as well. Do we also want to collect gels, plots, plasmids, bacterial strains, mice, dollies, at some central place? Or should rather the scientific ethics bind all of us to practice good science and to be objective reviewers when asked? The usefulness for software developers and future experiments with our data is a completely different issue, of course. Just wanting to raise this point. Manuel Than -- ** Dr. Manuel E. Than Protein Crystallography Group Leibniz Institute for Age Research - Fritz Lipmann Institute (FLI) Beutenbergstraße 11 D-07745 Jena Germany Tel.: ++49 3641 65 6170 Fax.: ++49 3641 65 6335 e-mail: [EMAIL PROTECTED] http://www.fli-leibniz.de/groups/than.php
Re: [ccp4bb] The importance of USING our validation tools
Hi Martin, On Fri, Aug 17, 2007 at 11:09:28AM +0200, Martin Walsh wrote: For 2006 at BM14 we and our users generated 266997 images/frames from our MAR225 CCD (18 MB files), or in other words ~4.8 TB (if you have the patience to do so, then bzip2 will reduce these raw images to between 5.5 and 7 MB, depending on how many diffraction spots per image). Looking at http://www.esrf.eu/exp_facilities/BM14/publications/publications-new.html it seems that 56 papers were published in 2006 using BM14 data (directly). Let's say (for argument's sake) that each paper deposited 2 structures (and structure factors) into the PDB: this would mean about 2400 images/frames per structure (and about 40 GB of data per structure). There must be a large amount of junk in there not directly related to the deposited structure factors (images from screening or test crystals, basically useless crystals etc). I don't think anyone would want all images from every beamline deposited in a public database. I think if only the images related to the deposited structure factors are deposited, the data from BM14 would be at least a factor of 10 smaller (4 GB or 240 images per dataset). So this would mean 480 GB of BM14 data for 2006 - or 54 TB for all 115 PX beamlines ... if they all were as productive as BM14! Anyway, compared to astronomy and other fields it is fairly small (as Peter Keller mentioned in his post). If we think it is necessary (and I think we should) it will need to be done. It doesn't need to be perfect - but compared to e.g. the currently deposited structure factors, at least diffraction images have headers with useful information in them (even if the beam-centre, distance or wavelength etc are often wrong: but there are ways of getting at the correct values ... even if it is by trial and error). Cheers Clemens -- *** * Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com * * Global Phasing Ltd. 
* Sheraton House, Castle Park * Cambridge CB3 0AX, UK *-- * BUSTER Development Group (http://www.globalphasing.com) ***
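Clemens's back-of-envelope arithmetic above is easy to reproduce. A minimal sketch (all figures are taken from his email; the 2-structures-per-paper assumption is his, and the 6.25 MB compressed size is just the midpoint of his quoted bzip2 range):

```python
# Back-of-envelope sketch of the BM14-for-2006 storage estimate.
# All inputs come from the email above; nothing here is a new measurement.

images_per_year = 266_997          # MAR225 CCD frames generated at BM14 in 2006
raw_image_mb = 18                  # one uncompressed frame
compressed_image_mb = 6.25         # midpoint of the quoted 5.5-7 MB bzip2 range

raw_tb = images_per_year * raw_image_mb / 1_000_000
papers = 56
structures = papers * 2            # assumed: 2 deposited structures per paper
images_per_structure = images_per_year / structures

# Keeping only the ~240 images actually behind each deposited dataset:
kept_gb_per_dataset = 240 * raw_image_mb / 1000
bm14_total_tb = kept_gb_per_dataset * structures / 1000

print(f"raw data per year:    ~{raw_tb:.1f} TB")
print(f"images per structure: ~{images_per_structure:.0f}")
print(f"kept per dataset:     ~{kept_gb_per_dataset:.1f} GB")
print(f"BM14 total for 2006:  ~{bm14_total_tb:.2f} TB")
```

This reproduces the numbers in the email: ~4.8 TB of raw frames, roughly 2400 images per structure, and about half a terabyte per year once only deposition-relevant images are kept.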
Re: [ccp4bb] The importance of USING our validation tools
On Aug 17, 2007, at 8:36, George M. Sheldrick wrote: Dominika is entirely correct, the F and (especially) sigma(F) values are clearly inconsistent with my naive suggestion that columns could have been swapped accidentally in an mtz file. Since the sigma(F) issue has been raised, let me elaborate on that. Faking observations is difficult. Faking the experimental uncertainties is even more difficult. If one were to fake a dataset, there would almost always be an implicit imprint of the procedure. I am told for example that some journals now use a company that claims they can spot gels and pictures that were 'photo-shopped'. That is - I am told by friends - the reason that some journals ask for 400 dpi pictures, while the Nature printers can do about 120 dpi in real life. Thus, I analyzed the distribution of the experimental sigmas in three structures: 1E3M and two structures of mine at the same resolution (1CTN, 1E3M) The results are in: http://xtal.nki.nl/nature-debate/ That's also a response to Tom Hurley's email ... I think we are obliged to look at this case and show to all crystallographers that read the board what the evidence is. This has no lawful consequences. I think the debate is healthy and I have not seen anyone asking to lynch or crucify anybody. As long as the discussion is about evidence and not about passing ethical or other judgement, I think it's good to go on. Also it's a good lesson for everybody to learn: === *** Keep your images, your gels, your logbooks. It's your obligation. Make sure all your colleagues do so. === (especially if you are the PI, you carry the primary responsibility for all primary data that support your publication to be available on request) If you do not keep to that principle, some mean mob might lynch you, even if you are right. So, be correct in your approaches. 
I am making the web site with my analysis public so that people can see one more piece of evidence that there are doubts and that Murthy et al should provide primary data, as many others have said. Statements of certain innocence or certain guilt should indeed not be public. So, I will wait now for the data - as simple as that. Tassos
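The kind of imprint Tassos describes can be illustrated with a toy check (this is not his actual analysis; the data below are entirely synthetic, and the "real-like" error model is a deliberately simplified counting-statistics assumption). In real data, the relative uncertainty sigma/F varies strongly across the intensity range, whereas a fake built with sigma proportional to |F| shows essentially no spread at all:

```python
# Toy illustration: sigma values exactly proportional to |F| leave a detectable imprint.
import math
import random

random.seed(0)

def sigma_ratio_spread(pairs):
    """Standard deviation of sigma/F over all reflections.

    Near zero if sigma is exactly proportional to F; clearly non-zero for
    realistic error models."""
    ratios = [s / f for f, s in pairs]
    mean = sum(ratios) / len(ratios)
    return math.sqrt(sum((r - mean) ** 2 for r in ratios) / len(ratios))

# "Real-like" synthetic data: counting-statistics term (~sqrt(F)) plus a floor.
real_like = [(f, math.sqrt(f) + 5.0)
             for f in (random.uniform(100, 4000) for _ in range(2000))]
# "Faked" synthetic data: sigma is a fixed 5% of F.
faked = [(f, 0.05 * f)
         for f in (random.uniform(100, 4000) for _ in range(2000))]

print("spread (real-like):", sigma_ratio_spread(real_like))  # clearly non-zero
print("spread (faked):    ", sigma_ratio_spread(faked))      # essentially zero
```

The actual diagnostics on real depositions are of course subtler (they look at how sigma depends on intensity and resolution together), but the principle is the same: a fabricated error model is usually too tidy.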
Re: [ccp4bb] The importance of USING our validation tools
While all of the comments on this situation have been entertaining, I've been most impressed by comments from Bill Scott, Gerard Bricogne and Kim Hendricks. I think due process is called for in considering problem structures that may or may not be fabricated. Public discussion of technical or craftsmanship issues is fine, but questions of intent, etc. are best discussed in private or in more formal settings. We owe that to all involved. Gerard's comments concerning publishing in journals/magazines like Nature and Science are correct. The pressure to publish there is not consistent with careful, well-documented science. For many years, we've been teaching our graduate students about some of the problems with short papers in those types of journals. The space limitations and the need for relevance force the omission of important details, so it's very hard to judge the merit of those papers. But don't assume that other real journals do much better with this. There's a lot of non-reproducible science in the journals. Much of it comes from not recognizing or reporting important experimental or computational details, but some of it is probably simply false. Kim's comments about the technical aspects of archiving data make a lot of sense to me. The costs of making safe and secure archives are not insignificant. And we need to ask if the added value of such archives is worth the added costs. I'm not yet convinced of this. The comments about Richard Reid, shoes, and air-travel are absolutely true. We should be very careful about requiring yet more information for submitted manuscripts. Publishing a paper is becoming more and more like trying to get through a crowded air-terminal. Every time you turn around, there's another requirement for some additional detail about your work. In the vast majority of cases, those details won't matter at all. In a few cases, a very careful and conscientious referee might figure out something significant based on that little detail. 
But is the inconvenience for most of us worth that little benefit? Clearly, enough information was available to Read et al. for making the case that the original structure has problems. What evidence is there that additional data, like raw data images, would have made any difference to the original referees and reviewers? Refereeing is a human endeavor of great importance, but it is not going to be error-free. And nothing can make it error-free. You simply need to trust that people will be honest and do the best job possible in reviewing things. And that errors that make it through the process and are deemed important enough will be corrected by the next layer of reviewers. I believe this current episode, just like those in the past, is a terrific indicator that our science is strong and functioning well. If other fields aren't reporting and correcting problems like these, maybe it's because they simply haven't found them yet. That statement might be a sign of my crystallographic arrogance, but it might also be true. Ron Stenkamp
Re: [ccp4bb] The importance of USING our validation tools
It seems that a public discussion with points and counterpoints presented openly and fairly is in complete adherence to the ideals of due process. Since this discussion is not deciding the criminal fate of any individual, it does not seem necessary to defer it to any political government. Also, were any criminal charges ever brought forth, one might think an innocent defendant would appreciate the benefit of the world's experts pondering the facts in an open forum. James William Scott wrote: But I agree, it is important to keep in mind that the proper venue for determining guilt or innocence in the case of fraud is the court system. Until fairly recently, the ideas of presumed innocence and the right to cross-examine accusers and witnesses have been considered fundamental to civil society. The case certainly sounds compelling, but this is all the more reason to adhere to these ideals. -- James Stroud UCLA-DOE Institute for Genomics and Proteomics Box 951570 Los Angeles, CA 90095 http://www.jamesstroud.com/
Re: [ccp4bb] The importance of USING our validation tools
I believe that is so. In this case the R-factor against the deposited data is low. The question to be addressed is whether the deposited data is of acceptable quality. There are some poor distances but not many - the asymmetric unit is very empty. The Ramachandran plot is not good, and an author would be queried about that. However you can choose to ignore their warnings. Eleanor Gina Clayton wrote: I thought that when a structure is deposited the databank does run its own refinement validation and geometry checks and gives you back what it finds, i.e. distance problems etc and the R-factor? Quoting Eleanor Dodson [EMAIL PROTECTED]: The weighting in REFMAC is a function of SigmA ( plotted in log file). For this example it will be nearly 1 for all resolutions ranges so the weights are pretty constant. There is also a contribution from the experimental sigma, which in this case seems to be proportional to |F| Yesterday I attached the wrong TRUNCATE log file - here is the correct one, and if you look at the plot Amplitude Analysis against resolution it also includes a plot of F SigF Eleanor Dominika Borek wrote: There are many more interesting things about this structure - an obvious fake - refined against fabricated data. After running refmac I have noticed discrepancies between R and weighted R-factors. However, I do not know how the weights are calculated and applied - it could maybe help to find out how these data were created. Could you help? 
M(4SSQ/LL) NR_used  %_obs M(Fo_used) M(Fc_used) Rf_used WR_used NR_free M(Fo_free) M(Fc_free) Rf_free WR_free
0.005        2205   98.77    3800.5     3687.2    0.12    0.30     121     4133.9     4042.7    0.12    0.28
0.015        3952   99.90    1932.9     1858.7    0.20    0.60     197     2010.5     1880.5    0.21    0.40
0.025        5026   99.81    1577.9     1512.3    0.23    0.62     283     1565.0     1484.6    0.26    0.54
0.034        5988   99.76    1598.0     1541.5    0.23    0.61     307     1625.7     1555.6    0.23    0.42
0.044        6751   99.79    1521.2     1481.6    0.18    0.41     338     1550.3     1523.8    0.18    0.61
0.054        7469   99.81    1314.5     1291.2    0.14    0.29     391     1348.3     1337.7    0.15    0.27
0.064        8078   99.87        .5     1089.1    0.16    0.36     465     1096.1     1077.9    0.18    0.42
0.073        8642   99.84     976.7      959.2    0.15    0.32     488      995.3      988.4    0.16    0.50
0.083        9255   99.88     866.4      848.0    0.16    0.36     490      856.8      846.0    0.17    0.38
0.093        9778   99.88     747.6      731.4    0.16    0.36     515      772.8      747.3    0.18    0.38
0.103       10225   99.86     662.6      649.1    0.17    0.38     547      658.9      643.6    0.20    0.36
0.113       10768   99.83     597.2      584.7    0.18    0.42     538      593.4      590.0    0.20    0.49
0.122       11121   99.86     535.5      521.9    0.19    0.48     607      556.2      542.0    0.20    0.47
0.132       11692   99.85     489.3      479.2    0.19    0.46     607      476.4      467.3    0.23    0.42
0.142       11999   99.83     453.9      443.1    0.19    0.48     621      455.3      440.6    0.22    0.55
0.152       12463   99.79     419.2      407.3    0.19    0.44     655      435.3      424.3    0.22    0.53
0.162       12885   99.78     384.0      373.9    0.20    0.53     632      384.1      376.1    0.22    0.43
0.171       12698   95.96     357.2      348.5    0.21    0.57     686      353.9      338.6    0.24    0.51
0.181       11926   87.78     332.0      323.3    0.21    0.66     590      333.4      322.6    0.24    0.57
0.191       11204   80.39     309.9      299.6    0.22    0.59     600      302.1      296.3    0.26    0.77
Eleanor Dodson wrote: There is a correspondence in last week's Nature commenting on the disparities between three C3b structures. These are: 2icf solved at 4.0 A resolution, 2i07 at 4.1 A resolution, and 2hr0 at 2.26 A resolution. The A chains of all 3 structures agree closely, with each other and with other deposited structures. The B chains of 2icf and 2i07 are in reasonable agreement, but there are enormous differences to the B chain of 2hr0. This structure is surprisingly out of step, and by many criteria likely to be wrong. 
Many articles have been written on validation, and it seems worth reminding crystallographers of some of the tests which make 2hr0 suspect. 1) The cell content analysis suggests there is 80% solvent in the asymmetric unit. Such crystals have been observed, but they rarely diffract to 2.26 A. 2) Data analysis: The reflection data has been deposited so it can be analysed. The plots provided by TRUNCATE showing intensity statistics are not compatible with such a high solvent ratio. They are too perfect; the moments are perfectly linear, unlikely with such large volumes of the crystal containing solvent, and there is absolutely no evidence of anisotropy, again unlikely with high solvent content. 3) Structure analysis: a) The Ramachandran plot is very poor (84% allowed) with many residues in disallowed regions. b) The distribution of residue B values is quite unrealistic. There is a very low spread, which is most unusual for a structure with long stretches of exposed chain. The baverage log file is attached. c) There do not seem to be enough contacts to maintain the crystalline
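The solvent figure in point 1 comes from a standard Matthews-coefficient calculation. A minimal sketch of that arithmetic (the cell volume, Z and molecular weight below are invented illustration values, not the actual 2hr0 numbers):

```python
# Sketch of the cell-content (Matthews) analysis behind point 1 above.
# The cell volume, Z and molecular weight are made-up illustration values.

def solvent_fraction(cell_volume_a3, z, mw_daltons):
    """Estimate the solvent fraction of a crystal from its Matthews coefficient.

    Vm = V / (Z * MW), in A^3/Da; the standard estimate of the solvent
    fraction is then 1 - 1.23/Vm (the 1.23 constant folds in the typical
    protein partial specific volume of ~0.74 cm^3/g)."""
    vm = cell_volume_a3 / (z * mw_daltons)
    return 1.0 - 1.23 / vm

# Hypothetical cell: V = 2.5e6 A^3 containing 4 copies of a 100 kDa chain.
frac = solvent_fraction(2.5e6, 4, 100_000)
print(f"estimated solvent content: {frac:.0%}")
```

A value around 80% is extreme: such loosely packed crystals exist, but as the email notes they very rarely diffract to 2.26 A, which is what makes the combination suspicious.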
Re: [ccp4bb] The importance of USING our validation tools
By raw data I mean images. We think this is only manageable using a distributed data grid model (e.g. universities/institutions set up their own repositories using open standards, and the PDB aggregates the links to them. URL persistence will be a hurdle, I admit). You are right that a single-repository solution would be impractical. We would hope that the PDB could store the unmerged intensities. cheers ashley On 17/08/2007, at 12:13 AM, Phil Evans wrote: What do you count as raw data? Rawest are the images - everything beyond that is modelling - but archiving images is _expensive_! Unmerged intensities are probably more manageable Phil On 16 Aug 2007, at 15:05, Ashley Buckle wrote: Dear Randy These are very valid points, and I'm so glad you've taken the important step of initiating this. For now I'd like to respond to one of them, as it concerns something I and colleagues in Australia are doing: The more information that is available, the easier it will be to detect fabrication (because it is harder to make up more information convincingly). For instance, if the diffraction data are deposited, we can check for consistency with the known properties of real macromolecular crystals, e.g. that they contain disordered solvent and not vacuum. As Tassos Perrakis has discovered, there are characteristic ways in which the standard deviations depend on the intensities and the resolution. If unmerged data are deposited, there will probably be evidence of radiation damage, weak effects from intrinsic anomalous scatterers, etc. Raw images are probably even harder to simulate convincingly. After the recent Science retractions we realised that it's about time raw data was made available. So, we have set about creating the necessary IT and software to do this for our diffraction data, and are encouraging Australian colleagues to do the same. 
We are about a week away from launching a web-accessible repository for our recently published (eg deposited in PDB) data, and this should coincide with an upcoming publication describing a new structure from our labs. The aim is that publication occurs simultaneously with release in PDB as well as raw diffraction data on our website. We hope to house as much of our data as possible, as well as data from other Australian labs, but obviously the potential dataset will be huge, so we are trying to develop, and make available freely to the community, software tools that allow others to easily setup their own repositories. After brief discussion with PDB the plan is that PDB include links from coordinates/SF's to the raw data using a simple handle that can be incorporated into a URL. We would hope that we can convince the journals that raw data must be made available at the time of publication, in the same way as coordinates and structure factors. Of course, we realise that there will be many hurdles along the way but we are convinced that simply making the raw data available ASAP is a 'good thing'. We are happy to share more details of our IT plans with the CCP4BB, such that they can be improved, and look forward to hearing feedback cheers *NOTE* My new tel. no: (03) 9902 0269 Ashley Buckle Ph.D NHMRC Senior Research Fellow The Department of Biochemistry and Molecular Biology School of Biomedical Sciences, Faculty of Medicine Victorian Bioinformatics Consortium (VBC) Monash University, Clayton, Vic 3800 Australia http://www.med.monash.edu.au/biochem/staff/abuckle.html iChat/AIM: blindcaptaincat skype: ashley.buckle Tel: (613) 9902 0269 (office) Tel: (613) 9905 1653 (lab) Fax : (613) 9905 4699
Re: [ccp4bb] The importance of USING our validation tools
I don't think archiving images would be that expensive. For one, I have found that most formats can be compressed quite substantially using simple, standard procedures like bzip2. If optimized, raw images won't take up that much space. Also, initially, only those images that have been used to obtain phases and to refine finally deposited structures could be archived. If the average structure takes up 20GB of space, 5,000 structures would be 1TB, which fits on a single hard drive for less than $400. If the community thinks this is a worthwhile endeavor, money should be available from granting agencies to establish a central repository (e.g., at the RCSB). Imagine what could be done with as little as $50,000. For large detectors, binning could be used, but given current hard drive prices and future developments, that won't be necessary. Best - MM On Aug 16, 2007, at 9:13 AM, Phil Evans wrote: What do you count as raw data? Rawest are the images - everything beyond that is modelling - but archiving images is _expensive_! Unmerged intensities are probably more manageable Phil On 16 Aug 2007, at 15:05, Ashley Buckle wrote: Dear Randy These are very valid points, and I'm so glad you've taken the important step of initiating this. For now I'd like to respond to one of them, as it concerns something I and colleagues in Australia are doing: The more information that is available, the easier it will be to detect fabrication (because it is harder to make up more information convincingly). For instance, if the diffraction data are deposited, we can check for consistency with the known properties of real macromolecular crystals, e.g. that they contain disordered solvent and not vacuum. As Tassos Perrakis has discovered, there are characteristic ways in which the standard deviations depend on the intensities and the resolution. If unmerged data are deposited, there will probably be evidence of radiation damage, weak effects from intrinsic anomalous scatterers, etc. 
Raw images are probably even harder to simulate convincingly. After the recent Science retractions we realised that it's about time raw data was made available. So, we have set about creating the necessary IT and software to do this for our diffraction data, and are encouraging Australian colleagues to do the same. We are about a week away from launching a web-accessible repository for our recently published (e.g. deposited in PDB) data, and this should coincide with an upcoming publication describing a new structure from our labs. The aim is that publication occurs simultaneously with release in PDB as well as raw diffraction data on our website. We hope to house as much of our data as possible, as well as data from other Australian labs, but obviously the potential dataset will be huge, so we are trying to develop, and make available freely to the community, software tools that allow others to easily set up their own repositories. After brief discussion with PDB the plan is that PDB include links from coordinates/SF's to the raw data using a simple handle that can be incorporated into a URL. We would hope that we can convince the journals that raw data must be made available at the time of publication, in the same way as coordinates and structure factors. Of course, we realise that there will be many hurdles along the way but we are convinced that simply making the raw data available ASAP is a 'good thing'. We are happy to share more details of our IT plans with the CCP4BB, such that they can be improved, and look forward to hearing feedback cheers Mischa Machius, PhD Associate Professor UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd.; ND10.214A Dallas, TX 75390-8816; U.S.A. Tel: +1 214 645 6381 Fax: +1 214 645 6353
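Mischa's bzip2 point is easy to sanity-check. The sketch below is purely illustrative - the detector size, background level and spot count are made-up assumptions, not real beamline numbers - but it shows the mechanism: a frame that is mostly low-entropy background with a sparse scattering of strong pixels compresses well with stdlib bzip2, for the same reason real diffraction images do.

```python
import bz2
import random
import struct

# Hypothetical 1 Mpixel, 16-bit detector frame (illustrative numbers only):
# a near-flat background of ~10 counts plus a few thousand bright "spots".
WIDTH, HEIGHT = 1024, 1024
random.seed(0)

pixels = [10 + random.randint(0, 3) for _ in range(WIDTH * HEIGHT)]
for _ in range(3000):
    pixels[random.randrange(len(pixels))] = random.randint(500, 65000)

# Pack as little-endian unsigned 16-bit integers (2 bytes per pixel).
raw = struct.pack("<%dH" % len(pixels), *pixels)
compressed = bz2.compress(raw, 9)

print("raw: %.1f MB" % (len(raw) / 1e6))
print("bz2: %.1f MB (%.0f%% of original)"
      % (len(compressed) / 1e6, 100 * len(compressed) / len(raw)))
```

Real frames won't compress quite this well (real background has more entropy than this toy model), but the direction of the effect is the same.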
Re: [ccp4bb] The importance of USING our validation tools
Hmm - I think I miscalculated, by a factor of 100 even!... need more coffee. In any case, I still think it would be doable. Best - MM On Aug 16, 2007, at 9:30 AM, Mischa Machius wrote: [quoted message trimmed - see above] Mischa Machius, PhD Associate Professor UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd.; ND10.214A Dallas, TX 75390-8816; U.S.A. Tel: +1 214 645 6381 Fax: +1 214 645 6353
Re: [ccp4bb] The importance of USING our validation tools
Hello all, I started to write a response to this thread yesterday. I thought the title was great, and the content of Eleanor's email was very helpful. What I didn't like was the indictment in the next-to-last paragraph. This has been followed up with the word fabrication by others. No one knows definitively whether this was fabricated. You have your suspicions, but you don't know. Fabrication suggests malicious wrong-doing. I actually don't think this was the case. I'm probably a bit biased because the work comes from an office down the hall from my own. I'd like to think that if the structure is wrong, it could be chalked up to inexperience rather than malice. To me, this scenario of inexperience seems like one that could become more and more prevalent as our field opens up to more and more scientists doing structural work who are not dedicated crystallographers. Having said that, I think Eleanor started an extremely useful thread as a way of avoiding the pitfalls of crystallography, whether you are a novice or an expert. There's no question that this board is the best way to advance one's knowledge of crystallography. I actually gave a homework assignment that was simply to sign up for the ccp4bb. In reference to the previously mentioned work, I'd also like to hear discussion, concurring or not, about the response letter, some of which seems plausible to me. I hope I don't ruffle anyone's feathers with my email, but I just thought that it should be said. Cheers- Todd -Original Message- From: CCP4 bulletin board on behalf of Randy J. Read Sent: Thu 8/16/2007 8:22 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools On Aug 16 2007, Eleanor Dodson wrote: The weighting in REFMAC is a function of SigmA (plotted in log file). For this example it will be nearly 1 for all resolution ranges, so the weights are pretty constant. 
There is also a contribution from the experimental sigma, which in this case seems to be proportional to |F| Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly it hasn't. One reason is probably that we couldn't be as forthright as we wished to be. For its own good reasons, Nature did not allow us to use the word fabricated. Nor were we allowed to discuss other structures from the same group, if they weren't published in Nature.
Re: [ccp4bb] The importance of USING our validation tools
Dear all, With regards to the possible fabrication of the 2hr0 structure, why would the authors have deposited the structure factors if this is not required by the journal? Also, why would they have fabricated a structure with gaps along c if they could have done so without the gap? A few years ago, I had to cope with two structures with gaps along c, pdb codes 1h6w and 1ocy. For those of you who are interested, structure factors are available from the pdb; unmerged intensities/raw images I will look for and provide if requested... Without further evidence, I suspect their structure is real, perhaps not optimally refined and treated, though; but then again, this seems commonplace in Nature structures, perhaps due to lack of time/experience and, in some cases, putting too much pressure on the PhD students/postdocs involved instead of mentoring and checking them. I hope the authors provide the raw diffraction images to dispel any doubts, and would be curious to learn about the other structures of the same group - does anyone have a comprehensive, annotated list of them? Greetings, Mark J. van Raaij Unidad de Bioquímica Estructural Dpto de Bioquímica, Facultad de Farmacia and Unidad de Rayos X, Edificio CACTUS Universidad de Santiago 15782 Santiago de Compostela Spain http://web.usc.es/~vanraaij/ On 16 Aug 2007, at 15:22, Randy J. Read wrote: On Aug 16 2007, Eleanor Dodson wrote: The weighting in REFMAC is a function of SigmA (plotted in log file). For this example it will be nearly 1 for all resolution ranges, so the weights are pretty constant. There is also a contribution from the experimental sigma, which in this case seems to be proportional to |F| Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly it hasn't. One reason is probably that we couldn't be as forthright as we wished to be. For its own good reasons, Nature did not allow us to use the word fabricated. 
Nor were we allowed to discuss other structures from the same group, if they weren't published in Nature. Another reason is an understandable reluctance to make allegations in public, and the CCP4 bulletin board probably isn't the best place to do that. But I think the case raises essential topics for the community to discuss, and this is a good forum for those discussions. We need to consider how to ensure the integrity of the structural databases and the associated publications. So here are some questions to start a discussion, with some suggestions of partial answers. 1. How many structures in the PDB are fabricated? I don't know, but I think (or at least hope) that the number is very small. 2. How easy is it to fabricate a structure? It's very easy, if no-one will be examining it with a suspicious mind, but it's extremely difficult to do well. No matter how well a structure is fabricated, it will violate something that is known now or learned later about the properties of real macromolecules and their diffraction data. If you're clever enough to do this really well, then you should be clever enough to determine the real structure of an interesting protein. 3. How can we tell whether structures in the PDB are fabricated, or just poorly refined? The current standard validation tools are aimed at detecting errors in structure determination or the effects of poor refinement practice. None of them are aimed at detecting specific signs of fabrication because we assume (almost always correctly) that others are acting in good faith. The more information that is available, the easier it will be to detect fabrication (because it is harder to make up more information convincingly). For instance, if the diffraction data are deposited, we can check for consistency with the known properties of real macromolecular crystals, e.g. that they contain disordered solvent and not vacuum. 
As Tassos Perrakis has discovered, there are characteristic ways in which the standard deviations depend on the intensities and the resolution. If unmerged data are deposited, there will probably be evidence of radiation damage, weak effects from intrinsic anomalous scatterers, etc. Raw images are probably even harder to simulate convincingly. If a structure is fabricated by making up a new crystal form, perhaps a complex of previously-known components, then the crystal packing interactions should look like the interactions seen in real crystals. If it's fabricated by homology modelling, then the internal packing is likely to be suboptimal. I'm told by David Baker (who knows a thing or two about this) that it is extremely difficult to make a homology model that both obeys what we know about torsion angle preferences and is packed as well as a real protein structure. I'm very interested in hearing about new ideas along these lines.
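Checks of the kind Randy alludes to - how sigmas depend on intensities - can be prototyped in a few lines. The sketch below is a toy illustration, not any published algorithm, and all its numbers are invented: for counting-statistics errors sigma grows roughly like sqrt(I), so a log-log slope near 0.5 looks physical, while a sigma set simply proportional to I (or |F|) gives a suspicious slope near 1.

```python
import math
import random

def loglog_slope(intensities, sigmas):
    """Least-squares slope of log(sigma) vs log(I)."""
    xs = [math.log(i) for i in intensities]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(1)
I = [random.uniform(10, 1e5) for _ in range(2000)]

# "Real" data: Poisson-like errors, sigma ~ sqrt(I + background term).
sig_real = [math.sqrt(i + 50.0) * random.uniform(0.9, 1.1) for i in I]
# Naive fabrication: sigma simply proportional to I.
sig_fake = [0.05 * i * random.uniform(0.9, 1.1) for i in I]

print("slope, counting-statistics model:  %.2f" % loglog_slope(I, sig_real))
print("slope, sigma-proportional model:   %.2f" % loglog_slope(I, sig_fake))
```

A real test would of course have to allow for detector gain, scaling and resolution dependence; this only illustrates the shape of the fingerprint.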
Re: [ccp4bb] The importance of USING our validation tools
On Aug 16, 2007, at 15:22, Randy J. Read wrote: Raw images are probably even harder to simulate convincingly. If I were to fabricate a structure, I would first get 'Fobs', then expand, then generate the images (I am sure one can hack 'strategy' or 'predict' or even 'mosflm' to tell you in which image every reflection is) and then add noise to the images themselves. Then process the images and go on from there ;-) The thing that is certainly stopping me is that it's much more difficult to do that than solving the structure ... but it would admittedly be quite some fun doing it right, if one were to ignore the tiny issue of the ethical side of such activity. About archiving images, I have a feeling that the cost per Gb is the same as it was for structure factors in the early 90's. Last but not least, some EDS data mining we did here agrees with Randy: very very few other structures, if any, appear to have really strange statistics in the subset of the PDB with structure factors (aka EDS...). That is a relief. As for the Nature debate, I am only disappointed and confused by one thing: Randy et al. ask for the images, like one can ask for the dated logbook in any other scientific discipline. For me that leaves only two possible reactions from the group of Murthy: 1. Make the images available and demand a public apology for spoiling their name. 2. Shut up, retract the paper, buy property in Alaska and disappear. The mumbo jumbo of the reply is so tragically irrelevant that I fail to understand how Nature tolerated it. Tassos PS the algorithm for the calculation of the sigmas (assuming they were calculated) does not look that naive actually. Far from a simple linear relationship. They put some thought into it, but let's say that if you want to apply a 2D function to simulate noise, don't do it along the principal axes ;-)
Re: [ccp4bb] The importance of USING our validation tools
On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote: What do you count as raw data? Rawest are the images - everything beyond that is modelling - but archiving images is _expensive_! Hmmm - not sure: let's say that a typical dataset requires about 180 images of 10Mb each. With the current total of roughly 40,000 X-ray structures in the PDB this is: 40,000 * 180 * 10Mb = ~70 Tb of data. With a simple 1TB external disk at about GBP 200 we get a price of GBP 14,000, i.e. 35 pence per dataset. OK, this is not a proper calculation (more data collected, fine-phi slicing, MAD datasets etc etc), so let's apply a 'safety factor' of 10: but even then I think this is easily doable. As Tassos remarked as well: if we could store/deposit and manage PDB files in the 70s, we should be able to do the same now (30 years later!) with images ... easily. Cheers Clemens [quoted messages from Phil Evans and Ashley Buckle trimmed - see above] -- Clemens Vonrhein, Ph.D. - vonrhein AT GlobalPhasing DOT com - Global Phasing Ltd., Sheraton House, Castle Park, Cambridge CB3 0AX, UK - BUSTER Development Group (http://www.globalphasing.com)
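Clemens's back-of-envelope figure is easy to reproduce. The sketch below simply re-runs his arithmetic with the thread's assumed numbers (180 images per dataset, 10 MB per image, ~40,000 X-ray entries, GBP 200 per 1 TB disk - all 2007-era assumptions from this discussion, not current figures):

```python
# Back-of-envelope archive size and cost, using the thread's assumptions.
n_structures = 40_000       # approximate X-ray entries in the PDB (2007)
images_per_dataset = 180    # assumed typical dataset
mb_per_image = 10           # assumed uncompressed image size, MB
gbp_per_tb = 200            # assumed 1 TB external disk price, GBP

total_tb = n_structures * images_per_dataset * mb_per_image / 1e6
cost_gbp = total_tb * gbp_per_tb
pence_per_dataset = 100 * cost_gbp / n_structures

print("total: %.0f TB" % total_tb)             # ~72 TB (Clemens's ~70 Tb)
print("disk cost: GBP %.0f" % cost_gbp)        # ~GBP 14,400
print("per dataset: %.0f pence" % pence_per_dataset)
```

Even with Clemens's 'safety factor' of 10 on top, the per-dataset cost stays in the pounds-not-thousands range, which is the point of his argument.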
Re: [ccp4bb] The importance of USING our validation tools
Validation aside, access to raw data is also helpful for method development (e.g. integration and scaling algorithms), on which we all rely. Ashley On 17/08/2007, at 1:04 AM, Santarsiero, Bernard D. wrote: Sorry, I think it's a waste of resources to store the raw images. I think we should trust people to be able to at least process their own data set. Besides, you would need to include beamline parameters, beam position, detector distances, etc. that may or may not be correct in the image headers. I'm all for storage and retrieval of a primary intensity data file (I or F^2 with esds). Bernie Santarsiero [earlier messages from Mischa Machius, Phil Evans and Ashley Buckle quoted in full - trimmed; see above]
Re: [ccp4bb] The importance of USING our validation tools
I'm glad that the discussion has finally set in, and would only like to comment on the practicability of storing images. Mischa Machius schrieb: If the average structure takes up 20GB of space, 5,000 structures would be 1TB ... That 20GB is on the high side, I'd say; I would have estimated 1.5 GB (native alone) to 5 GB for e.g. a native and 3 wavelengths (after bzip2). And 5,000 structures of 20GB each would be 100 TB, not 1 TB. If the PDB required all images of a _single_ dataset for molecular-replacement structures or mutant studies, and all images of all wavelengths/derivatives for experimentally phased structures, that would come to roughly (40,000 X-ray structures) * (on average 2 GB per structure) = 80 TB of data. At €250 per TB, that would be 20,000 € - an estimate of what it takes to store all the raw data for _all_ the X-ray structures in the PDB - less than what a single protein cloning/purification/crystallization/structure project costs per year. Archiving images is quite practical even for those data that do not directly correspond to deposited PDB entries. In 1999 we abandoned tape storage of raw data in favor of disk storage. Everything we collected at synchrotrons since then still fits on two 750GB disks. In 2000 we also needed two disks, and we have been upgrading the disks when the old ones were full. To have these data online means that one can easily look at them again, for testing data reduction and phasing programs, and for trying to solve, using new programs, those structures where crystals could never be reproduced. just my 2 cents - Kay Diederichs -- Kay Diederichs http://strucbio.biologie.uni-konstanz.de email: [EMAIL PROTECTED] Tel +49 7531 88 4049 Fax 3183 Fachbereich Biologie, Universitaet Konstanz, Box M647, D-78457 Konstanz
Re: [ccp4bb] The importance of USING our validation tools
This structure (1h6w) provides an interesting comparison; it looks just as I would expect for such an interesting extended fold. There are big peaks on the 3-fold axis; there is wispy density which would be very hard to model - I found an ILE in the wrong rotamer (341A) - (there is ALWAYS something you can improve) - in other words it looks like a real map. And the intensity plots look as expected too. Eleanor Mark J. van Raaij wrote: [message, including the quoted reply from Randy J. Read, trimmed - see above]
Re: [ccp4bb] The importance of USING our validation tools
Hello All, This debacle is actually quite reminiscent of a similar incident that Wayne Hendrickson caught concerning purported tRNA crystals reported in the 1970s. It turned out to be completely fabricated, and the guy's career went down the drain, I think. A good example to tell your trainees. Jacob Keller The refs: 1. True identity of a diffraction pattern attributed to valyl tRNA WAYNE A. HENDRICKSON, BROR E. STRANDBERG, ANDERS LILJAS, L. MARIO AMZEL, EATON E. LATTMAN CONTEXT: SIR - We have examined in detail several publications by H.H. Paradies. One is a report in Nature on 11 April 1970 about single crystals of a valine-specific tRNA from yeast1. We find that the diffraction pattern attributed to valyl tRNA... Nature 303, 195 (19 May 1983) Correspondence 2. A reply from Paradies H.H. PARADIES Nature 303, 196 (19 May 1983) Correspondence
Re: [ccp4bb] The importance of USING our validation tools
On Thu, Aug 16, 2007 at 03:13:29PM +0100, Phil Evans wrote: What do you count as raw data? Rawest are the images - everything beyond that is modelling - but archiving images is _expensive_! Maybe we should contact Google to let them do it for us ;-) http://news.bbc.co.uk/2/hi/technology/6425975.stm I doubt every crystallographer would want access to all raw datasets - but for developers it would be ABSOLUTELY FANTASTIC (similar to things like the JCSG archive). And just imagine all those well collected datasets of 10 years ago and what we could learn from those (and the better structures we could determine) with the modern tools and programs ... Clemens -- * Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com * Global Phasing Ltd., Sheraton House, Castle Park, Cambridge CB3 0AX, UK * BUSTER Development Group (http://www.globalphasing.com)
Re: [ccp4bb] The importance of USING our validation tools
Dear All, Without passing any judgement on the veracity of C3b structure 2hr0, I note that the Ca RMSD of this structure to C3 structure 2a73 was unusually low, compared to the RMSD of 2a73 to the related entries 2a74 and 2i07 by the same group, the bovine C3 structure 2b39, and the C3b and C3c structures 2ice and 2icef. If one took a high-resolution structure as a molecular replacement solution for a new structure at lower resolution this might be expected, but not vice versa? As to whether the structure's problems arise from malfeasance or neglect, I do not understand why the journal did not require that the raw images be made available, given the evidence presented against the published data; isn't that what is done in other fields when such issues are raised? Isn't making the availability of raw data upon request a requirement of publication more practical than trying to set up a vast repository of images when submission to that repository is still a matter of choice? I have several questions regarding the reply that I would like to hear an answer to; perhaps Todd can help obtain them: 1. Could the statement "Statistical disorder resulting in apparent 'gaps' in the lattice has been observed for other proteins" not be supported by citations of the numerous deposited structures, if they indeed exist? 2. I was not convinced that the Z-scores of the PHASER solutions were significant; shouldn't they be greater than 6.0? It didn't look like density at 0.7 sigma was contiguous over the main chain. 3. Can the domain suggested to fill the void in the asymmetric unit be a contaminant when it must be present in stoichiometric ratio in order to provide lattice contacts? Why not present an SDS-PAGE gel of a redissolved crystal? Surely that domain would show up. 4.
I don't understand why the statement "Bulk-solvent modelling is contentious, making many refinements necessary to constrain parameters to obtain acceptable values" was considered an acceptable response to the question of the low-resolution data. Whether one chooses to include low-resolution data with bulk-solvent modelling or to truncate the low-resolution data is a separate issue from the physical effect of solvent on intensities at low resolution. One point in the reply that seemed reasonable is the issue of B-factor variation, because the deposited C3 structures do exhibit a wide range in average B, as well as in resolution, in whether TLS refinement was used, and in how heavily restraints were set. However, that does not really address the issue of seemingly random coil, without other contacts, having such strong contours at 2.5 sigma. I would look forward to learning from people with more experience on these matters. Sincerely, Richard Baxter On Thu, 2007-08-16 at 10:11, Green, Todd wrote: Hello all, I started to write a response to this thread yesterday. I thought the title was great, and the content of Eleanor's email was very helpful. What I didn't like was the indictment in the next-to-last paragraph. This has been followed up with the word fabrication by others. No one knows definitively whether this was fabricated. You have your suspicions, but you don't know. Fabrication suggests malicious wrong-doing. I actually don't think this was the case. I'm probably a bit biased because the work comes from an office down the hall from my own. I'd like to think that if the structure is wrong it could be chalked up to inexperience rather than malice. To me, this scenario of inexperience seems like one that could become more and more prevalent as our field opens up to more and more scientists doing structural work who are not dedicated crystallographers.
Having said that, I think Eleanor started an extremely useful thread as a way of avoiding the pitfalls of crystallography, whether you are a novice or an expert. There's no question that this board is the best way to advance one's knowledge of crystallography. I actually gave a homework assignment that was simply to sign up for the ccp4bb. In reference to the previously mentioned work, I'd also like to hear discussion concurring or not with the response letter, some of which seems plausible to me. I hope I don't ruffle anyone's feathers by my email, but I just thought that it should be said. Cheers- Todd -Original Message- From: CCP4 bulletin board on behalf of Randy J. Read Sent: Thu 8/16/2007 8:22 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools On Aug 16 2007, Eleanor Dodson wrote: The weighting in REFMAC is a function of SigmaA (plotted in the log file). For this example it will be nearly 1 for all resolution ranges, so the weights are pretty constant. There is also a contribution from the experimental sigma, which in this case seems to be proportional to |F| Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly
Re: [ccp4bb] The importance of USING our validation tools
I'd like to emphasize that the infamous Table 1 alone should have immediately tipped off any competent reviewer. The last-shell I/sig(I) is 1.3 and Rmerge 0.11 (!). The gap between Rfree and R is extraordinarily low. And all that for a large, purportedly flexible multidomain molecule. Enough to ask more questions, even without initially having the model, data, or frames available. Maybe the infamous Table 1 is still good for something after all. Hiding it in supplemental material does not promote reading it. br From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of Anastassis Perrakis Sent: Thursday, August 16, 2007 8:13 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] The importance of USING our validation tools 1. Make the images available and demand a public apology for spoiling their name. 2. Shut up, retract the paper, buy property in Alaska and disappear.
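For readers wanting the back-of-envelope arithmetic behind this red flag: a common rule of thumb (a sketch, not a rigorous derivation) is that random measurement error alone already gives a shell Rmerge of roughly sqrt(2/pi) / <I/sigma(I)>, since the expected absolute deviation of a Gaussian error is sigma*sqrt(2/pi). With the outer-shell figures quoted in the thread:

```python
import math

def expected_rmerge(i_over_sigma):
    """Rule-of-thumb lower bound on shell Rmerge from random error alone:
    <|eps|> = sigma * sqrt(2/pi), hence Rmerge ~ 0.8 / <I/sigma(I)>."""
    return math.sqrt(2.0 / math.pi) / i_over_sigma

# Outer shell quoted in Table 1: I/sigma(I) = 1.3
r_expected = expected_rmerge(1.3)
print(round(r_expected, 2))   # ~0.61, versus the reported 0.11
```

An Rmerge of 0.11 in a shell where I/sigma(I) is 1.3 is therefore several-fold lower than random error alone would produce, which is exactly the inconsistency flagged here.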
Re: [ccp4bb] The importance of USING our validation tools
The deposited structure 2HR0 shows all the signs of having been refined, deliberately or accidentally, against 'calculated' data. The model used to 'calculate' the data had (almost) constant B-values in a rather empty cell containing no solvent. For example, it could have been a (partial?) molecular replacement solution obtained using real data. It seems to me that it is perfectly possible that two reflection files (or two columns in an mtz file) were carelessly exchanged by a crystallographically inexperienced researcher. This even explains the low CA RMSD to the 2A73 structure, if that had been used as a search fragment; even the suspiciously poor Phaser Z scores can be explained (maybe it was only a partially correct MR solution against the real data). So although my first reaction was that there was overwhelming evidence of fraud, on reflection a relatively benign explanation is still possible. The situation could be clarified fairly quickly if the frames or a crystal or even the original HKL2000 .sca file could be found. What I really don't understand is how the Editors of the revered journal Nature allowed a 'reply' to be printed which made no reference to the request for the essential experimental evidence, i.e. the raw diffraction data, to be produced. Protein crystallography is an experimental science just like any other, even if the results it produces usually stand the test of time better. George Prof. George M. Sheldrick FRS Dept. Structural Chemistry, University of Goettingen, Tammannstr. 4, D37077 Goettingen, Germany Tel. +49-551-39-3021 or -3068 Fax. +49-551-39-2582
Re: [ccp4bb] The importance of USING our validation tools
There are several issues under current discussion. We outline a few of these below, in order of importance. The structure 2hr0 is unambiguously fake. Valid arguments have already been published in a Brief Communication by Janssen et al. (Nature 448, E1-E2, 9 August 2007). However, the published response from the authors of the questioned deposit may sound, to an unfamiliar reader, like a genuine scientific controversy. There are many additional independent signs of intentional data fabrication in this case, above and beyond those already mentioned. One diagnostic is related to the fact that fabricated data will not show the proper features of proteins with respect to disorder. The reported case has a very high ratio of "Fobs" to atomic parameters, thus the phase uncertainty is small. In real structures, fully solvent-exposed chains without stabilizing interactions display intrinsically high disorder, yet in this structure these residues (e.g., Arg932B, Met1325B, Glu1138B, Arg459A, etc.) are impossibly well ordered. The second set of diagnostics is the observation of perfect electron density around impossible geometries. For example, the electron density is perfect (visible even at the 4 sigma level in a 2Fo-Fc map) with no significant negative peaks in an Fo-Fc map around the guanidinium group of Arg1112B, which is in outrageously close contact with carbon atoms of Lys1117B. This observation appears in many other places in the map as well. The issue is not the presence of bad contacts, but the lack of disorder (high B-factors) or negative peaks in an Fo-Fc map in this region that could explain why the bad contacts remain in the model. The third set of diagnostics consists of statistics that do not occur in real structures. The ones mentioned previously are already very convincing (moments, B-factor plots, bulk solvent issues, etc.). We can add more evidence from a round of Refmac refinement of the deposited model versus the deposited structure factors.
The anisotropic scaling factors obtained are unreasonable for a structure in a low-symmetry space group such as C2, which has an inherent lack of constraints on packing symmetry (particularly in view of the problems with lattice contacts already mentioned). The values from a Refmac refinement for a typical structure in space group C2 are: B11 = 0.72, B22 = 1.15, B33 = -2.12, B12 = 0.00, B13 = -1.40, B23 = 0.00 (B12 and B23 are zero due to C2 space-group symmetry). For structure 2hr0: B11 = -0.02, B22 = 0.00, B33 = 0.02, B12 = 0.00, B13 = 0.01, B23 = 0.00. Statistical reasoning can lead to P-values in the range of 10^-6 for such values to be produced by chance in a real structure, but they are highly likely in a fabricated case. The fourth set of diagnostics comprises significant inconsistencies in the published methods, e.g. the authors claim that they collected data from four crystals, yet their data-merging statistics show an R-merge = 0.11 in the last resolution shell. It is simply impossible to get such values, particularly when I/sigma(I) for the last resolution shell was stated as 1.32. Moreover, the overall I/sigma(I) for all data is 5.36 and the overall R-merge is 0.07, values highly inconsistent with the reported data resolution, quality of map and high data completeness (97.3%). Overall, this is just a short list of problems; the indicators of data fabrication/falsification are plentiful and, if needed, can easily be provided to interested parties. We fully support Randy Read's excellent comments with our view of retraction and public discussion of this problem: "Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly it hasn't. One reason is probably that we couldn't be as forthright as we wished to be. For its own good reasons, Nature did not allow us to use the word fabricated.
Nor were we allowed to discuss other structures from the same group, if they weren't published in Nature." One needs to address this policy with publishers in cases of intentional fraud that can be proven simply by an analysis of the published results. At this point the article needs to be retracted by Nature after Nature's internal investigation, with input from the crystallographic community, rather than after obtaining the results of any potential administrative investigation of fraud. "Another reason is an understandable reluctance to make allegations in public, and the CCP4 bulletin board probably isn't the best place to do that." The discussion of the fraud allegation was initiated by a public reply to a question addressed to a single person, so it happened by chance rather than by intention, but with no complaint from our side. On a different aspect of the discussion, namely data preservation: currently, funding agencies as well as scientific responsibility require authors of any publication to preserve and
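The anisotropic scaling values quoted in this message can be turned into a quick numerical comparison. The sketch below is my own illustration (not part of Refmac; the function name and the flagging threshold are assumptions). It removes the isotropic part of the overall anisotropic B tensor and reports the RMS of what remains: the typical C2 values give a correction of almost 2 A^2, while the 2hr0 values give essentially zero.

```python
import numpy as np

def aniso_magnitude(b):
    """RMS of the trace-free part of the overall anisotropic scaling
    B tensor (in A**2), given as (B11, B22, B33, B12, B13, B23).

    Real low-symmetry crystals almost always need a correction of order
    1 A**2 or more; a tensor that is essentially zero everywhere is one
    more statistical oddity of the kind discussed in this thread.
    """
    b11, b22, b33, b12, b13, b23 = b
    m = np.array([[b11, b12, b13],
                  [b12, b22, b23],
                  [b13, b23, b33]])
    m -= np.eye(3) * np.trace(m) / 3.0          # remove isotropic part
    return float(np.sqrt((m * m).sum() / 3.0))

typical = (0.72, 1.15, -2.12, 0.00, -1.40, 0.00)   # typical C2 values quoted above
deposit = (-0.02, 0.00, 0.02, 0.00, 0.01, 0.00)    # 2hr0 values quoted above
print(aniso_magnitude(typical), aniso_magnitude(deposit))
```

The two magnitudes differ by about two orders of magnitude, which is the disparity the P-value argument above is quantifying.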
Re: [ccp4bb] The importance of USING our validation tools
A few thoughts following on Richard Baxter and George Sheldrick . . . Re: gaps in the lattice, see the tyrosyl-tRNA synthetase structures (1tya for example). Fersht has written a whole book full of insights from these structures. Re: Phaser Z scores. For some MR work with two xtal forms of a structure, I got Z scores of 4.0 and 4.3 for the rotation and translation searches in one form, and 8.7 and 3.5 for the other, using a model with 18% sequence identity. So you don't need great Z scores for the solution to be right. The map calculated with MR phases had a correlation coefficient of 0.22 with the final model. Re: confusing columns in an mtz file. I had the same thought. If the column types were different for experimental versus calculated F's, and refmac only allowed you to refine against an experimental F, could this kind of trouble be avoided? Of course you'd want an option to override the default, for people doing weird things. Dunno about cns or phenix, but didn't we recently see messages about how hard it was to work with cns reflection files, leading to a new conversion program from Kevin? It seems possible to get the wrong column there as well. Re: images. Be careful what you sign - the user agreements with synchrotron facilities in the USA may state that the data are public, not private (as the funding is from the public). Pete
Re: [ccp4bb] The importance of USING our validation tools
Due to these recent, highly publicized irregularities and the ample (snide) remarks I hear about them from non-crystallographers, I am wondering if the trust in macromolecular crystallography is beginning to erode. It is often very difficult even for experts to distinguish fake or wishful thinking from reality. Non-crystallographers will have no chance at all and will consequently not rely on our results as much as we are convinced they could and should. If that is indeed the case, something needs to be done, and sooner rather than later. Best - MM Mischa Machius, PhD Associate Professor UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd.; ND10.214A Dallas, TX 75390-8816; U.S.A. Tel: +1 214 645 6381 Fax: +1 214 645 6353
Re: [ccp4bb] The importance of USING our validation tools
I'd like to emphasize that the infamous Table 1 alone should have immediately tipped off any competent reviewer. The last-shell I/sig(I) is 1.3 and Rmerge 0.11 (!). And keep in mind that these statistics come from merging data from FOUR different crystals! (That's clearly and unambiguously stated in the Methods section). Dima
Re: [ccp4bb] The importance of USING our validation tools
No one knows definitively if this was fabricated. Well, at least one person does. But I agree, it is important to keep in mind that the proper venue for determining guilt or innocence in a case of fraud is the court system. Until fairly recently, the ideas of presumed innocence and the right to cross-examine accusers and witnesses have been considered fundamental to civil society. The case certainly sounds compelling, but this is all the more reason to adhere to these ideals. Bill Scott
Re: [ccp4bb] The importance of USING our validation tools
On Thu, 16 Aug 2007, Clemens Vonrhein wrote: Maybe we should contact Google to let them do it for us ;-) Better yet, simply download your images to a computer that uses ATT as an internet service provider. All the information will be automatically copied and stored by the NSA. cf: http://www.eff.org/legal/cases/att/faq.php Bill