On 4/2/2012 6:03 AM, herman.schreu...@sanofi.com wrote:
If James Holton had been involved, the fabrication would not have been
discovered.
Herman

Uhh.  Thanks.  I think?

Apologies for remaining uncharacteristically quiet. I have been keeping up with the discussion, but I am not sure how much difference one more "vote" would make on the various issues, especially since most of this has come up before. I agree that fraud is sick and wrong. I think backing up your data is a good idea, etc. etc. However, I seem to have been declared a leading "expert" on fake data, so I suppose I ought to say something about that. I'm not quite sure I want to volunteer to be the Defense Against the Dark Arts Teacher (they always seem to end badly). But here goes:

I think the core of the "fraud problem" lies in our need for models, and I mean "models" in the general scientific sense, not just PDB files. Fundamental to the practice of science is coming up with a "model" that explains the observations you made, preferably to within experimental error. One is also generally expected to estimate what that experimental error was. That is, if you plot a bunch of points on a graph, you need to fit some sort of curve to them, and that curve had better pass "within the error bars", or you have some explaining to do. Protein structures are really nothing more than a ~50,000-parameter curve fit to ~50,000 data points. So, given that the technology for constructing "models" is widely available (be it gnuplot or refmac), as is the technology for estimating errors and generating random numbers, all the hard work a would-be fraud needs to make a plausible forgery has already been done. This is not something unique to crystallography! It is a general property of any mature science.
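To make the "within the error bars" criterion concrete, here is a toy sketch in Python (not anything taken from gnuplot or refmac). It assumes a straight line stands in for the model, and it uses the reduced chi-square (chi-square per degree of freedom, which should come out near 1 when the residuals match the stated errors) as the yardstick. The uncomfortable part is that the "data" here are nothing but a model plus random numbers of the right size, and they pass the test just fine.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def reduced_chi_square(xs, ys, sigma, a, b, n_params=2):
    """Chi-square per degree of freedom; ~1 means 'fits within the error bars'."""
    chi2 = sum(((y - (a * x + b)) / sigma) ** 2 for x, y in zip(xs, ys))
    return chi2 / (len(xs) - n_params)

rng = random.Random(2012)  # fixed seed so the toy run is reproducible
sigma = 0.5                # assumed "experimental error" on every point
xs = [i * 0.1 for i in range(200)]
# "Fake data": an assumed model (y = 2x + 1) plus noise of exactly the claimed size.
ys = [2.0 * x + 1.0 + rng.gauss(0.0, sigma) for x in xs]

a, b = fit_line(xs, ys)
chi2_dof = reduced_chi_square(xs, ys, sigma, a, b)
print(f"slope={a:.2f} intercept={b:.2f} chi2/dof={chi2_dof:.2f}")
```

A value near 1 is all the fit can tell you; it says nothing about whether a detector or a random number generator produced the points.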

Indeed, "fake data" is not only a common tool in science but an inextricable part of it. Simulated diffraction images appear in the literature at least as early as Arndt and Wonacott (1976), and I'm sure even Moseley and Darwin (1913) made some "fake data" when trying to figure out all the sources of systematic error they were dealing with while measuring reflected x-ray beams. At its heart, fake data is a "control". Remember "controls" from science class? They come in two flavors, positive and negative, and you are supposed to have both. In fact, all a fraud really is, is someone who in some way, shape or form takes a positive control and calls it their "experiment". Pasting gel lanes together is an example of this. I think this is why fraud is so hard to prevent in science. You can't do science without controls, but anyone who has "access to the technology" for doing a control can also use it for evil. The labels are everything.

Personally, I classify fraud as an "intentionally incorrect" result. This separates it from "unintentionally incorrect" results (mistakes), which are far more common. Validation is meant to catch the "incorrect" part, but can never be expected to establish intent! In fact, I expect a mildly clever fraud might actually plan to hide behind the "we made a mistake in the deposition/figure/paper but now can't find the original data" defense. The case at hand (Zaborsky et al. 2010) may be a very good example of this. A new validation procedure (Rupp 2012) drew attention to the fabricated 3k78 structure as well as to real structures where Fcalc was accidentally deposited instead of Fobs (there are a number of these). Rupp's follow-up on 3k78 found troubling irregularities, but could it still be a mistake? If there is a combination of buttons in some GUI somewhere that "lets you" do this, then I imagine at least one idiot may have "discovered" it, perhaps even pleased with themselves for finding a "new way" to get their R factor down. The best evidence that Fobs simply does not exist for 3k78 was in the response (Zaborsky et al. 2012).
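For the accidental Fcalc-for-Fobs deposition, the symptom is easy to illustrate with the conventional R factor, R = sum|Fobs - Fcalc| / sum Fobs. The sketch below is only an illustration under made-up assumptions (random amplitudes, 5% measurement noise, an arbitrary 1% cutoff in a hypothetical looks_like_fcalc() helper); it is not the procedure from Rupp (2012). The idea is simply that a genuine measurement never reproduces the model to within rounding error, so an R factor of essentially zero flags the mistake. Note that it says nothing at all about intent.

```python
import random

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R = sum|Fobs - Fcalc| / sum Fobs."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def looks_like_fcalc(f_obs, f_calc, threshold=0.01):
    """Hypothetical check: an R below `threshold` is implausible for measured data."""
    return r_factor(f_obs, f_calc) < threshold

rng = random.Random(3)  # fixed seed for a reproducible toy example
# Assumed model amplitudes; a real check would recompute Fcalc from the deposited model.
f_calc = [rng.uniform(10.0, 1000.0) for _ in range(5000)]
# "Measured" data: model plus an assumed ~5% relative noise, clipped at zero.
f_obs_real = [max(0.0, f * (1.0 + rng.gauss(0.0, 0.05))) for f in f_calc]
# The mistake: Fcalc written into the Fobs column, rounded to 0.1 as a file might store it.
f_obs_copied = [round(f, 1) for f in f_calc]

print(f"R with measured data: {r_factor(f_obs_real, f_calc):.3f}")
print(f"R with Fcalc as Fobs: {r_factor(f_obs_copied, f_calc):.5f}")
print("flagged:", looks_like_fcalc(f_obs_real, f_calc), looks_like_fcalc(f_obs_copied, f_calc))
```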

The same validation procedure also drew attention to other cases. Two of them, 1n0r and 1n0q (Mosavi et al. 2002), were from my beamline (ALS 8.3.1), so finding the original images was simply a matter of flipping through the books of old DVDs I keep in my office. They cost us $0.25 each in 2002. Yes, I do back up every image, primarily because figuring out which ones were "worth backing up" was actually a more expensive proposition. Even in adjusted dollars, I think the cost of the whole archive is still cheaper than what it would have cost Dan to re-grow his crystals and collect the data again in 2012. It is also nice to be able to say that the data for 1n0r were collected on Jan 30 2002 from 9:47 pm to 11:48 pm and 1n0q was collected on Mar 15 2002 from 12:52 pm until 3:48 pm. I was there! I saw the whole thing! Yes, I know, since I am "the guy who can fake images" I am not the best "witness" (the Defense Against the Dark Arts Teacher never is), but for whatever it is worth I DO recommend keeping your old images around. You never know when a forgotten slip of the mouse while using AutoDep ten years ago will come back to haunt you.

I think it is very important to point out here that validation and peer review are not arbitrary gauntlets set up to prevent the unworthy from achieving the nirvana of "publication". They are services meant to help keep you from embarrassing yourself afterward. In the end, the responsibility for the veracity and validity of your paper lies with you, the author. Not the journal, not the reviewers, and definitely not the PDB. The PDB is a repository, not a police force. Annotators will strongly encourage you to deal with validation issues, but they will, in the end, deposit whatever you give them. What they won't do is let you take it back! So before you make 10,000 copies of your paper and deposit your coordinates into the irrevocable memory of the PDB, it is a good idea to seek out the harshest critic you can find and listen to what they have to say. You don't have to DO everything they say, but listening is a good idea. Even a hard-working and diligent scientist who eats all his vegetables can still do something dumb, like put the protein and water on different origins just before deposition. Not that I would know anything about that (1rb1).

I also think it is important to point out that it is not possible to build some kind of automated "fraud catcher", nor would it be advisable. It would only lull us into a false sense of security. Even branches of science that don't do a lot of curve-fitting (such as archaeology) still have "models" inasmuch as people have a picture in their heads of how they think all their data "should" fit together. All a fraud need do is create some artwork (be it a stone tool or a diffraction image) that is consistent with that picture, and no alarm bells will be raised. Perhaps not for years. Long enough to get a job, anyway. And therein lies the incentive. Watching "The Apprentice" one might think that firing someone is easy, but it's not. Anyone who has been in a management role long enough will tell you that giving someone a job is a lot easier than taking it away. Add to that the fact that the institution that hired the fraud is embarrassed about being so easily fooled, as is the institution that "trained" him/her. I imagine the funding agency that paid for the whole thing has some interesting PR to do as well. The sad truth of any fraud case is that there are a lot of people who have a strong incentive to keep it as "quiet" as possible. Most of these people are not scientists. On the other hand, the damage done by the fraud is diluted over a very large number of people, most of whom are far away. They will blog on the internet about it, but few will take any real action. Was there ever an angry mob outside Hendrik Schön's house? Does anyone even know where he is now?

Now, before all you Tom Riddles out there start downloading my software, ordering a copy of "The Prince" on Amazon and picking a "structure" that will land you your Dream Job, let me tell you why this will not work. Are there secret catches in MLFSOM identifying the images it produces as "fake"? ... Maybe. But far, far more important than any of that is the step that comes after fitting a curve that explains your "data" to within experimental error: making a prediction. Do you really think you are that smart? It is one thing to build a model that is consistent with all the biochemistry, mutagenesis, and homologous structures of a particular molecule, but can you predict all the future results other people will get? All of them? There is a reason why real scientists collect data. As one great man said: "... even the very wise cannot see all ends".

The problem with fraud as a career option is that you must either produce a "result" so insignificant and boring that nobody will ever check it or try to build upon it, or you must be very, very lucky and actually fake something that turns out to be true. I suppose the latter vanity is the reasoning behind some of the more infamous frauds. In fact, I'm sure your average con artist might consider themselves very clever indeed to be able to fool all those smart scientist people. Such is the price we pay for the unparalleled level of trust that members of the worldwide scientific community place in one another. I mean, really, is there another group of people who so readily take the "word" of someone they have never met that they actually did do an experiment and are not just making stuff up? In a way, it is amazing we don't have more fraud in science. Why is that? Part of it is because fraud really does end your career. I'm sure HMK Murthy has a job now somewhere, but I doubt it has anything to do with science. Unless he changed his name. But most of all I think it is because our faith in the connection between truth and observation is not misplaced. Eventually, all scientific frauds will either be exposed or turn out to be inconsequential.

I think the biggest problem with fraud is not that having wrong results in the literature could lead us down the wrong path. There is no shortage of unintentionally incorrect crap out there already. I think the biggest problem is the breakdown of trust, which makes us behave in "unprofessional" ways. The combination of an ill-defined and virtually undetectable menace (intent) and a public outcry to "do something" is always a recipe for disaster. We do NOT want the "best strategy" for dealing with a mistake to be trying to protect yourself. I suppose as social animals we like to think we can trust and be trusted, but I think as a scientist one must always maintain a healthy and professional skepticism about any source of information. After all, the people who wrote the paper you are reading don't trust you that much either (otherwise they would have their images available on the web), and the molecules and equipment you work with definitely don't trust you. Not even a little bit.

-James Holton
MAD Scientist
