On 4/2/2012 6:03 AM, herman.schreu...@sanofi.com wrote:
If James Holton had been involved, the fabrication would not have been
discovered.
Herman
Uhh. Thanks. I think?
Apologies for remaining uncharacteristically quiet. I have been keeping
up with the discussion, but I'm not sure how much difference one more "vote"
would make on the various issues. Especially since most of this has
come up before. I agree that fraud is sick and wrong. I think backing
up your data is a good idea, etc. etc. However, I seem to have been
declared a leading "expert" on fake data, so I suppose I ought to say
something about that. Not quite sure I want to volunteer to be the
Defense Against the Dark Arts Teacher (they always seem to end badly).
But, here goes:
I think the core of the "fraud problem" lies in our need for models, and
I mean "models" in the general scientific sense not just PDB files.
Fundamental to the practice of science is coming up with a "model" that
explains the observations you made, preferably to within experimental
error. One is also generally expected to estimate what the experimental
error was. That is, if you plot a bunch of points on a graph, you need
to fit some sort of curve to them, and that curve had better fit
"within the error bars", or you have some explaining to do. Protein
structures are really nothing more than a ~50,000-parameter curve fit to
~50,000 data points. So, given that the technology for constructing
"models" is widely available (be it gnuplot or refmac), as is the
technology for estimating errors and generating random numbers, all the
hard work a would-be fraud needs to make a plausible forgery has already
been done. This is not something unique to crystallography! It is a
general property of any mature science.
Indeed, "fake data" is not only a common tool in science but an
inextricable part of it. Simulated diffraction images appear in the
literature at least as early as Arndt and Wonacott (1976), and I'm sure
even Moseley and Darwin (1913) made some "fake data" when trying to
figure out all the sources of systematic error they were dealing with
measuring reflected x-ray beams. At its heart, fake data is a
"control". Remember "controls" from science class? They come in two
flavors: positive and negative, and you are supposed to have both. In
fact, a fraud is really just someone who, in some way, shape or form,
takes a positive control and calls it their "experiment". Pasting gel
lanes together is an example of this. I think this is why fraud is so
hard to prevent in science. You can't do science without controls, but
anyone who has "access to the technology" for doing a control can also
use it for evil. The labels are everything.
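To make the "curve fit to within the error bars" idea and the positive/negative control idea concrete, here is a toy sketch in Python. It is only an illustration under made-up assumptions: nothing here comes from refmac, gnuplot or MLFSOM, and the straight-line model, noise level and function names (fit_line, reduced_chi_square, simulate) are hypothetical. It does a weighted least-squares straight-line fit and reports chi-squared per degree of freedom, which should come out near 1 when the model explains the data to within the error bars. Simulating data from the model plus noise is a positive control; fitting the wrong model to curved data is a negative control that leaves you with "some explaining to do".

```python
import random


def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b*x, with weights 1/sigma**2."""
    w = [1.0 / s ** 2 for s in sigmas]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx * Sx
    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    b = (S * Sxy - Sx * Sy) / delta    # slope
    return a, b


def reduced_chi_square(xs, ys, sigmas, a, b):
    """Chi^2 per degree of freedom; ~1 means the line fits 'within the error bars'."""
    chi2 = sum(((y - (a + b * x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))
    return chi2 / (len(xs) - 2)  # two fitted parameters


def simulate(model, xs, sigma, seed=0):
    """'Fake data': a model evaluated at xs plus Gaussian noise of width sigma."""
    rng = random.Random(seed)
    return [model(x) + rng.gauss(0.0, sigma) for x in xs]


if __name__ == "__main__":
    xs = [float(i) for i in range(200)]
    sigma = 0.5
    sigmas = [sigma] * len(xs)

    # Positive control: data simulated from a straight line, fit with a straight line.
    ys = simulate(lambda x: 1.0 + 2.0 * x, xs, sigma, seed=42)
    a, b = fit_line(xs, ys, sigmas)
    print(f"line: a={a:.3f} b={b:.4f} "
          f"chi2/dof={reduced_chi_square(xs, ys, sigmas, a, b):.2f}")

    # Negative control: curved data, fit with the wrong (straight-line) model.
    ys_bent = simulate(lambda x: 1.0 + 0.01 * x * x, xs, sigma, seed=42)
    a2, b2 = fit_line(xs, ys_bent, sigmas)
    print(f"bent: chi2/dof={reduced_chi_square(xs, ys_bent, sigmas, a2, b2):.1f}")
```

Note that the simulated straight line passes the chi-squared test just as well as real measurements would, which is rather the point: a good fit says nothing about where the numbers came from. Only the label does.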
Personally, I classify fraud as an "intentionally incorrect" result.
This separates it from "unintentionally incorrect" results (mistakes),
which are far more common. Validation is meant to catch the "incorrect"
part, but can never be expected to establish intent! In fact, I expect
a mildly clever fraud might actually plan to hide behind the "we made a
mistake in the deposition/figure/paper but now can't find the original
data" defense. The case at hand (Zaborsky et al. 2010) may be a very
good example of this. A new validation procedure (Rupp 2012) drew
attention to the fabricated 3k78 structure as well as real structures
where Fcalc was accidentally deposited instead of Fobs (there are a number
of these). Rupp's follow-up on 3k78 found troubling irregularities, but
could it still be a mistake? If there is a combination of buttons in
some GUI somewhere that "lets you" do this then I imagine at least one
idiot may have "discovered" it, perhaps even pleased with themselves
for finding a "new way" to get their R factor down. The best evidence
that Fobs simply does not exist for 3k78 was in the response (Zaborsky
et al. 2012).
The same validation procedure also drew attention to other cases. Two
of them, 1n0r and 1n0q (Mosavi et al. 2002), were from my beamline (ALS
8.3.1), so finding the original images was simply a matter of flipping
through the books of old DVDs I have in my office. They cost us $0.25
each in 2002. Yes, I do back up every image, primarily because figuring
out which ones were "worth backing up" was actually a more expensive
proposition. Even in adjusted dollars, I think the cost of the whole
archive is still cheaper than what it would have cost Dan to re-grow his
crystals and collect the data again in 2012. It is also nice to be able
to say that the data for 1n0r were collected on Jan 30 2002 from 9:47 pm
to 11:48 pm and 1n0q was collected on Mar 15 2002 from 12:52 pm until
3:48 pm. I was there! I saw the whole thing! Yes, I know, since I am
"the guy who can fake images" I am not the best "witness" (the Defense
Against the Dark Arts Teacher never is), but for whatever it is worth I
DO recommend keeping your old images around. You never know when a
forgotten slip of the mouse when using AutoDep ten years ago will come
back to haunt you.
I think it is very important to point out here that validation and
peer review are not arbitrary gauntlets set up to prevent the unworthy
from achieving the nirvana of "publication". Rather, they are services
meant to help keep you from embarrassing yourself afterward. In the
end, the responsibility for the veracity and validity of your paper lies
with you, the author. Not the journal, not the reviewers, and
definitely not the PDB. They are a repository, not a police force.
Annotators will strongly encourage you to deal with validation issues,
but they will, in the end, deposit whatever you give them. What they
won't do is let you take it back! So before you make 10,000 copies of
your paper and deposit your coordinates into the irrevocable memory of
the PDB, it is a good idea to seek out the harshest critic you can find
and listen to what they have to say. You don't have to DO everything
they say, but listening is a good idea. Even a hard-working and
diligent scientist who eats all his vegetables can still do something
dumb, like put the protein and water on different origins just before
deposition. Not that I would know anything about that (1rb1).
I also think it is important to point out that it is not possible to
build some kind of automated "fraud catcher", nor would it be
advisable. It would only lull us into a false sense of security. Even
branches of science that don't do a lot of curve-fitting (such as
archaeology) still have "models" inasmuch as people have a picture in
their heads of how they think all their data "should" fit together. All
a fraud need do is create some artwork (be it a stone tool or a
diffraction image) that is consistent with that picture, and no alarm
bells will be raised. Perhaps not for years. Long enough to get a job
anyway. And therein lies the incentive. Watching "The Apprentice", one
might think that firing someone is easy, but it's not. Anyone who has
been in a management role long enough will tell you that giving someone
a job is a lot easier than taking it away. Add to that the fact that
the institution that hired the fraud is embarrassed about being so easily
fooled, as is the institution that "trained" him/her. I imagine the
funding agency who paid for the whole thing has some interesting PR to
do as well. The sad truth of any fraud case is there are a lot of
people who have a strong incentive to keep it as "quiet" as possible.
Most of these people are not scientists. On the other hand, the damage
done by the fraud is diluted over a very large number of people, most of
whom are far away. They will blog on the internet about it, but few
will take any real action. Was there ever an angry mob outside Hendrik
Schön's house? Does anyone even know where he is now?
Now, before all you Tom Riddles out there start downloading my software,
ordering a copy of "The Prince" on Amazon and picking a "structure" that
will land you your Dream Job, let me tell you why this will not work.
Are there secret catches in MLFSOM identifying the images it produces as
"fake"? ... Maybe. But far, far more important than any of that is the
step that comes after fitting a curve that explains your "data" to
within experimental error: making a prediction. Do you really think you
are that smart? It is one thing to build a model that is consistent
with all the biochemistry, mutagenesis, and homologous structures of a
particular molecule, but can you predict all the future results other
people will get? All of them? There is a reason why real scientists
collect data. As one great man said: "... even the very wise cannot see
all ends".
The problem with fraud as a career option is that you must either
produce a "result" so insignificant and boring that nobody will ever
check it or try to build upon it, or you must be very, very lucky and
actually fake something that turns out to be true. I suppose the latter
vanity is the reasoning behind some of the more infamous frauds. In
fact, I'm sure your average con artist might consider themselves very
clever indeed to be able to fool all those smart scientist people. Such
is the price we pay for the unparalleled level of trust that the
worldwide scientific community has for one another. I mean, really, is
there another group of people who so readily take the "word" of someone
they have never met that they actually did do an experiment and are not
just making stuff up? In a way, it is amazing we don't have more fraud
in science. Why is that? Part of it is because fraud really does end
your career. I'm sure HMK Murthy has a job now somewhere, but I doubt
it has anything to do with science. Unless he changed his name. But
most of all I think it is because our faith in the connection between
truth and observation is not misplaced. Eventually, all scientific
frauds will either be exposed or turn out to be simply inconsequential.
I think the biggest problem with fraud is not that having wrong
results in the literature could lead us down the wrong path. There is
no shortage of unintentionally incorrect crap out there already. I
think the biggest problem is the breakdown of trust, which makes us
behave in "unprofessional" ways. The combination of an ill-defined and
virtually undetectable menace (intent) and a public outcry to "do
something" is always a recipe for disaster. We do NOT want the "best
strategy" for dealing with a mistake to be trying to protect yourself.
I suppose as social animals we like to think we can trust and be
trusted, but I think as a scientist one must always maintain a healthy
and professional skepticism about any source of information. After all,
the people who wrote the paper you are reading don't trust you that much
either (otherwise they would have their images available on the web),
and the molecules and equipment you work with definitely don't trust
you. Not even a little bit.
-James Holton
MAD Scientist