Dear all,

Apologies for a lengthy email in a lengthy chain of emails.

I think Jacob did a good job here refocusing the question. I will try to answer it in a rather simplistic manner, but from the viewpoint of somebody who has spent only relatively little time in the field, yet has enjoyed the privilege of seeing it from both the developer and the user perspective, and from environments as different as service-oriented synchrotron sites and a cancer hospital. I will obviously only claim my weight=1, but I want to emphasize that where you stand influences your perspective.

Let me first present the background that shapes my views.

<you can skip this>

When we started with ARP/wARP (two decades ago for Victor, and getting pretty close to that for myself!), we (like others) hardly had the benefit of large datasets. We had some friends who gladly donated their data for us to play with, and we assembled enough data to aid our primitive efforts back then. The same holds true for many.

At some point, around 2002, we started XtalDepot with Serge Cohen: the idea was to systematically collect phased data, moving one step beyond HKL F/SigF to include either the HLA/B/C/D coefficients or the search model for the molecular replacement solution. Despite several calls, that archive only acquired around a hundred structures, and yesterday morning it was taken off-line, as it was no longer useful and no longer visited by anyone. Very likely our effort was redundant because of the JCSG dataset, which has been used by many, many people who are grateful for it (I guess Frank's 'almost every talk' refers to me; I have never used the JCSG set).

Lately, I have been involved in the PDB_REDO project, which was pioneered by Gert Vriend and Robbie Joosten (who is now in my lab). Thanks to Gerard K.'s EDS clean-up and the subsequent efforts of both Robbie and Garib, who made gazillions of fixes to refmac, we can now not only make maps of PDB entries but also refine them - all but fewer than 100 structures. That has cost a significant part of the last four to five years of Robbie's life (and has received limited appreciation from editors of 'important' journals and from referees of our grants).

</you can skip this>

These experiences are what shape my views, and my train of thought goes like this:

The PDB collected F/sigF, and the effort to really use them - first to get maps, later to re-refine, and now to re-build - has received rather limited attention. It is starting to have an impact in some fields, mostly modeling efforts, and unlike referee nr.3 I strongly believe it has great potential for impact.

My team also collected phases, as did JCSG on a more successful and consistent scale, and that effort has indeed been used by developers to deliver better benchmarking of a lot of software (it has escaped my attention if anyone has used the JCSG data directly, e.g. with learning techniques, but I apologize if I have missed that). This benchmarking of software, based on 'real' maps for a rather limited set of data - hundreds and not tens of thousands - was important enough anyway.

That leads me to conclude that archiving images on a voluntary basis is a good idea. Somebody who needs it should convince the funding bodies to make the money available, and then make the effort to put the infrastructure in place. I would predict that 100-200 datasets would then be collected, and that would really, really help developers to make the important new algorithms and software we all need. That's a modest investment that can teach us a lot. One of the SG groups could make this effort, and most of us would support it, myself included.

Would such data help anyone beyond the developers? I doubt it. Is it important to make such a resource available to developers? Absolutely. What is the size of the resource needed? A few hundred datasets, which can be curated and stored on a modest budget.

Talking about archiving on a PDB scale might be fantastic in principle, but it would require time and resources on a scale that would not clearly stand up to a cost-benefit trial, especially in times of austerity.

In contrast, a systematic effort by our community to deposit DNA in existing databanks like AddGene.com, and to annotate PDB entries with such deposition numbers, would be cheap, efficient, and could have far-reaching implications for the many people who could then easily get the DNA to start studying structures in the database. That would surely lead to new science, because people interested enough in these structures to claim the DNA and 'redo' the project would add new science. One can even imagine SG centers offering such a service - 'please redo structure X for this and that reason' - for a fee representing the real costs, which must be low given the experience and technology already invested there; a subset of targets could be on a 'request' basis...

Sorry for getting wild ... we can of course now have a referendum to decide on the best curse of action! :-(

A.

PS Rob, you are of course right about sequencing costs, but I was only trying to paint the bigger picture...



On Oct 31, 2011, at 18:00, Frank von Delft wrote:

"Loathe being forced to do things"? You mean, like being forced to use
programs developed by others at no cost to yourself?

I'm in a bit of a time-warp here - how exactly do users think our
current suite of software got to be as astonishingly good as it is? 10 years ago people (non-developers) were saying exactly the same things - yet almost every talk on phasing and auto-building that I've heard ends
up acknowledging the JCSG datasets.

Must have been a waste of time then, I suppose.

phx.




On 31/10/2011 16:29, Adrian Goldman wrote:
I have no problem with this idea as an opt-in. However I loathe being forced to do things - for my own good or anyone else's. But unless I read the tenor of this discussion completely wrongly, opt-in is precisely what is not being proposed.

Adrian Goldman

Sent from my iPhone

On 31 Oct 2011, at 18:02, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:

Dear Crystallographers,

I am sending this to try to start a thread which addresses only the
specific issue of whether to archive, at least as a start, images
corresponding to PDB-deposited structures. I believe there could be a
real consensus about the low cost and usefulness of this degree of
archiving, but the discussion keeps swinging around to all levels of
archiving, obfuscating who's for what and for what reason. What about
this level, alone? All of the accompanying info is already entered
into the PDB, so there would be no additional costs on that score.
There could just be a simple link, added to the "download files"
pulldown, which could say "go to image archive," or something along
those lines. Images would be pre-zipped, maybe even tarred, and people
could just download from there. What's so bad?

The benefits are that sometimes there are structures in which
resolution cutoffs might be unreasonable, or perhaps there is some
potential radiation damage in the later frames that might be
deleterious to interpretations, or perhaps there are ugly features in
the images which are invisible or obscure in the statistics.

In any case, it seems to me that this step would be pretty painless,
as it is merely an extension of the current system--just add a link to
the pulldown menu!

Best Regards,

Jacob Keller

--
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
*******************************************


Please don't print this e-mail unless you really need to
Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute,
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791



