Speaking as a part-time methods developer, I agree with Tassos that a couple 
of hundred suitably chosen and documented datasets would be adequate for most 
purposes. I find that it is always revealing to be able to compare a new 
algorithm with existing attempts to solve the same problem, and this is much 
easier if we use the same data when reporting such tests. Since I am most 
interested in phasing, all I need are unmerged reflection datasets and a PDB 
file of the final model. It would be a relatively small extension of the
current deposition requirements to ask depositors to provide unmerged 
intensities and sigI for the data collected for phasing as well as for the 
final refinement. This would also provide useful additional information for 
validation (even where experimental phasing failed and the structure was
solved by MR).
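
Purely as an illustration of how little is needed on the developer side (a
sketch, not a proposed deposition format): assuming, for example, SHELX-style
.hkl records with h, k, l, I and sig(I) in fixed columns (3I4,2F8.2), reading
such unmerged data amounts to something like:

    # Minimal sketch: read unmerged reflections (h, k, l, I, sig(I)) from a
    # SHELX-style .hkl file; purely illustrative, any agreed format would do.
    def read_unmerged_hkl(path):
        reflections = []
        with open(path) as f:
            for line in f:
                if len(line.rstrip()) < 28:
                    continue                          # skip blank/short lines
                h, k, l = int(line[0:4]), int(line[4:8]), int(line[8:12])
                if h == k == l == 0:                  # 0 0 0 ends the file
                    break
                i, sig = float(line[12:20]), float(line[20:28])
                reflections.append((h, k, l, i, sig))
        return reflections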

George 

On Tue, Nov 01, 2011 at 12:50:42PM +0100, Anastassis Perrakis wrote:
> Dear all,
> 
> Apologies for a lengthy email in a lengthy chain of emails.
> 
> I think Jacob did a good job here refocusing the question. I will try to
> answer it in a rather simplistic manner, but from the viewpoint of somebody
> who may have spent only relatively little time in the field, but has enjoyed
> the privilege of seeing it from both the developer and the user perspective,
> and from environments ranging from service-oriented synchrotron sites to a
> cancer hospital. I will obviously only claim my weight=1, but I want to
> emphasize that where you stand influences your perspective.
> 
> Let me first present the background that shapes my views.
> 
> <you can skip this>
> 
> When we started with ARP/wARP (two decades ago for Victor, and getting
> pretty close to that for myself!), we (like others) hardly had the benefit
> of large datasets. We had some friends who gladly donated their data for us
> to play with, and we assembled enough data to aid our primitive efforts back
> then. The same holds true for many.
> 
> At some point, around 2002, we started XtalDepot with Serge Cohen: the idea
> was to systematically collect phased data, moving one step beyond h,k,l with
> F/sigF to also include either the HLA/B/C/D coefficients (see the note after
> this paragraph) or the search model used for the molecular replacement
> solution. Despite several calls, that archive acquired only around a hundred
> structures, and yesterday morning it was taken off-line, as it was no longer
> useful and no longer visited by anyone. Very likely our effort was redundant
> because of the JCSG dataset, which has been used by many, many people who
> are grateful for it (I guess the 'almost' in Frank's 'almost every talk'
> refers to me; I have never used the JCSG set).
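> 
> For reference: HLA/B/C/D are the Hendrickson-Lattman coefficients, which
> encode each reflection's experimental phase probability as
> P(phi) ~ exp(A*cos(phi) + B*sin(phi) + C*cos(2*phi) + D*sin(2*phi)). A
> minimal sketch (assuming numpy; purely illustrative) of turning the four
> coefficients back into a normalised phase distribution:
> 
>     import numpy as np
> 
>     def hl_phase_distribution(A, B, C, D, n=360):
>         # Normalised P(phi) on an n-point grid from the HL coefficients.
>         phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
>         p = np.exp(A * np.cos(phi) + B * np.sin(phi)
>                    + C * np.cos(2.0 * phi) + D * np.sin(2.0 * phi))
>         return phi, p / p.sum()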
> 
> Lately, I am involved in the PDB_REDO project, which was pioneered by Gert
> Vriend and Robbie Joosten (who is now in my lab). Thanks to Gerard K.'s EDS
> clean-up and the subsequent efforts of both Robbie and Garib, who made
> gazillions of fixes to refmac, we can now not only make maps of PDB entries
> but also re-refine them, for all but fewer than 100 structures. That has
> cost a significant part of the last four to five years of Robbie's life (and
> has received limited appreciation from editors of 'important' journals and
> from referees of our grants).
> 
> </you can skip this>
> 
> These experiences are what shape my views, and my train of thought goes like
> this:
> 
> The PDB collected F/sigF, yet being able to really use them, to compute maps
> first, to re-refine later, and to re-build now, has received rather limited
> attention. It is starting to have an impact in some fields, mostly in
> modelling efforts, and unlike referee nr. 3 I strongly believe it has great
> potential for impact.
> 
> My team also collected phases, as did the JCSG on a more successful and
> consistent scale, and that effort has indeed been used by developers to
> deliver better benchmarking of much software (to my knowledge nobody has
> used the JCSG data directly, e.g. for learning techniques, but I apologize
> if I have missed that). This benchmarking of software, based on 'real' maps
> for a rather limited set of data, hundreds rather than tens of thousands of
> datasets, was important enough anyway.
> 
> That leads me to conclude that archiving images is a good idea on a
> voluntary basis. Somebody who needs it should convince the funding bodies to
> make the money available, and then make the effort to provide the
> infrastructure. I would predict that 100-200 datasets would then be
> collected, and that would really help developers to create the important new
> algorithms and software we all need. That's a modest investment that can
> teach us a lot. One of the SG groups could make this effort, and most of us
> would support it, myself included.
> 
> Would such data help anyone beyond the developers? I doubt it. Is it
> important to make such a resource available to developers? Absolutely. What
> is the size of the resource needed? Limited to a few hundred datasets, which
> can be curated and stored on a modest budget.
> 
> Talking about archiving on a PDB scale might be fantastic in principle, but
> it would require time and resources on a scale that would not clearly pass a
> cost-benefit test, especially in times of austerity.
> 
> In contrast, a systematic effort by our community to deposit DNA in existing
> databanks like AddGene.com, and to annotate PDB entries with such deposition
> numbers, would be cheap and efficient, and could have far-reaching
> implications for the many people who could then easily obtain the DNA and
> start studying structures in the database. That would surely lead to new
> science, because people interested enough in these structures to claim the
> DNA and 'redo' the project would add new science. One can even imagine SG
> centers offering such a service ('please redo structure X for this and that
> reason') for a fee that would reflect the real costs, which must be low
> given the experience and technology already in place there; a subset of
> targets could be handled on a 'request' basis...
> 
> Sorry for getting wild ... we can of course now have a referendum to decide
> on the best curse of action! :-(
> 
> A.
> 
> PS Rob, you are of course right about sequencing costs, but I was only trying
> to paint the bigger picture...
> 
> 
> 
> On Oct 31, 2011, at 18:00, Frank von Delft wrote:
> 
> 
>     "Loathe being forced to do things"?  You mean, like being forced to use
>     programs developed by others at no cost to yourself?
> 
>     I'm in a bit of a time-warp here - how exactly do users think our
>     current suite of software got to be as astonishingly good as it is?  10
>     years ago people (non-developers) were saying exactly the same things -
>     yet almost every talk on phasing and auto-building that I've heard ends
>     up acknowledging the JCSG datasets.
> 
>     Must have been a waste of time then, I suppose.
> 
>     phx.
> 
> 
> 
> 
>     On 31/10/2011 16:29, Adrian Goldman wrote:
> 
>         I have no problem with this idea as an opt-in. However I loathe being
>         forced to do things - for my own good or anyone else's. But unless I
>         read the tenor of this discussion completely wrongly, opt-in is
>         precisely what is not being proposed.
> 
> 
> 
>         Adrian Goldman
> 
> 
> 
>         Sent from my iPhone
> 
> 
> 
>         On 31 Oct 2011, at 18:02, Jacob Keller<j-kell...@fsm.northwestern.edu>
>          wrote:
> 
> 
> 
>             Dear Crystallographers,
> 
> 
> 
>             I am sending this to try to start a thread which addresses only the
>             specific issue of whether to archive, at least as a start, images
>             corresponding to PDB-deposited structures. I believe there could be a
>             real consensus about the low cost and usefulness of this degree of
>             archiving, but the discussion keeps swinging around to all levels of
>             archiving, obfuscating who's for what and for what reason. What about
>             this level, alone? All of the accompanying info is already entered
>             into the PDB, so there would be no additional costs on that score.
>             There could just be a simple link, added to the "download files"
>             pulldown, which could say "go to image archive," or something along
>             those lines. Images would be pre-zipped, maybe even tarred, and people
>             could just download from there. What's so bad?
> 
>             The benefits are that sometimes there are structures in which
>             resolution cutoffs might be unreasonable, or perhaps there is some
>             potential radiation damage in the later frames that might be
>             deleterious to interpretations, or perhaps there are ugly features in
>             the images which are invisible or obscure in the statistics.
> 
>             In any case, it seems to me that this step would be pretty painless,
>             as it is merely an extension of the current system--just add a link to
>             the pulldown menu!
> 
> 
> 
>             Best Regards,
> 
> 
> 
>             Jacob Keller
> 
> 
> 
>             --
> 
>             *******************************************
> 
>             Jacob Pearson Keller
> 
>             Northwestern University
> 
>             Medical Scientist Training Program
> 
>             email: j-kell...@northwestern.edu
> 
>             *******************************************
> 
> 
> 
> 
> Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
> Department of Biochemistry (B8)
> Netherlands Cancer Institute, 
> Dept. B8, 1066 CX Amsterdam, The Netherlands
> Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
> 
> 
> 
> 

-- 
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, 
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582
