Sorry for my boring response………

‘Short’ bit:
Has anyone here considered DOIs for data? Facility sites within Europe are 
planning to make this available, and I hope to do a proof of principle this 
year on data from Diamond (volunteers?). As an example, the ISIS neutron site 
on the same campus as us has started to do this: you can go to 
http://doi.org and put in the DOI reference 10.5286/ISIS.E.24079772 (catchy). 
This takes you to a landing page where you can see some details of the data 
and an actual citable (I think) reference to the data for a publication. There 
is a link to the data, but the data has not yet been made public by the author 
or facility; at least it is (or should be) there and will eventually be public. 
The responsibility is now on the facility for looking after the data and 
making it available.
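
For anyone who wants to script the lookup described above, here is a minimal sketch of turning a DOI into the address the resolver redirects from. The function name doi_to_url is made up for illustration, and the sketch only builds the URL; it does not fetch the landing page.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this address redirects to its landing page.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI such as '10.5286/ISIS.E.24079772'.

    Hypothetical helper: it performs only a basic shape check
    (a prefix starting with '10.', a slash, a non-empty suffix).
    """
    doi = doi.strip()
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep '/' between prefix and suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("10.5286/ISIS.E.24079772"))
```

Pasting the printed address into a browser should be equivalent to typing the DOI into the resolver page by hand.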

This wouldn’t suit everyone, and there is also the issue of home sources, but 
tools are under development to make this easy. I could easily imagine that, 
within the UK, STFC would probably host something like this for non-facility 
data (it is actually they who host Diamond data for us)... maybe at a nominal 
cost, of course...

Long bit:
We have something similar at Diamond: /dls/$Beamline_name/data/$Year/$proposal-$visit, 
with permissions set accordingly so that only the people on the visit or the 
PIs of the proposal can see the data therein. What happens within that 
directory is still pretty much the user’s choice at the moment, though once 
the data is collected it is read-only and it is all recorded in ISPyB (a 
beamline database with web pages, developed at ESRF and Diamond). You can also 
record details of the sample and link the data collections to it.
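
As a rough illustration of the directory layout just described, the sketch below assembles /dls/$Beamline_name/data/$Year/$proposal-$visit from its parts. The function name and the example values (beamline "i03", proposal "mx1234", visit 5) are invented for the example and are not taken from any real visit; permissions and ISPyB registration are not modelled.

```python
from pathlib import PurePosixPath


def visit_directory(beamline: str, year: int, proposal: str, visit: int) -> PurePosixPath:
    """Build /dls/<beamline>/data/<year>/<proposal>-<visit> (hypothetical helper)."""
    for label, value in (("beamline", beamline), ("proposal", proposal)):
        # Reject empty components and anything that would add extra path levels.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return PurePosixPath("/dls") / beamline / "data" / str(year) / f"{proposal}-{visit}"


if __name__ == "__main__":
    # Example values are made up for illustration only.
    print(visit_directory("i03", 2011, "mx1234", 5))
```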

There is an EU-funded initiative in Europe called PanData 
(http://www.pan-data.eu/), which I have made the IUCr DDDwg aware of and which 
includes most of Europe’s X-ray and neutron sites. Under this initiative the 
facilities are attempting to standardise on authorisation, data formats, some 
software, access policies (making data public), data retention and cataloguing.

Here we’ve been a bit lucky to get ahead on this, and we have been able to keep 
a copy of all our data from all beamlines, raw and processed, on tape (that’s 
just under 200 TB and 53 million catalogued files so far; lots of data, 
including processed data, is not yet catalogued but is on tape). We are 
currently beta testing a web interface to the catalogued data, so anyone who 
has collected data at Diamond should be able to get it from 
https://icat.diamond.ac.uk. The data will probably be coming off tape, so 
retrieval can take a while; the interface is also a little clumsy, but it will 
get better. This is the same technology as is being proposed for PanData 
facilities, but the backend of the actual data archive is the choice of each 
facility; ours is hosted in a tape robot by STFC at the moment.

This is by no means the only solution out there, but DOIs could help unify the 
solutions?

Alun
___________________________________________________________
Alun Ashton, alun.ash...@diamond.ac.uk Tel: +44 1235 778404
Scientific Software Team Leader,  http://www.diamond.ac.uk/
Diamond Light Source, Chilton, Didcot, Oxon, OX11 0DE, U.K.
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tom Peat
Sent: 18 October 2011 23:29
To: ccp4bb
Subject: Re: [ccp4bb] IUCr committees, depositing images

If we are talking schemes, here is another one that we use that might be 
considered:

Date/person/project/barcode/well#/crystal#

At the Australian Synchrotron, a directory is automatically made with the date, 
so that is our starting point.
We sometimes skip the person, but project-barcode-well are always there, so 
that the path corresponds to our crystal database.
I imagine that most high-throughput centres use barcodes, so barcodes and well 
numbers would be good things to have in the path.
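
A small sketch of the Date/person/project/barcode/well#/crystal# layout, with the person level optional as described. The ISO date format, the function name and the example values are assumptions made for illustration; the actual directory names used at the beamline are not stated here.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def crystal_path(day: date, project: str, barcode: str, well: str,
                 crystal: int, person: Optional[str] = None) -> PurePosixPath:
    """Build date/[person/]project/barcode/well/crystal (hypothetical helper)."""
    parts = [day.isoformat()]          # assumed date format: YYYY-MM-DD
    if person:                         # the person level is sometimes skipped
        parts.append(person)
    parts += [project, barcode, well, str(crystal)]
    return PurePosixPath(*parts)


if __name__ == "__main__":
    # All values below are invented for the example.
    print(crystal_path(date(2011, 10, 19), "lysozyme", "BC000123", "A01", 2))
    print(crystal_path(date(2011, 10, 19), "lysozyme", "BC000123", "A01", 2, person="tom"))
```

Keeping project, barcode and well as separate path levels means a crystal database lookup only needs to split the path on "/".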

Cheers, tom

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
mjvdwo...@netscape.net
Sent: Wednesday, 19 October 2011 6:03 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] IUCr committees, depositing images

Phoebe,

Just automate the archiving and come up with a reasonable scheme for doing it. 
Ours is that data sets are called:

userid_yearmonth_projectid_#

Userid is derived from the login into CrystalClear (oops, free advertising), 
projectid is set by the PI (so she can remember 10 years from now what in the 
world these data are all about) and the users are asked (threatened) to call 
their data sets "projectid_#" (and not the ubiquitous "test"). We have a 
script, activated by an icon on the desktop, that automatically moves 
everything from our data collection computer into an archive and adds the 
userid and date to the filename. This has the nice added advantage that the 
data collection disk stays clean. The scheme only breaks when we collect 
synchrotron data (which is all the time) because the remote synchrotron 
scientist who collects the data cannot (should not) be threatened. :-) I then 
rename all those data sets for archiving so the naming is consistent and you 
can actually make an index (say in PDF) of all the data you have, organized by 
user, date, or project.
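
The renaming step of such a scheme might look roughly like the sketch below. It is not the script mentioned above; the function name, the YYYYMM reading of "yearmonth", the allowed characters in a project id and the example userid are all assumptions made for illustration.

```python
import re
from datetime import date

# Data sets are expected to be called "projectid_#"; anything else (e.g. "test") is rejected.
DATASET_NAME = re.compile(r"(?P<project>[A-Za-z0-9-]+)_(?P<num>\d+)")


def archive_name(userid: str, collected: date, dataset_name: str) -> str:
    """Return userid_yearmonth_projectid_# for a data set named projectid_#."""
    match = DATASET_NAME.fullmatch(dataset_name)
    if match is None:
        raise ValueError(f"data set name must look like projectid_#: {dataset_name!r}")
    # Assumed yearmonth format: four-digit year followed by two-digit month.
    return f"{userid}_{collected:%Y%m}_{match['project']}_{match['num']}"


if __name__ == "__main__":
    # "jdoe" and "kinase" are invented example values.
    print(archive_name("jdoe", date(2011, 10, 18), "kinase_3"))
```

Because every field is separated by an underscore, sorting the archived names groups them by user and then by month, which is what makes a simple index by user, date or project possible.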

Our policy is that the PI decides if data should be maintained or if it really 
can go (no diffraction, really a test crystal to see that the crystal is in the 
beam etc). In practice this doesn't happen so someone else makes the decision. 
We tend to err on the side of caution. We tend to think that all results should 
be saved, unless it is blatantly obvious that there is no point. Storage is 
cheap (and cheaper every time you think of it).

Once you automate according to the previously agreed-upon scheme, it is 
somewhat easier to find things again: if you can remember who collected the 
data, or approximately when it was done, or what the project was, you can find 
it. The pain was up front: to come up with a scheme, to enforce a rigorous 
naming convention and to implement it (data collection computer and archive 
are not physically the same machine, etc.).

Maybe the Committee is also thinking about that issue: how are you going to 
keep all the data manageable and searchable? Presumably by something like a PDB 
id (this seems to make sense for published/deposited structures), but for 
"things that did not make it to PDB" one would have to come up with another 
plan.

Mark


-----Original Message-----
From: Phoebe Rice <pr...@uchicago.edu>
To: CCP4BB <CCP4BB@JISCMAIL.AC.UK>
Sent: Tue, Oct 18, 2011 12:01 pm
Subject: Re: [ccp4bb] IUCr committees, depositing images

One more consideration:

Since organization is not one of my greatest talents, I would be absolutely
delighted if a databank took over the burden of archiving my raw data for me.

  Phoebe



=====================================

Phoebe A. Rice

Dept. of Biochemistry & Molecular Biology

The University of Chicago

phone 773 834 1723

http://bmb.bsd.uchicago.edu/Faculty_and_Research/01_Faculty/01_Faculty_Alphabetically.php?faculty_id=123

http://www.rsc.org/shop/books/2008/9780854042722.asp





---- Original message ----

>Date: Tue, 18 Oct 2011 18:17:14 +0100

>From: CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> (on behalf of Gerard Bricogne <g...@globalphasing.com>)

>Subject: Re: [ccp4bb] IUCr committees, depositing images

>To: CCP4BB@JISCMAIL.AC.UK

>

>Dear Enrico, Frank and colleagues,

>

>     I am glad to have suggested that everyone's views on this issue should

>be aired out on this BB rather than sent off-list to an IUCr committee

>member: this is much more interactive and thought-provoking.

>

>     There would seem to be clear biases in some of the positions - for

>instance, the statement that we overvalue individual structures and that

>there is value only in their ensemble has to be seen to be coming from

>someone in a structural genomics centre ;-) . However, as Wladek pointed

>out, when an investigator's project is crucially dependent on a result

>embodied in a deposited structure, it would be of the greatest value to that

>investigator to be able to double-check how reliable some features of that

>structure (especially its ligands) actually are.

>

>     On the other hand Enrico, as a specialist of crystallisation and

>modelling, sees value only in improving those contributors to the task of

>structure determination. This is forgetting (1) an essential capability of

>crystallography: that, through experimental phasing, it can show you what a

>protein looks like even if you have never seen nor modelled one before,

>through the wondrous process of producing model-free electron-density maps;

>and (2) an essential aspect of the task of structure determination: that it

>doesn't aim at producing a model with perfect geometry, but one that best

>explains the measured data and neither under- nor over-interprets them (I

>realise, though, that Enrico's statement "Data just introduces experimental

>errors into what would otherwise be a perfect structure" is likely to be

>tongue-in-cheek ...).

>

>     When it comes to making explicit the advantages of archiving at least

>the raw images that yielded the data against which a deposited PDB entry was

>refined, many good reasons have been given, but I feel that

>

>     (1) there is an over-emphasis on the preservation of diffuse scattering

>that has a tendency to give this archiving a nuance of "blue-skies" research

>and thus to detract from its practical urgency; time will come for diffuse

>scattering to be fully appreciated, but at the moment its mention acts as a

>bit of a distraction, if not a turn-off in this context for people who do

>not love it already;

>

>     (2) as far as I see it, the highest future benefit of having archived

>raw images will result from being able to reprocess datasets from samples

>containing multiple lattices ("non-merohedral twinning"). Numerous

>structures are determined and refined against data obtained by integrating

>only the spots from the major lattice, without rejecting those that are

>corrupted by overlap by a spot from a minor lattice. This leads to

>systematic errors in these data that may only be incompletely taken out by

>outlier rejection at the merging stage, and will create noise or confusing

>residual features in difference maps, if not false features in the main map

>and therefore its interpretation by the model. In my opinion it will be the

>development of methods for dealing with overlapped lattices and for the

>proper treatment of such data in scaling and refinement (as is already

>possible with small molecules) that will bring about the major possibility

>of substantially improving deposited results by reprocessing the raw images

>co-deposited with them;

>

>     (3) there is also the more immediate possibility of better removing ice

>rings, or ligand powder rings, from images, than by having to throw away

>certain thin shells of merged data in the structure factor file.

>

>     I see the case for raw image deposition as absolutely compelling,

>especially in view of the auto-catalytic process through which their

>availability will speed up the development of precisely the new methods and

>software to extract better data from them and better refine models against

>them. The impact of structure factor deposition on the development of better

>refinement programs is there to prove that this paradigm of a chain reaction

>makes total sense.

>

>     Various arguments tend to be fired off as decoys - "get better

>crystals", why not "get a better post-doc"? - but they are unhelpful in the

>way they prolong procrastination when what we need is to bite the bullet.

>The IUCr Forum that John Helliwell pointed at already contains draft plans

>for a pilot run of a reasonable scheme.

>

>

>     With best wishes,

>

>          Gerard.

>

>--

>On Tue, Oct 18, 2011 at 06:19:27PM +0200, Enrico Stura wrote:

>> Dear Peter,

>>

>> How many crystallographers does it take to transform bad data into good

>> data?

>> None, you need a modeller. Only a modeller can give you a structure with

>> perfect

>> geometry. Data just introduces experimental errors into what would

>> otherwise be a perfect

>> structure.

>>

>> If you have good data do you need crystallographers?

>> ...

>>

>> Of course there are all the cases in between. That ... you are right, is the

>> other half of the story.

>>

>> From a biological point of view, only borderline cases make "cents" ($+€)

>> to store.

>> The experimenter in consultation with a beamline scientist at an SR

>> facility is the best

>> small committee suitable to evaluate what is worth keeping. I am sure that

>> the images

>> that are worth storing for a long long time would fit on a few Tb at a

>> reasonable cost.

>> Storing everything would make it harder to find something worth improving

>> in the future.

>>

>> Enrico.

>>

>>

>> On Tue, 18 Oct 2011 17:12:42 +0200, Peter Keller

>> <pkel...@globalphasing.com> wrote:

>>

>>> Dear Enrico,

>>>

>>> Please don't get me wrong: what you are saying is not incorrect, but it

>>> is only half the story.

>>>

>>> On Tue, 2011-10-18 at 15:13 +0200, Enrico Stura wrote:

>>>> With improving techniques, we should always be making progress!

>>>

>>> Yes, of course!

>>>

>>>> If we are trying to answer a biological question that is really

>>>> important,

>>>> we would be better off

>>>> improving the purification, the crystallization, the cryo-conditions

>>>

>>> You have left X-ray crystallography out of this list. It is a technique

>>> like the others, and can also be improved :-)

>>>

>>> It may be true that the number of crystallographers that are working on

>>> improving instrumental methodology and software is small compared to the

>>> number working on improving wet-lab techniques, but that number is not

>>> zero, and the contribution is significant. The rest of you benefit from

>>> that work!

>>>

>>>> instead of having to rely on

>>>> processing old images with new software.

>>>>

>>>> I have 10 years  worth of images. I have reprocessed very few of them and

>>>> never made any

>>>> sensational progress using the new software. Poor diffraction is poor

>>>> diffraction.

>>>

>>> Maybe so, but certain types of datasets are useful for methods and

>>> software development, even if no new biological insights could be gained

>>> by reprocessing them. These datasets are often hard to get hold of in

>>> practice, especially when they are in someone's lab on a tape that

>>> no-one has a reader for any more.

>>>

>>> Obtaining protein, growing crystals and collecting new data in such a

>>> way that the interesting features of those datasets are reproduced can

>>> be much much harder than curating the images would be. This is

>>> especially true for software-oriented people like us who don't have

>>> regular access to wet-lab facilities.

>>>

>>>> Money can be better spent buying a wine cellar, storage works for wine.

>>>

>>> Images have already been lost that ought to have been kept. The

>>> questions are: how to select the datasets that are potentially of value,

>>> and how to make sure that they don't disappear.

>>>

>>> Regards,

>>> Peter.

>>>

>>

>>

>> --

>> Enrico A. Stura D.Phil. (Oxon) ,    Tel: 33 (0)1 69 08 4302 Office

>> Room 19, Bat.152,                   Tel: 33 (0)1 69 08 9449    Lab

>> LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette,   FRANCE

>> http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura

>> http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html

>> e-mail: est...@cea.fr                             Fax: 33 (0)1 69 08 90 71

>

>--

>

>     ===============================================================

>     *                                                             *

>     * Gerard Bricogne                     g...@globalphasing.com  *

>     *                                                             *

>     * Global Phasing Ltd.                                         *

>     * Sheraton House, Castle Park         Tel: +44-(0)1223-353033 *

>     * Cambridge CB3 0AX, UK               Fax: +44-(0)1223-366889 *

>     *                                                             *

>     ===============================================================
