Dear James,

Good analysis! You bring up important points.

On 10/24/11 23:56, James Holton wrote:
The Pilatus is fast, but for decades now we have had detectors that can read
out in ~1s. This means that you can collect a typical ~100 image dataset in a
few minutes (if flux is not limiting). Since there are ~150 beamlines
currently operating around the world and they are open about 200 days/year,
we should be collecting ~20,000,000 datasets each year.

We're not.

The PDB only gets about 8000 depositions per year, which means either we
throw away 99.96% of our images, or we don't actually collect images anywhere
near the ultimate capacity of the equipment we have. In my estimation, both
of these play about equal roles, with ~50-fold attrition between ultimate
data collection capacity and actual collected data, and another ~50-fold
attrition between collected data sets and published structures.
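The capacity and attrition figures above can be checked with a few lines of Python. This is only a back-of-envelope sketch. The 2 minutes per dataset and round-the-clock operation on each operating day are assumptions standing in for "a few minutes" and "open about 200 days/year". They are not figures given in the text.

```python
# Back-of-envelope check of the capacity and attrition estimates.
# ASSUMPTION: "a few minutes" per dataset is taken as 2 minutes, and each
# beamline is assumed to collect around the clock on its operating days.
BEAMLINES = 150
DAYS_PER_YEAR = 200
MINUTES_PER_DATASET = 2  # assumed, not stated in the text

datasets_per_beamline_day = 24 * 60 // MINUTES_PER_DATASET  # 720
capacity = BEAMLINES * DAYS_PER_YEAR * datasets_per_beamline_day

# The text rounds the ultimate capacity to ~20,000,000 datasets/year.
CAPACITY = 20_000_000
DEPOSITIONS = 8_000  # PDB depositions per year

fraction_discarded = 1 - DEPOSITIONS / CAPACITY

# Two roughly equal ~50-fold attrition steps (50 * 50 = 2500 overall).
collected = CAPACITY // 50   # datasets actually collected
published = collected // 50  # datasets that end up as structures

print(f"ultimate capacity: {capacity:,} datasets/year")
print(f"discarded: {fraction_discarded:.2%}")
print(f"collected: {collected:,}, published: {published:,}")
```

Under these assumptions the capacity comes out at about 21.6 million datasets/year, in line with the ~20,000,000 quoted. The two 50-fold steps multiply out to exactly the 8,000 yearly PDB depositions.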

By your estimate, we collect 1/50 * 20,000,000 = 400,000 data sets, of which only 8,000 get deposited. An average Pilatus data set (0.1-degree scan) takes about 4 GB (compressed without losing information; EVAL can read those!). Storing the 8,000 deposited data sets, as James Stroud mentions, cannot be the problem. It is the other 392,000 data sets that we would have to find a home for. That would be 1,568 TB and would cost $49,000/year. This may be a slight overestimate, but it shows the problems we face if we want to store ALL raw data.
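The storage arithmetic can be laid out step by step. The text does not state a price per terabyte, so the sketch below does not assume one. Instead it back-calculates the rate implied by the quoted $49,000/year, using decimal units (1 TB = 1000 GB) as an assumption.

```python
# Storage estimate for the undeposited datasets, using the numbers in the text.
COLLECTED = 400_000        # 1/50 of 20,000,000
DEPOSITED = 8_000          # PDB depositions per year
GB_PER_DATASET = 4         # compressed 0.1-degree Pilatus dataset
QUOTED_COST_USD = 49_000   # yearly cost quoted in the text

undeposited = COLLECTED - DEPOSITED
total_tb = undeposited * GB_PER_DATASET / 1000  # assumed decimal units: 1 TB = 1000 GB

# The per-TB price is not given in the text; this is the rate that the
# quoted yearly cost implies.
implied_usd_per_tb_year = QUOTED_COST_USD / total_tb

print(f"{undeposited:,} datasets -> {total_tb:,.0f} TB")
print(f"implied cost: ${implied_usd_per_tb_year:.2f} per TB per year")
```

The implied rate works out to about $31 per TB per year. Changing GB_PER_DATASET or the collection estimate scales the total linearly.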

Even if we found a way to store all these data, how would we set up a useful database? If we store all data by name, date and beamline, we will inevitably end up drowning in a sea of information. It is very unlikely that the really interesting data sets will ever be found and used. It would be much more useful if every data set were annotated by the user or beamline scientist, with notes like "impossible to index", "bad data from integration step", "overlap", "diffuse streaks", etc. Such information could be part of the metadata. This, however, takes time and may conflict with the eagerness to get results from the other data sets recorded during the same synchrotron trip.
I am afraid that just throwing data sets into a big pool will not be very useful.
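The annotation idea could look roughly like the sketch below: each archived data set carries free-text tags alongside name, date and beamline, and a search by tag finds only the annotated entries. The record layout, the find_by_tag helper, the data set names and the beamline names are all hypothetical, chosen for illustration. No existing archive format is implied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical archive entry: identifiers plus user/beamline annotations."""
    name: str
    date: str      # ISO date string
    beamline: str
    tags: set[str] = field(default_factory=set)  # e.g. {"diffuse streaks"}


def find_by_tag(records: list[DatasetRecord], tag: str) -> list[DatasetRecord]:
    """Return the records annotated with the given tag."""
    return [r for r in records if tag in r.tags]


# Hypothetical example entries; "BL-1" and "BL-2" are made-up beamline names.
archive = [
    DatasetRecord("xtal_a", "2011-10-20", "BL-1", {"diffuse streaks"}),
    DatasetRecord("xtal_b", "2011-10-21", "BL-1"),  # never annotated
    DatasetRecord("xtal_c", "2011-10-22", "BL-2", {"impossible to index", "overlap"}),
]

for record in find_by_tag(archive, "overlap"):
    print(record.name, record.beamline)
```

An entry like xtal_b, stored only by name, date and beamline, never turns up in any tag search. This is the "big pool" problem in miniature.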

Loes.
--
__________________________________________

Dr. Loes Kroon-Batenburg
Dept. of Crystal and Structural Chemistry
Bijvoet Center for Biomolecular Research
Utrecht University
Padualaan 8, 3584 CH Utrecht
The Netherlands

E-mail : l.m.j.kroon-batenb...@uu.nl
phone  : +31-30-2532865
fax    : +31-30-2533940
__________________________________________
