On Wednesday 18 August 2010 11:25:19 am Andreas Förster wrote:
> Thanks to everyone for the good ideas and suggestions.  Let me clarify 
> what I want: a simple system that does one task.  I'm with James Holton 
> on complexity and with several others on wikis and databases.  They're 
> simple to set up and easy to use, but no one uses them besides the 
> person who implemented them.  I've seen this with a lab wiki and a 
> plasmid database.  If the boss just approves of the project but doesn't 
> enforce usage, it won't be used.
> 
> That's why what I really want is an unavoidable system.  

Our protocol makes use of a FileMaker database (the one Juergen Bosch
mentioned earlier) that tracks all mounted crystals.  It is both handy
and, as you say you want (but be careful what you wish for), unavoidable.
Juergen was largely responsible for setting it up in the first place,
but it has remained in continuous use since then.

This works for us because the great bulk of our data collection is done
using the BluIce interface to the SSRL beamlines.  As a requirement for
data collection, users must provide a spreadsheet that indexes
each crystal and its location in the SSRL sample cassette.
We create this spreadsheet directly as an export from our lab database.
The database itself assigns a unique systematic directory name to each
crystal.  The spreadsheet is then used by the beamline software to screen
and collect data from all the crystals.
The beamline software fills in screening information as it goes,
including the cell dimensions, etc., as determined by the automated
software.  The data images for each crystal are put into a uniquely
named directory as specified in the spreadsheet.  After the run,
the updated spreadsheet is merged back into our lab database and
the data images are archived under their systematic, uniquely
determined directory names.
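
For what it is worth, the round-trip bookkeeping amounts to little more
than generating and later re-importing a table.  A rough sketch of the
idea in Python (the column names, field names, and CSV layout here are
purely illustrative; the real spreadsheet follows SSRL's own cassette
template and our FileMaker fields differ):

    #!/usr/bin/env python
    # Sketch of the spreadsheet round trip; column names are illustrative only.
    import csv

    def export_cassette(db_rows, spreadsheet_path):
        """Write one row per mounted crystal; the beamline software reads this."""
        with open(spreadsheet_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=[
                "Port", "CrystalID", "Protein", "Directory", "Comment"])
            writer.writeheader()
            for row in db_rows:
                # The database assigns the systematic, unique directory name;
                # the beamline puts that crystal's images under it.
                writer.writerow({
                    "Port":      row["cassette_port"],
                    "CrystalID": row["crystal_id"],
                    "Protein":   row["protein"],
                    "Directory": "%s_%s" % (row["project"], row["crystal_id"]),
                    "Comment":   row.get("comment", ""),
                })

    def merge_results(spreadsheet_path):
        """After the run, pull the screening results back for re-import."""
        with open(spreadsheet_path, newline="") as fh:
            for row in csv.DictReader(fh):
                # Columns such as UnitCell are filled in at the beamline.
                yield row["CrystalID"], {
                    "unit_cell": row.get("UnitCell", ""),
                    "directory": row["Directory"],
                }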

Yes, if you work hard at it you can manage to mess up, say, the
human-interpretable meaning of the assigned systematic name.
But you cannot avoid the system altogether, because the only way
to reserve a slot for your crystal in the cassette being sent for
data collection is to enter its identifying information in the lab
database. 

There is still room to lose track of archived data at a larger scale.
Last I asked, TARDIS and the like cannot really help much with this.
If your 600 gigabytes of archived data from 2008 are indexed as being
stored on disk XD_2008_2 in Room K407 of building HSB, the index can
tell you exactly which directory on that disk corresponds to the data
from which crystal.  Unfortunately, it doesn't tell you that the disk
was in fact moved to a room down the hall 6 months ago when the lab
was reorganized :-)

The drawbacks of this system are:

- I wish I knew of an open-source, Linux-compatible equivalent 
  to FileMaker.  Nothing else I have looked at offers this level of 
  easy yet controlled access via a web browser from remote locations.

- Compliance with the protocol drops to less than 100% for datasets
  collected at home rather than at a beamline.

- One is still faced with the issue of how to archive terabytes
  of data.


        - Ethan



> I'm thinking of 
> an uploader that sits on the file server.  Only the uploader has write 
> permission.  The user calls the uploader (because data is only backed up 
> on the file server), puts the data directory name into a box, and fills 
> in a few other boxes (four or five), because otherwise the uploader 
> won't work.  The uploader interface could then be used to query the file 
> server and find datasets.  But the key is that the system MUST be used 
> to archive data - basically like flickr, but with the tag boxes 
> mandatory.  It looks like TARDIS (http://tardis.edu.au/) might have 
> such capabilities.
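
A mandatory-field uploader along those lines is not much code, for what
it is worth.  A bare-bones sketch (standard-library Python only; the
archive path and field names are invented, and a real deployment would
need authentication and proper error handling):

    #!/usr/bin/env python
    # Minimal "unavoidable" uploader: the only account with write access to
    # the archive runs this, and it refuses to copy data without the tags.
    import argparse, csv, datetime, os, shutil

    ARCHIVE_ROOT = "/data/archive"        # only the uploader writes here
    INDEX_FILE = os.path.join(ARCHIVE_ROOT, "index.csv")
    MANDATORY = ["user", "protein", "source", "quality", "directory"]

    def main():
        parser = argparse.ArgumentParser(
            description="Archive a dataset with mandatory tags")
        for field in MANDATORY:
            parser.add_argument("--" + field, required=True)  # enforce the tags
        args = vars(parser.parse_args())

        src = args["directory"]
        dest = os.path.join(
            ARCHIVE_ROOT,
            datetime.date.today().isoformat() + "_" +
            os.path.basename(os.path.normpath(src)))
        shutil.copytree(src, dest)        # the copy in the archive is the backup

        # Append the record; the same index is what you query to find data later.
        new_index = not os.path.exists(INDEX_FILE)
        with open(INDEX_FILE, "a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=MANDATORY + ["archived_as"])
            if new_index:
                writer.writeheader()
            writer.writerow(dict(args, archived_as=dest))

    if __name__ == "__main__":
        main()
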
> 
> The discussion regarding LIMS and ISPyB and other fancy tracking systems 
> was fascinating, but I don't see those as relevant for my archiving 
> task.  For the same reason, xTrack doesn't fit the bill.  I want to bury 
> data, but not so deep that I can't find them should I ever need to.  I 
> don't care about space group or crystallization conditions or processing 
> information - the CCP4_DATABASE breaks with time anyway, either because 
> a user renamed directories or because the user's home directory has been 
> moved to /oldhome to make space for new users.  I just want to be able 
> to always find old data.
> 
> Going off on a tangent, associating a jpg of the first image (with 
> resolution rings) with each dataset is great.  Can the generation of 
> such images be automated, i.e., with a script for the whole directory 
> tree?
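
Yes, in principle - any script that walks the tree and hands the first
frame of each dataset to an image converter will do.  A sketch in Python
(the converter command and archive path are placeholders; substitute
whatever tool you actually use to turn diffraction images into jpegs,
which also determines whether you get resolution rings):

    #!/usr/bin/env python
    # Walk an archive tree and make a thumbnail of the first frame in
    # every dataset directory.  CONVERT is a placeholder; point it at your
    # own image-to-jpeg tool.
    import os, subprocess

    ARCHIVE_ROOT = "/data/archive"                  # adjust to taste
    CONVERT = ["convert_to_jpeg"]                   # placeholder command
    EXTENSIONS = (".img", ".cbf", ".mccd", ".osc")  # common detector formats

    for dirpath, dirnames, filenames in os.walk(ARCHIVE_ROOT):
        frames = sorted(f for f in filenames if f.lower().endswith(EXTENSIONS))
        if not frames:
            continue
        first = os.path.join(dirpath, frames[0])
        thumbnail = os.path.join(dirpath, "first_image.jpg")
        if os.path.exists(thumbnail):               # skip on reruns
            continue
        subprocess.call(CONVERT + [first, thumbnail])
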
> 
> All best.
> 
> 
> Andreas
> 
> 
> 
> On 18/08/2010 11:44, Eleanor Dodson wrote:
> > I would contact Johan Turkenburg here - he and Sam Hart have organised
> > the York data archive brilliantly - it is now pretty straightforward to
> > access any data back to ~1998, I think.
> >
> > Eleanor
> > j...@ysbl.york.ac.uk
> >
> > Andreas Förster wrote:
> >> Dear all,
> >>
> >> going through some previous lab member's data and trying to make sense
> >> of it, I was wondering what kind of solutions exist to simplify the
> >> archiving and retrieval process.
> >>
> >> In particular, what I have in mind is a web interface that allows a
> >> user who has just returned from the synchrotron or the in-house
> >> detector to fill in a few boxes (user, name of protein, mutant, light
> >> source, quality of data, number of frames, status of project, etc.) and
> >> then upload his data from the USB stick, portable hard drive or remote
> >> storage.
> >>
> >> The database application would put the data in a safe place (some file
> >> server that's periodically backed up) and let users browse through all
> >> the collected data of the lab with minimal effort later.
> >>
> >> It doesn't seem too hard to implement this, which is why I'm asking if
> >> anyone has done so already.
> >>
> >> Thanks.
> >>
> >>
> >> Andreas
> >>
> >
> >
> 
> 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
