Hi Sebastien,

On Wed, 2009-07-29 at 18:59 +0200, Sebastien Binet wrote:
> hi David,
> 
> > I've been using pytables for offline analysis for a while now. Workflow
> > is simple: I extract a set of data from a database and store it in hdf5
> > using pytables and start doing analysis work. The thing is: database
> > performance is breaking down now that we have about 100M events stored
> > and after two years of patching indexes, queries, mysql settings and
> > things like that we're increasingly worrying about using a relational
> > database for data storage in the first place. Furthermore, when
> > extracting real data queries get horribly complex and the data must be
> > postprocessed before it can be useful. Once stored in pytables,
> > retrieving data is, of course, very easy.
> >
> > We've asked around what large experiments (LHC experiments like ATLAS)
> > are using and they are _not_ using db's for storage. That is expected
> > since a single event could take up in the order of a hundred Mb. The
> > point is that they are very happy with using ROOT for data storage. ROOT
> > is the analysis framework used by most high energy physicists and is
> > especially adapted to be used for data storage as well. However, not
> > everyone is happy with ROOT. Criticism mainly concerns the complexity of
> > ROOT and the cleanliness of the design.
> >
> > For python users, there is pyROOT. Of course, we know and love pytables.
> > We're going to test several things
> 
> please do keep me in the loop (being the librarian of pytables for Atlas) of 
> your findings.

Will do! Atlas uses pytables? For what exactly?

> 
> > We've asked around what large experiments (LHC experiments like ATLAS)
> > are using and they are _not_ using db's for storage. 
> I am a bit surprised by this statement as we do use Oracle and/or SQLite (via 
> an abstraction layer) for conditions data (detector geometry, time dependent 
> calibration 'constants',...) - not much for event data though.

We've specifically asked about event data. Right now that basically
means data from simulations, but IIUC the same will hold for collisions.
What we heard: raw data in ROOT files and metadata in Oracle (apparently
under a very nice site-wide license), followed by some talk about
multi-tiered setups and the perceived problems of aggregating metadata
from a final tier back into the central Oracle database.

Still, take my comments as coming from someone who just asked around and
is not part of Atlas. Any misrepresentation is entirely my fault.

> BTW, being able to automatically translate an sqlite file (or directly read) 
> to an hdf5/pytables one (and vice-versa) would be interesting... (I haven't 
> even tried, it seems rather straight forward so probably the script already 
> exists)

What is sqlite used for? It seems like a strange solution to me when you
already have Oracle, pytables and ROOT.

Seems straightforward indeed. Something like (pseudo-code):

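# cursor: an open DB-API cursor (e.g. from sqlite3)
# table: an existing pytables Table with matching columns
sql = "select event_id, timestamp, n_jets from events"
cursor.execute(sql)

tablerow = table.row
# iterate over the cursor rather than fetchall(), so that not all
# rows are pulled into memory at once
for result in cursor:
    tablerow['event_id'] = result[0]
    tablerow['timestamp'] = result[1]
    tablerow['n_jets'] = result[2]
    tablerow.append()

table.flush()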
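
Fleshed out a bit, the pseudo-code above could look something like the
following. This is only a sketch under assumptions: the file names, the
`events` table and its three column types are made up for illustration,
`sqlite_to_hdf5` and `Event` are hypothetical names, and it uses the
PyTables 3.x spelling (`open_file`, `create_table`) rather than the older
camelCase names (`openFile`, `createTable`).

```python
import sqlite3
import tables


class Event(tables.IsDescription):
    # Hypothetical schema; the column types are assumptions.
    event_id = tables.Int64Col()
    timestamp = tables.Float64Col()
    n_jets = tables.Int32Col()


def sqlite_to_hdf5(db_path, h5_path, chunk_size=10000):
    """Copy the `events` table of an SQLite file into an HDF5 table.

    Returns the number of rows written.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT event_id, timestamp, n_jets FROM events")
        with tables.open_file(h5_path, mode="w") as h5:
            table = h5.create_table("/", "events", Event,
                                    "Events copied from SQLite")
            row = table.row
            while True:
                # fetchmany() keeps memory use bounded for large tables
                chunk = cursor.fetchmany(chunk_size)
                if not chunk:
                    break
                for event_id, timestamp, n_jets in chunk:
                    row["event_id"] = event_id
                    row["timestamp"] = timestamp
                    row["n_jets"] = n_jets
                    row.append()
            table.flush()
            return table.nrows
    finally:
        conn.close()
```

Calling `sqlite_to_hdf5("events.db", "events.h5")` would then leave the
data under `/events` in the HDF5 file. A generic converter would have to
derive the column description from the SQLite schema instead of
hard-coding it as done here.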

Regards,

David


_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
