PyTables Users,

I've read the following thread in an attempt to better understand how to 
organize a 2D EArray/CArray and retain the ability to efficiently select rows 
or columns.

http://www.mail-archive.com/[email protected]/msg00723.html

In this thread it was suggested (at least by my reading) that the columns of an 
EArray built by appending rows can be accessed efficiently if an appropriate 
chunkshape is passed. It was also suggested that a second copy of the data be 
stored in a different orientation, but that statement was a bit unclear to me. 
What I'm looking for is a clear example of how to efficiently access the columns 
of an array built by appending rows. My data come in as a series of rows, but I 
would like to be able to read the columns in a reasonable amount of time.
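To make the chunkshape trade-off concrete, here is a rough back-of-the-envelope model. It is a sketch under one stated assumption: for a compressed dataset like the one below, every chunk that a selection intersects has to be read and decompressed in full. It ignores the HDF5 chunk cache and the compression ratio, the helper names (ceil_div, chunks_for_row, chunks_for_col, bytes_read) are hypothetical, and the chunkshapes tried are illustrative values, not what PyTables would choose automatically.

```python
# Hypothetical cost model: assumes each chunk touched by a selection is read
# (and, with zlib enabled, decompressed) in full; ignores caching/compression.

def ceil_div(a, b):
    """Integer ceiling of a / b (same result in Python 2 and 3)."""
    return (a + b - 1) // b

def chunks_for_row(shape, chunkshape):
    """Chunks touched when reading one full row, e.g. ea[i]."""
    return ceil_div(shape[1], chunkshape[1])

def chunks_for_col(shape, chunkshape):
    """Chunks touched when reading one full column, e.g. ea[:, j]."""
    return ceil_div(shape[0], chunkshape[0])

def bytes_read(nchunks, chunkshape, itemsize):
    """Uncompressed bytes brought into memory for that many whole chunks."""
    return nchunks * chunkshape[0] * chunkshape[1] * itemsize

shape = (350000, 400)   # final shape of the EArray in the snippet below
itemsize = 4            # bytes per Int32 element
for cs in [(20, 400), (100, 100), (10000, 1)]:   # illustrative chunkshapes only
    r, c = chunks_for_row(shape, cs), chunks_for_col(shape, cs)
    print("chunkshape %-12s row: %5d chunks, %11d bytes | column: %5d chunks, %11d bytes"
          % (str(cs), r, bytes_read(r, cs, itemsize), c, bytes_read(c, cs, itemsize)))
```

Under this model, row-shaped chunks such as (20, 400) make a single column read touch all 17500 chunks, i.e. about 560 MB of uncompressed data (the whole array), while tall chunks such as (10000, 1) reverse the trade-off (35 chunks per column, 400 per row). Square-ish chunks like (100, 100) sit in between, which is one way to see why a second copy in the other orientation gets suggested when both access patterns must be fast.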

Below is a code snippet that creates a fairly large EArray by appending rows. 
Can anyone provide some insight on how to access its columns efficiently and/or 
how to make a second copy of the data in the file using the appropriate 
chunkshape? (It is the chunkshape aspect I'm unclear on, specifically how its 
size should be chosen.) Thanks for all your help.

Brian

#################Begin Snippet:

import tables as T
import numpy  as N
import time
  
t1 = time.clock()
hdf = T.openFile('test.h5', mode = "w", title = '')

atom = T.Int32Atom()
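#shape = (?, ?)
#chunkshape = (?, ?)
rows = 400          # length of each appended row (the second dimension)
columns = 350000    # number of appended rows (the first, growing dimension)
filters = T.Filters(complevel=5, complib='zlib')

# The EArray grows along its first dimension, so its final shape is
# (columns, rows) = (350000, 400). expectedrows should be the expected final
# length of that dimension, i.e. columns, not rows.
ea = hdf.createEArray(hdf.root, "EArray", atom, (0, rows), filters=filters,
                      expectedrows=columns)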

for i in xrange(columns):
    arr = N.random.random(rows)*100    # random values in [0, 100)
    ea.append(arr[N.newaxis, :])
    if i % 10000 == 0:
        print i
ea.flush()    # flush once after the loop rather than after every append

# ea[1]     is fast, whereas
# ea[:, 1]  is really slow. How can chunkshape be used to access columns
#           efficiently when the array was built by appending rows?

hdf.close()
print "Done"
t2 = time.clock()
print t2-t1
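One reading of the "second copy in a different orientation" suggestion: after the rows have been appended, copy the data once into a second array of shape (400, 350000), so that an original column becomes a row of the copy, and process a block of rows at a time so memory stays bounded. The sketch below shows only that blocking logic, in plain Python on lists; read_rows, write_cols, transposed_copy_in_blocks and the block size are hypothetical names and values. With PyTables, read_rows(start, stop) would presumably wrap ea[start:stop], and write_cols would assign the transposed block into a CArray created with a column-friendly chunkshape (e.g. dest[:, start:stop] = block.T); treat that mapping as an assumption to check against the PyTables docs.

```python
def transposed_copy_in_blocks(read_rows, nrows, write_cols, block=10000):
    """Copy a row-major array into a transposed second copy, one block at a time.

    read_rows(start, stop)        -> list of rows start..stop-1 (each of length ncols)
    write_cols(start, stop, cols) -> store cols[j][k] at position (j, start + k)
    """
    for start in range(0, nrows, block):
        stop = min(start + block, nrows)
        rows = read_rows(start, stop)
        # Transpose the block in memory: cols[j] holds column j of rows start..stop-1.
        cols = [list(col) for col in zip(*rows)]
        write_cols(start, stop, cols)


# Tiny in-memory demo standing in for the HDF5 arrays (hypothetical data).
src = [[r * 10 + c for c in range(3)] for r in range(5)]   # 5 rows x 3 columns
dest = [[None] * 5 for _ in range(3)]                      # 3 rows x 5 columns


def read_rows(start, stop):
    return src[start:stop]


def write_cols(start, stop, cols):
    for j, col in enumerate(cols):
        dest[j][start:stop] = col


transposed_copy_in_blocks(read_rows, len(src), write_cols, block=2)
print(dest[0])  # column 0 of src, now a row: [0, 10, 20, 30, 40]
```

The block size trades memory for the number of read and write calls; each block of the source is read once in its natural row order.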

_______________________________________________
Pytables-users mailing list
https://lists.sourceforge.net/lists/listinfo/pytables-users