On Fri, Feb 08, 2008 at 10:13:05AM +0100, Ivan Vilata i Balaguer wrote:
>
> I found the ``numpy.split()`` function, which may be what you need::
> [...]
> The resulting sub-arrays share the same ``data``, so it should be
> memory-efficient.  Also, unless you expect empty sub-arrays, you won't
> need to store the first 0 index.  Then, to get pure Python lists::
>
>   >>> for i in lrange(vlarray1.nrows):
>   ...     data = vlarray1[i]
>   ...     indices = vlarray2[i]
>   ...     foo([s.tolist() for s in numpy.split(data, indices)])
>
> But please remember to keep a ``numpy`` flavor for both ``VLArray``
> nodes.  Do you still have such high read times with this approach?
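
For reference, here is a minimal sketch of the split approach quoted above, using plain numpy only (no PyTables). The helper names ``flatten`` and ``unflatten`` and the sample data are made up for illustration. In the real setup the two arrays would be rows of the two ``VLArray`` nodes.

```python
import numpy as np

def flatten(nested):
    """Turn a list of lists into (data, indices) suitable for numpy.split.

    Hypothetical helper, not part of numpy or PyTables.
    """
    # All elements laid out contiguously in one flat array
    data = np.array([x for sub in nested for x in sub], dtype=np.int64)
    lengths = [len(sub) for sub in nested]
    # Offsets where every sub-list after the first one starts;
    # the leading 0 is not stored
    indices = np.cumsum(lengths[:-1]).astype(np.int64)
    return data, indices

def unflatten(data, indices):
    """Rebuild pure Python lists from the flat data and the split points."""
    return [s.tolist() for s in np.split(data, indices)]

nested = [[1, 2, 3], [4], [5, 6]]   # made-up sample data
data, indices = flatten(nested)
restored = unflatten(data, indices)
```

Here ``data`` would go into one node and ``indices`` into the other, and ``unflatten`` is what the read loop does for each row.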

I have not used a numpy flavor for these structures in the past.  I have
changed that now, and the write speed for this example has increased; it
is now comparable to the write speed of the pickling version.  The read
time, however, has increased.  The following are the average timings for
the same input dataset:

    write : 27.769s
    read  :  4.605s

Since this is a portable way of storing my data structure, I will
consider it even with the minor decrease in read speed.  However, I have
been asked to test the HDF5 version against an RDBMS version, mainly for
speed.  If it turns out that I need to move to the pickled version to
push HDF5 through as the standard for my DB, I will, as it seems to be
the natural way of organizing my data.

As I have mentioned, I am a newbie PyTables user, but my liking for HDF5
is increasing exponentially.

-- 
Hatem Nassrat
