On Fri, Feb 08, 2008 at 10:13:05AM +0100, Ivan Vilata i Balaguer wrote:
> 
> I found the ``numpy.split()`` function, which may be what you need::
> [...] 
> The resulting sub-arrays share the same ``data``, so it should be
> memory-efficient.  Also, unless you expect empty sub-arrays, you won't
> need to store the first 0 index.  Then, to get pure Python lists::
> 
>   >>> for i in range(vlarray1.nrows):
>   ...     data = vlarray1[i]
>   ...     indices = vlarray2[i]
>   ...     foo([s.tolist() for s in numpy.split(data, indices)])
>
> But please remember to keep a ``numpy`` flavor for both ``VLArray``
> nodes.  Do you still have such high read times with this approach?
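
For reference, the flatten-and-split idea quoted above can be tried
without PyTables at all. The following is only a minimal sketch: the
helper names ``flatten`` and ``unflatten`` and the sample data are
made up for illustration, and plain numpy arrays stand in for the rows
that would be read from the two ``VLArray`` nodes.

```python
import numpy

def flatten(nested):
    """Turn a non-empty list of lists into (data, indices).

    `data` is one flat array; `indices` holds the positions at which
    numpy.split() should cut it.  The leading 0 and the final offset
    (the total length) are not stored, since numpy.split() does not
    need them.
    """
    lengths = [len(sub) for sub in nested]
    data = numpy.concatenate(
        [numpy.asarray(sub, dtype=numpy.int64) for sub in nested])
    indices = numpy.cumsum(lengths)[:-1]
    return data, indices

def unflatten(data, indices):
    """Rebuild pure Python lists from the flat array and cut points."""
    return [s.tolist() for s in numpy.split(data, indices)]

# Hypothetical sample row: three sub-lists of different lengths.
nested = [[1, 2, 3], [4], [5, 6]]
data, indices = flatten(nested)
restored = unflatten(data, indices)
```

In this sketch ``data`` and ``indices`` play the role of one row of
``vlarray1`` and ``vlarray2`` respectively. The sub-arrays returned by
``numpy.split()`` are views on ``data``, so the only copy happens in
``tolist()``.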

I have not used a numpy flavor for these structures in the past. I
changed that now, and the write speed for this example has increased. It
is now comparable to the write speed of the pickling version. The read
speed, however, has decreased. The following are the average timings for
the same input dataset.

write   : 27.769s
read    :  4.605s

Since this is a portable way of storing my data structure, I will
consider it even with the minor decrease in read speed. I have, however,
been asked to test the HDF5 version against an RDBMS version, mainly for
speed. If it turns out that I need to move to the pickled version to
push HDF5 through as the standard for my DB, I will, as it seems to be
the natural way of organizing my data. As I have mentioned, I am a newbie
PyTables user, but my liking for HDF5 is increasing exponentially.

-- 
Hatem Nassrat
