Re: [Pytables-users] Simple fast array I/O with Pytables

Dav Clark Fri, 30 Dec 2011 14:17:47 -0800

On Dec 30, 2011, at 8:40 AM, Francesc Alted wrote:

> 2011/12/30 Gael Varoquaux <[email protected]>:
>> Hi list,
>> 
>> I am trying to do a simple comparison of various I/O libraries to save a
>> bunch of numpy arrays. I don't have time to actually invest in PyTables
>> now, but it has always been on my radar. I wanted to get a ball-park
>> estimate of what was achievable with PyTables in terms of read/write
>> performance. I wrote a quick pair of read and write functions, and I am
>> getting really bad performance.
>> 
>> Obviously, I should invest in learning PyTables, but right now I am just
>> trying to get figures to justify such an investment. Can somebody have a
>> look at the following code to see if I have forgotten something
>> obvious that would make I/O faster? Sorry, I feel like I am asking you to
>> do my work, but I hate that PyTables is coming out so badly in the
>> benchmarks:
> [clip]
> 
> This depends a lot on the sort of arrays you are trying to save. Do
> they all have the same shape and type? Then it is best to save them in a
> single monolithic Array (or an EArray, if you want to use compression).
> 
> If they have the same type but different shapes, then storing each one as a
> separate entry (row) in a single VLArray would be more effective. If the
> arrays are large, it may be useful to use a high-performance compressor
> (e.g. Blosc) to reduce their size.
> 
> If your arrays share neither dtypes nor shapes, then I'm afraid
> this is the best performance you can expect from PyTables. Is it really
> that bad compared with other options?
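As a rough illustration of the two layouts Francesc describes, here is a minimal sketch assuming the PyTables 3.x API (the 2.x releases current when this thread was written spell these methods createEArray and createVLArray). Equally shaped arrays become rows of one Blosc-compressed EArray; same-dtype arrays of differing lengths become rows of one VLArray. The function names and the Blosc level 5 setting are hypothetical choices, and the VLArray half assumes 1-D arrays; multi-dimensional ones would need flattening, with their shapes saved separately.

```python
import numpy as np
import tables

# Hypothetical compression choice: Blosc at level 5 (PyTables 3.x API assumed).
FILTERS = tables.Filters(complevel=5, complib="blosc")


def save_same_shape(filename, arrays):
    """Store arrays sharing one shape and dtype as rows of a compressed EArray."""
    first = arrays[0]
    with tables.open_file(filename, mode="w") as h5:
        stack = h5.create_earray(
            h5.root, "stack",
            atom=tables.Atom.from_dtype(first.dtype),
            shape=(0,) + first.shape,  # the 0-length axis is the one that grows
            filters=FILTERS,
            expectedrows=len(arrays),
        )
        for arr in arrays:
            stack.append(arr[np.newaxis])  # add one row holding this array


def load_same_shape(filename):
    """Read the whole stack back as a single (n, *shape) ndarray."""
    with tables.open_file(filename, mode="r") as h5:
        return h5.root.stack[:]


def save_same_dtype(filename, arrays):
    """Store 1-D arrays sharing a dtype but not a length as rows of a VLArray."""
    with tables.open_file(filename, mode="w") as h5:
        ragged = h5.create_vlarray(
            h5.root, "ragged",
            atom=tables.Atom.from_dtype(arrays[0].dtype),
            filters=FILTERS,
        )
        for arr in arrays:
            ragged.append(arr)  # each row may have a different length


def load_same_dtype(filename):
    """Read every VLArray row back as a list of 1-D ndarrays."""
    with tables.open_file(filename, mode="r") as h5:
        return h5.root.ragged.read()
```

Reading the EArray back is a single slice, which is where the "monolithic" layout pays off compared with one node per array.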


What about compression? I'm guessing you're comparing against .npz files, which
may be zlib-compressed (if written with numpy.savez_compressed) but likely
without the efficiency of Blosc. You'll probably get a modest net saving in
combined write and read time. See:

http://pytables.github.com/usersguide/optimization.html

for trade-offs in read and write speeds.
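To check the write-plus-read claim on your own data, a sketch along these lines times a round trip through a PyTables CArray under several filter settings and through numpy.savez_compressed. It assumes the PyTables 3.x API; the helper names, the test array, and the chosen compression levels are hypothetical, and no particular numbers are implied, since they depend heavily on the data and the machine.

```python
import os
import tempfile
import time

import numpy as np
import tables


def time_roundtrip(data, filters, workdir):
    """Time writing `data` to a CArray with `filters` and reading it back."""
    path = os.path.join(workdir, "bench.h5")
    t0 = time.perf_counter()
    with tables.open_file(path, mode="w") as h5:
        h5.create_carray(h5.root, "data", obj=data, filters=filters)
    t1 = time.perf_counter()
    with tables.open_file(path, mode="r") as h5:
        back = h5.root.data[:]
    t2 = time.perf_counter()
    if not np.array_equal(back, data):
        raise RuntimeError("round trip through PyTables changed the data")
    return {"write_s": t1 - t0, "read_s": t2 - t1, "bytes": os.path.getsize(path)}


def time_npz(data, workdir):
    """Same measurement for numpy.savez_compressed (zlib inside a zip file)."""
    path = os.path.join(workdir, "bench.npz")
    t0 = time.perf_counter()
    np.savez_compressed(path, data=data)
    t1 = time.perf_counter()
    with np.load(path) as npz:
        back = npz["data"]
    t2 = time.perf_counter()
    if not np.array_equal(back, data):
        raise RuntimeError("round trip through .npz changed the data")
    return {"write_s": t1 - t0, "read_s": t2 - t1, "bytes": os.path.getsize(path)}


if __name__ == "__main__":
    # Hypothetical test data; substitute the arrays you actually need to save.
    data = np.random.default_rng(0).normal(size=(1000, 1000))
    settings = {
        "blosc-5": tables.Filters(complevel=5, complib="blosc"),
        "zlib-5": tables.Filters(complevel=5, complib="zlib"),
        "uncompressed": None,
    }
    with tempfile.TemporaryDirectory() as workdir:
        for label, filters in settings.items():
            print(label, time_roundtrip(data, filters, workdir))
        print("npz", time_npz(data, workdir))
```

Random normal data compresses poorly, so real arrays with more structure may show a larger gap between the compressed and uncompressed settings.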

The following page has code for creating compressed CArrays (which I'm guessing
is appropriate for your needs, since I suspect your arrays won't change size or
shape on disk):

http://pytables.github.com/usersguide/libref.html#carrayclassdescr
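For arrays whose shape is fixed once written, a minimal write/read pair might store each array as its own compressed CArray. This is a sketch assuming the PyTables 3.x API (create_carray rather than 2.x's createCArray); write_arrays, read_arrays, and the Blosc level are hypothetical choices, and array names should be valid Python identifiers to avoid PyTables' natural-naming warnings.

```python
import tables

# Hypothetical compression choice: Blosc at level 5 (PyTables 3.x API assumed).
FILTERS = tables.Filters(complevel=5, complib="blosc")


def write_arrays(filename, arrays):
    """Write a dict of name -> ndarray, one compressed CArray node per entry."""
    with tables.open_file(filename, mode="w") as h5:
        for name, arr in arrays.items():
            # obj= lets PyTables take the atom and shape from the array itself.
            h5.create_carray(h5.root, name, obj=arr, filters=FILTERS)


def read_arrays(filename):
    """Read every CArray directly under the root group into a dict."""
    with tables.open_file(filename, mode="r") as h5:
        return {node.name: node.read()
                for node in h5.list_nodes(h5.root, classname="CArray")}
```

Usage would look like `write_arrays("arrays.h5", {"x": x, "y": y})` followed later by `read_arrays("arrays.h5")`.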

I'll add that it would be cool to see what numbers you come up with (maybe with 
some loose specs on the machine CPU and disk you used).

Cheers,
Dav



_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
