On Dec 30, 2011, at 8:40 AM, Francesc Alted wrote:

> 2011/12/30 Gael Varoquaux <gael.varoqu...@normalesup.org>:
>> Hi list,
>>
>> I am trying to do a simple comparison of various I/O libraries to save a
>> bunch of numpy arrays. I don't have time to actually invest in PyTables
>> now, but it has always been on my radar. I wanted to get a ball-park
>> estimate of what was achievable with PyTables in terms of read/write
>> performance. I wrote a quick pair of read and write functions, and I am
>> getting really bad performance.
>>
>> Obviously, I should invest in learning PyTables, but right now I am just
>> trying to get figures to justify such an investment. Can somebody have a
>> look at the following code to see if I haven't forgotten something
>> obvious that would make I/O faster? Sorry, I feel like I am asking you to
>> do my work, but I hate it that PyTables is coming out so bad on the
>> benchmarks:
> [clip]
>
> This depends a lot on the sort of arrays you are trying to save. Do
> they have the same shape and type? Then it is best to save them in a
> monolithic Array (or an EArray, if you want to use compression).
>
> If they have the same type but different shapes, then saving each one
> as a separate entry in the same VLArray would be more effective. In case
> the arrays are large, it may be useful to use a high performance
> compressor (e.g. Blosc) so as to reduce their size.
>
> If your arrays do not share dtypes or shapes at all, then I'm afraid
> this is the best performance you can expect from PyTables. Is this that
> bad compared with other options?
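A rough sketch of the two layouts Francesc describes above, assuming PyTables 3.x or later (the 2.x releases current at the time of this thread spelled these calls openFile, createEArray and createVLArray). The array sizes, node names and file path are made up for illustration, and the ragged arrays are kept 1-D because a VLArray row is a variable-length sequence of atoms; arrays with different multi-dimensional shapes would probably need flattening first.

```python
import os
import tempfile

import numpy as np
import tables

# Assumption: PyTables >= 3.0 snake_case API; Blosc at level 5 is an arbitrary choice.
filters = tables.Filters(complevel=5, complib="blosc")

# Hypothetical test data.
same_shape = [np.random.rand(100, 50) for _ in range(4)]       # same dtype and shape
ragged = [np.arange(n, dtype=np.float64) for n in (3, 10, 7)]  # same dtype, different lengths

path = os.path.join(tempfile.mkdtemp(), "arrays.h5")  # hypothetical file name

with tables.open_file(path, mode="w") as h5:
    # Same shape and dtype: one EArray, extendable along the first axis
    # (the 0 in `shape` marks the axis that grows on append).
    stacked = h5.create_earray(h5.root, "stacked",
                               atom=tables.Float64Atom(),
                               shape=(0, 100, 50),
                               filters=filters)
    for a in same_shape:
        stacked.append(a[np.newaxis])  # append one (1, 100, 50) slab

    # Same dtype, different lengths: one VLArray, one row per array.
    vl = h5.create_vlarray(h5.root, "ragged",
                           atom=tables.Float64Atom(),
                           filters=filters)
    for a in ragged:
        vl.append(a)

with tables.open_file(path, mode="r") as h5:
    stacked_back = h5.root.stacked.read()  # a single (4, 100, 50) array
    ragged_back = h5.root.ragged.read()    # a list of 1-D arrays
```

Reading the EArray back gives everything in one call, which is where the "monolithic" layout should pay off compared with one node per array.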
What about compression? I'm guessing you're comparing to .npz files, which would be compressed but likely without the efficiency of blosc5. You'll probably get a modest net savings on write + read time. See:

http://pytables.github.com/usersguide/optimization.html

for trade-offs in read and write speeds.

The following has code for creating compressed CArrays (which I'm guessing is appropriate for your needs - I suspect your arrays won't change size / shape on disk):

http://pytables.github.com/usersguide/libref.html#carrayclassdescr

I'll add that it would be cool to see what numbers you come up with (maybe with some loose specs on the machine CPU and disk you used).

Cheers,
Dav

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
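
For anyone who wants to try the compressed CArray suggestion, here is a minimal sketch, again assuming the PyTables 3.x API (create_carray; the 2.x name was createCArray). The helper names, the test array and the file names are hypothetical, and the timing is only a crude single-run wall-clock measurement set against numpy's savez_compressed; it prints whatever your machine produces and makes no claim about which side wins.

```python
import os
import tempfile
import time

import numpy as np
import tables


def write_carray(path, arr, filters):
    """Write `arr` to a compressed CArray node named 'data' (hypothetical name)."""
    with tables.open_file(path, mode="w") as h5:
        carr = h5.create_carray(h5.root, "data",
                                atom=tables.Atom.from_dtype(arr.dtype),
                                shape=arr.shape,
                                filters=filters)
        carr[...] = arr  # fixed shape on disk, filled in one assignment


def read_carray(path):
    with tables.open_file(path, mode="r") as h5:
        return h5.root.data.read()


def write_npz(path, arr):
    np.savez_compressed(path, data=arr)


def read_npz(path):
    with np.load(path) as npz:
        return npz["data"]


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical, highly compressible test array (about 8 MB).
arr = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)
tmp = tempfile.mkdtemp()
h5_path = os.path.join(tmp, "bench.h5")
npz_path = os.path.join(tmp, "bench.npz")
blosc = tables.Filters(complevel=5, complib="blosc")  # level 5 is an arbitrary pick

_, t_h5_write = timed(write_carray, h5_path, arr, blosc)
h5_back, t_h5_read = timed(read_carray, h5_path)
_, t_npz_write = timed(write_npz, npz_path, arr)
npz_back, t_npz_read = timed(read_npz, npz_path)

print(f"CArray+blosc: write {t_h5_write:.4f}s, read {t_h5_read:.4f}s")
print(f"npz (zlib):   write {t_npz_write:.4f}s, read {t_npz_read:.4f}s")
```

Real arrays will compress very differently from this synthetic one, so the numbers only mean something when run on your own data, ideally repeated a few times and with the disk cache in mind.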