On 12/5/12 7:55 PM, Alvaro Tejero Cantero wrote:
> My system was benched for reads and writes with Blosc[1]:
>
> with pt.openFile(paths.braw(block), 'r') as handle:
>     pt.setBloscMaxThreads(1)
>     %timeit a = handle.root.raw.c042[:]
>     pt.setBloscMaxThreads(6)
>     %timeit a = handle.root.raw.c042[:]
>     pt.setBloscMaxThreads(11)
>     %timeit a = handle.root.raw.c042[:]
>     print handle.root.raw._v_attrs.FILTERS
>     print handle.root.raw.c042.__sizeof__()
>     print handle.root.raw.c042
>
> gives
>
> 1 loops, best of 3: 483 ms per loop
> 1 loops, best of 3: 782 ms per loop
> 1 loops, best of 3: 663 ms per loop
> Filters(complevel=5, complib='blosc', shuffle=True, fletcher32=False)
> 104
> /raw/c042 (CArray(303390000,), shuffle, blosc(5)) ''
>
> I can't understand what is going on, for the life of me. These
> datasets use int16 atoms and at Blosc complevel=5 used to compress by
> a factor of about 2. Even for such low compression ratios there should
> be huge differences between single- and multi-threaded reads.
>
> Do you have any clue?
Yeah, welcome to the wonderful art of fine-tuning. Fortunately, we have
a machine here that is pretty much identical to yours (hey, your computer
was too good in the Blosc benchmarks to ignore :), so I can reproduce
your issue:
In [1]: import numpy as np
In [2]: import tables as tb
In [3]: a = ((np.random.rand(3e8))*100).astype('i2')
In [4]: f = tb.openFile("test.h5", "w")
In [5]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
   ...:                      filters=tb.Filters(5, complib="blosc"))
In [6]: act[:] = a
In [7]: f.flush()
In [8]: ll test.h5
-rw-rw-r-- 1 faltet 301719914 Dec 6 04:55 test.h5
This random array is close to yours in size (~3e8 elements) and has a
similar compression ratio (~2x). Now the timings (using the default of
6 threads):
In [9]: timeit act[:]
1 loops, best of 3: 441 ms per loop
In [11]: tb.setBloscMaxThreads(1)
Out[11]: 6
In [12]: timeit act[:]
1 loops, best of 3: 347 ms per loop
So yeah, that might seem a bit disappointing (note that
setBloscMaxThreads() returns the *previous* setting, which is why
Out[11] above reads 6). It turns out that the default chunksize in
PyTables is tuned to balance between sequential and random reads. If
you want to optimize for sequential reads only (apparently that is what
you are after, right?), it normally helps to increase the chunksize.
For example, after some quick trials (scripted in the sketch right after
the timings below), I determined that a chunksize of 2 MB is pretty
optimal for sequential access:
In [44]: f.removeNode(f.root.act)
In [45]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
    ...:                      filters=tb.Filters(5, complib="blosc"),
    ...:                      chunkshape=(2**20,))
In [46]: act[:] = a
In [47]: tb.setBloscMaxThreads(1)
Out[47]: 6
In [48]: timeit act[:]
1 loops, best of 3: 334 ms per loop
In [49]: tb.setBloscMaxThreads(3)
Out[49]: 1
In [50]: timeit act[:]
1 loops, best of 3: 298 ms per loop
In [51]: tb.setBloscMaxThreads(6)
Out[51]: 3
In [52]: timeit act[:]
1 loops, best of 3: 303 ms per loop
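In case it is useful, the quick trials I mentioned can be scripted along
these lines. This is just a minimal sketch reusing the `a` array and the
open file `f` from the session above; the particular chunkshapes probed
are my guesses, not a recommendation:

import time

for chunklen in (2**18, 2**19, 2**20, 2**21):
    # drop the previous trial, if any
    if 'act' in f.root._v_children:
        f.removeNode(f.root.act)
    act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                         filters=tb.Filters(5, complib="blosc"),
                         chunkshape=(chunklen,))
    act[:] = a
    f.flush()
    t0 = time.time()
    act[:]    # sequential read of the whole array
    print chunklen, "-> %.0f ms" % ((time.time() - t0) * 1000)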
Also, we can see from the timings above that the sweet spot is at 3
threads, not more (don't ask me why). However, that does not mean that
Blosc cannot go faster on this machine; in fact it can:
In [59]: import blosc
In [60]: sa = a.tostring()
In [61]: ac2 = blosc.compress(sa, 2, clevel=5)
In [62]: blosc.set_nthreads(6)
Out[62]: 6
In [64]: timeit a2 = blosc.decompress(ac2)
10 loops, best of 3: 80.7 ms per loop
In [65]: blosc.set_nthreads(1)
Out[65]: 6
In [66]: timeit a2 = blosc.decompress(ac2)
1 loops, best of 3: 249 ms per loop
So that means that pure in-memory Blosc decompression is only about 4x
faster than PyTables + Blosc, and in this case the latter reaches an
excellent mark of ~2 GB/s, which is really good for a read-from-disk
operation. Note how a memcpy() operation on this machine is just about
as fast:
In [36]: timeit a.copy()
1 loops, best of 3: 294 ms per loop
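Just to make the arithmetic behind the ~2 GB/s figure explicit, here is
a quick back-of-the-envelope check (using the `a` array from the
session):

# ~572 MB of uncompressed data delivered in ~298 ms
nbytes = a.nbytes                        # 3e8 int16 elements -> ~572 MB
print "%.1f GB/s" % (nbytes / 0.298 / 2**30)
# -> 1.9 GB/s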
Now that I'm at it, I'm curious about how other compressors perform in
this scenario:
In [6]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
   ...:                      filters=tb.Filters(5, complib="lzo"),
   ...:                      chunkshape=(2**20,))
In [7]: act[:] = a
In [8]: f.flush()
In [9]: ll test.h5 # compression ratio very close to Blosc
-rw-rw-r-- 1 faltet 302769510 Dec 6 05:23 test.h5
In [10]: timeit act[:]
1 loops, best of 3: 1.13 s per loop
So LZO is more than 3x slower than Blosc. A similar thing happens with
zlib:
In [12]: f.close()
In [13]: f = tb.openFile("test.h5", "w")
In [14]: act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
    ...:                      filters=tb.Filters(1, complib="zlib"),
    ...:                      chunkshape=(2**20,))
In [15]: act[:] = a
In [16]: f.flush()
In [17]: ll test.h5 # the compression rate is somewhat better
-rw-rw-r-- 1 faltet 254821296 Dec 6 05:26 test.h5
In [18]: timeit act[:]
1 loops, best of 3: 2.24 s per loop
which is 6x slower than Blosc (although the compression ratio is a bit
better).
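For what it's worth, the whole comparison can also be scripted. A
minimal sketch, reusing `a` from above, overwriting test.h5 on each
round and mirroring the complevels used in the sessions (5 for
blosc/lzo, 1 for zlib):

import time

for complib, complevel in (("blosc", 5), ("lzo", 5), ("zlib", 1)):
    f = tb.openFile("test.h5", "w")
    act = f.createCArray(f.root, 'act', tb.Int16Atom(), a.shape,
                         filters=tb.Filters(complevel, complib=complib),
                         chunkshape=(2**20,))
    act[:] = a
    f.flush()
    t0 = time.time()
    act[:]    # full sequential read
    print "%-6s %.2f s" % (complib, time.time() - t0)
    f.close()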
And just for the sake of completeness, let's see how fast carray (the
package, not the CArray object in PyTables) performs with a chunked
array in-memory:
In [19]: import carray as ca
In [20]: ac3 = ca.carray(a, chunklen=2**20, cparams=ca.cparams(5))
In [21]: ac3
Out[21]:
carray((300000000,), int16)
nbytes: 572.20 MB; cbytes: 289.56 MB; ratio: 1.98
cparams := cparams(clevel=5, shuffle=True)
[59 34 36 ..., 21 58 50]
In [22]: timeit ac3[:]
1 loops, best of 3: 254 ms per loop
In [23]: ca.set_nthreads(1)
Out[23]: 6
In [24]: timeit ac3[:]
1 loops, best of 3: 282 ms per loop
So, with 254 ms, it is only marginally faster than PyTables (~298 ms).
Now with a carray object on-disk:
In [27]: acd = ca.carray(a, chunklen=2**20, cparams=ca.cparams(5),
    ...:                  rootdir="test")
In [28]: acd
Out[28]:
carray((300000000,), int16)
nbytes: 572.20 MB; cbytes: 289.56 MB; ratio: 1.98
cparams := cparams(clevel=5, shuffle=True)
rootdir := 'test'
[59 34 36 ..., 21 58 50]
In [30]: ca.set_nthreads(6)
Out[30]: 1
In [31]: timeit acd[:]
1 loops, best of 3: 317 ms per loop
In [32]: ca.set_nthreads(1)
Out[32]: 6
In [33]: timeit acd[:]
1 loops, best of 3: 361 ms per loop
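As an aside, the nice thing about rootdir is that the dataset persists
between sessions. If I remember correctly it can be re-opened later
along these lines (treat ca.open() as an assumption on my part and
double-check it against the carray docs for your version):

import carray as ca

# Re-open the carray persisted under rootdir="test" above.
# NOTE: ca.open() is my recollection of the carray API here; verify it.
acd2 = ca.open(rootdir="test")
print acd2    # should report the same nbytes/cbytes/ratio as before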
The on-disk carray times are a bit larger than with PyTables (317 ms vs
~298 ms), which says a lot about how efficiently I/O is implemented in
the HDF5/PyTables stack.
--
Francesc Alted