On Tuesday 24 March 2009, Francesc Alted wrote:
> On Thursday 19 March 2009, Francesc Alted wrote:
> > That's a good question, and one that I'd like to know the answer
> > to!  After some digging, I've tracked the problem down to the
> > HDF5 library.  I've reported it to the hdf-forum list and I'll
> > report here whatever response the HDF5 crew gives.
>
> It seems confirmed that this is an HDF5 issue.  The HDF5 people
> will have a look at it.  I'll keep the list informed of further
> progress on this.

More on this.  It seems that this is a problem with the size of the HDF5 
chunk cache.  I've opened a new ticket to allow setting this size 
directly from the PyTables API:

http://www.pytables.org/trac/ticket/221

Meanwhile, with the attached patch (against PyTables' trunk) you can 
adapt the HDF5 chunk cache size (only when the file is opened for 
reading) to your specific chunk size.
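
For reference, what the patch boils down to is opening the file with a 
file access property list that carries a bigger raw data chunk cache.  
At the HDF5 C level that would look roughly like the sketch below (not 
part of the patch; the 10 slots / 12 MB / 0.0 values are just the ones 
I hardcoded, and "data.h5" is only a placeholder, so adjust them to 
your own files and chunk sizes):

/* Minimal sketch: open a file read-only with an enlarged raw data
   chunk cache.  Values mirror the attached patch; "data.h5" is a
   placeholder. */
#include "hdf5.h"

int main(void)
{
  hid_t fapl, file_id;

  fapl = H5Pcreate(H5P_FILE_ACCESS);
  /* arguments: fapl, mdc_nelmts, rdcc_nslots, rdcc_nbytes, rdcc_w0 */
  H5Pset_cache(fapl, 0, 10, 12*1024*1024, 0.0);

  file_id = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);

  /* ... read the chunked datasets as usual ... */

  H5Fclose(file_id);
  H5Pclose(fapl);
  return 0;
}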

However, I still think that HDF5 can be further optimized in this regard 
(read below).

To be continued...


----------  Forwarded Message  ----------

Subject: Re: [hdf-forum] Reading across multiple chunks is very slow
Date: Wednesday 01 April 2009
From: Francesc Alted <[email protected]>
To: Neil Fortner <[email protected]>

Hi Neil,

On Tuesday 31 March 2009, Neil Fortner wrote:
> This is happening because (in the "time for two trailing indices"
> case) the individual chunks are not contiguous in memory, as Ger
> pointed out. Also, because the chunks are larger than the chunk cache
> size (default=1 MB), the library makes a best effort to avoid having
> to allocate enough memory to hold the chunk.  Therefore it reads directly
> from the file into the supplied read buffer.  Because the selection
> in the read buffer (for each chunk) is a series of small
> non-contiguous blocks, the library must make a large number of small
> reads.

I see.  That makes sense.

> To improve performance, you can increase the chunk cache size with
> H5Pset_cache (or the new H5Pset_chunk_cache function if you're using
> the latest snapshot).  The test runs in about .7 seconds with this
> change on my laptop, down from ~30 seconds.  This is still more time
> than for the contiguous case, because the library must allocate the
> extra space and scatter each element individually from the cache to
> the read buffer, but now only calls read once for each chunk.

Yes, it works a lot better now!  However, after setting the chunk cache 
size to 12 MB (a bit larger than my chunk size, which is 11.8 MB), the 
performance is still a long way from optimal, IMHO.  Look at these 
numbers:

For a default chunk cache size (1 MB):

time for [0:2,:,:,0] --> 0.15
time for [0,:,:,0:2] --> 7.615

With an increased chunk cache size (12 MB):

time for [0:2,:,:,0] --> 0.165
time for [0,:,:,0:2] --> 1.312

So, although the new time is around 6x better, it is still almost 8x 
slower than the contiguous case.  In order to estimate the time it 
would take to scatter each element from the cache to the read buffer, 
I've measured the time that a similar process takes with NumPy:

In [28]: a = np.arange(1978*1556*2, dtype="int32")

In [29]: b = np.empty(1978*1556*2, dtype="int32")

In [30]: timeit b[:b.size/2] = a[::2]
10 loops, best of 3: 40.7 ms per loop

In [31]: timeit b[b.size/2:] = a[1::2]
10 loops, best of 3: 40.1 ms per loop

In [32]: a
Out[32]: array([      0,       1,       2, ..., 6155533, 6155534, 
6155535])

In [33]: b
Out[33]: array([      0,       2,       4, ..., 6155531, 6155533, 
6155535])

As can be seen, the scatter process completes in around 40 + 40 = 80 ms 
for my two-chunk read.  Hence, I'd expect the second read case to 
complete in around 0.17 + 0.08 = 0.25 seconds, while the actual time 
(1.312 s) is still about 5x slower.  So, unless I'm missing something, 
my guess is that the scatter code in the HDF5 library could be made a 
lot faster.
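
BTW, just to check that I've understood the per-dataset alternative you 
mention: with the latest snapshot, I suppose the H5Pset_chunk_cache 
route would look roughly like the (untested) sketch below, where the 
"/array" name and the 12 MB figure are only placeholders for my use 
case:

/* Rough sketch: open one dataset with its own 12 MB chunk cache via
   the new per-dataset access property list.  Names and sizes are
   placeholders. */
#include "hdf5.h"

hid_t open_with_big_chunk_cache(hid_t file_id)
{
  hid_t dapl, dset;

  dapl = H5Pcreate(H5P_DATASET_ACCESS);
  /* arguments: dapl, rdcc_nslots, rdcc_nbytes, rdcc_w0 */
  H5Pset_chunk_cache(dapl, 101, 12*1024*1024, 1.0);

  dset = H5Dopen2(file_id, "/array", dapl);
  H5Pclose(dapl);
  return dset;
}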


-- 
Francesc Alted

"One would expect people to feel threatened by the 'giant
brains or machines that think'.  In fact, the fightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."

-- Edsger W. Dykstra
   "On the cruelty of really teaching computer science"
Index: tables/hdf5Extension.pyx
===================================================================
--- tables/hdf5Extension.pyx	(revision 4113)
+++ tables/hdf5Extension.pyx	(working copy)
@@ -64,6 +64,7 @@
      H5G_GROUP, H5G_DATASET, H5G_stat_t, \
      H5T_class_t, H5T_sign_t, H5T_NATIVE_INT, \
      H5F_SCOPE_GLOBAL, H5F_ACC_TRUNC, H5F_ACC_RDONLY, H5F_ACC_RDWR, \
+     H5P_FILE_CREATE, H5P_FILE_ACCESS, \
      H5P_DEFAULT, H5T_SGN_NONE, H5T_SGN_2, H5T_DIR_DEFAULT, \
      H5S_SELECT_SET, H5S_SELECT_AND, H5S_SELECT_NOTB, \
      H5get_libversion, H5check_version, H5Fcreate, H5Fopen, H5Fclose, \
@@ -304,11 +305,11 @@
 
     if pymode == 'r':
       # Just a test for disabling metadata caching.
-      ## access_plist = H5Pcreate(H5P_FILE_ACCESS)
-      ## H5Pset_cache(access_plist, 0, 0, 0, 0.0)
+      access_plist = H5Pcreate(H5P_FILE_ACCESS)
+      H5Pset_cache(access_plist, 0, 10, 12*1024*1024, 0.0)
       ## H5Pset_sieve_buf_size(access_plist, 0)
-      ##self.file_id = H5Fopen(encname, H5F_ACC_RDONLY, access_plist)
-      self.file_id = H5Fopen(encname, H5F_ACC_RDONLY, H5P_DEFAULT)
+      self.file_id = H5Fopen(encname, H5F_ACC_RDONLY, access_plist)
+      ##self.file_id = H5Fopen(encname, H5F_ACC_RDONLY, H5P_DEFAULT)
     elif pymode == 'r+':
       self.file_id = H5Fopen(encname, H5F_ACC_RDWR, H5P_DEFAULT)
     elif pymode == 'a':