Hi Chris, 

When reading column-wise the cache contains only 10 chunks, so you will
hardly notice any spike.
Currently your chunk shape favours row access. A chunk shape of
250,4000 might be better. It also has the advantage that the cache
needed for row-wise access is smaller.
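
To be concrete: a 250x4000 chunk of floats is still 4e6 bytes, but a
full row of the matrix then spans 200000/4000 = 50 chunks (200e6 bytes
of cache) instead of 200 chunks (800e6 bytes), while a full column
spans 10000/250 = 40 chunks (160e6 bytes). You can see those numbers in
the cache messages in the test output below.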

On Linux you could use strace to see which IO requests are actually
being done. 
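
For example (the program name is just a placeholder):

strace -f -tt -e trace=read,write,lseek -o io.log ./your_reader

The logged read/lseek calls then show the offsets and sizes of the
requests HDF5 actually issues against the file.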

I have a test program that measures read performance for a 3-dim
dataset (x,y,z) along its various axes.
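
The core of such a test is one hyperslab read per column. In outline,
for a 2-dim matrix like yours (a minimal sketch with made-up file and
dataset names, not the actual tHDF5 source):

#include <H5Cpp.h>
#include <vector>

int main()
{
  const hsize_t nrows = 10000;
  const hsize_t ncols = 200000;
  H5::H5File file("matrix.h5", H5F_ACC_RDONLY);   // made-up file name
  H5::DataSet dset = file.openDataSet("data");    // made-up dataset name
  H5::DataSpace fspace = dset.getSpace();

  std::vector<float> column(nrows);
  hsize_t count[2] = {nrows, 1};   // one full column per read
  H5::DataSpace mspace(1, count);  // 1-dim memory space of nrows elements

  for (hsize_t col = 0; col < ncols; ++col) {
    hsize_t offset[2] = {0, col};
    fspace.selectHyperslab(H5S_SELECT_SET, count, offset);
    dset.read(column.data(), H5::PredType::NATIVE_FLOAT, mspace, fspace);
    // ... process the column ...
  }
  return 0;
}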

Below are my test results (on a MacBook) for chunk shapes 1000,1000,1
and 250,4000,1 when accessing along x (column) and y (row). You can see
that the time spent in user mode depends strongly on chunk shape and
access pattern, while the system (IO) time hardly changes. Also note
that HDF5 performance degrades quite a lot when doing many small
hyperslab accesses.

Cheers, 
Ger
 
macdiepen2-4.test> time tHDF5 10000 200000 1 1000 1000 1 x
setting cache to 10 chunks (40000000 bytes) with 1009 slots
real    2m38.305s
user    0m3.972s
sys     0m3.483s

macdiepen2-4.test> time tHDF5 10000 200000 1 1000 1000 1 y
setting cache to 200 chunks (800000000 bytes) with 20011 slots
real    4m32.527s
user    1m8.046s
sys     0m4.787s

macdiepen2-4.test> time tHDF5 10000 200000 1 250 4000 1 x
setting cache to 40 chunks (160000000 bytes) with 4001 slots
real    2m14.578s
user    0m13.575s
sys     0m3.848s

macdiepen2-4.test> time tHDF5 10000 200000 1 250 4000 1 y
setting cache to 50 chunks (200000000 bytes) with 5003 slots
real    2m57.442s
user    0m46.541s
sys     0m4.043s

>>> "Jewell, Christopher" <[email protected]> 11/27/2015 1:49 AM
>>>
Hi,

I am working with a 10000-row by 200000-column matrix of 4-byte floats.
The matrix is unavoidably written in sequential row-major order, but
needs to be read in sequential column-wise order. I chunk the matrix
into 1000x1000 chunks as a compromise between write performance
(row-wise) and read performance (column-wise).
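
The chunking itself is declared at dataset-creation time along these
lines (a sketch; 'file' is an open H5::H5File and the dataset name here
is illustrative):

hsize_t dims[2] = {10000, 200000};
hsize_t chunkDim[2] = {1000, 1000};
H5::DataSpace space(2, dims);
H5::DSetCreatPropList cparms;
cparms.setChunk(2, chunkDim);  // 1000x1000-element chunks
H5::DataSet dset = file.createDataSet("data", H5::PredType::NATIVE_FLOAT,
                                      space, cparms);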

When writing the file, I set the chunk cache to be big enough to hold
an entire row’s worth of chunks (i.e. 200000 / 1000 chunks multiplied by
4e6 bytes).  My write times per row are of the order of 5ms, and the
algorithm pauses after each 1000 rows.  By monitoring I/O at the
filesystem level, I see spikes of disk activity during these pauses,
with transfer rates approaching maximum.  I conclude that the chunk
cache is effectively buffering 1000 rows of the matrix, and flushing to
disk only when all chunks have been written.  So far so good — HDF5 is
making my life easy :)

However, when reading the file, I reserve enough chunk cache to
accommodate a column’s worth of chunks (10000 / 1000 chunks multiplied
by 4e6 bytes). My column read time is of the order of 10ms, but I don’t
see pauses or spikes of disk activity as with the write. Instead, I get
a steady trickle of disk activity that does not appear to be correlated
with the chunk width as I was expecting. It therefore appears that the
chunk cache is not being used.

1) Should I expect this behaviour?
2) Have I set up the chunk cache correctly (code below), and do I have
to explicitly tell HDF5 to read data chunk-wise from a chunked-layout
file?
3) How best to monitor cache flushing/pre-emption activity?

Thanks,

Chris


Sample C++ code:

// Enlarge the raw-data chunk cache to hold one column's worth of chunks.
// 'file' is an already-open H5::H5File; chunkDim is the chunk shape {1000, 1000}.
H5::FileAccPropList fprops = file.getAccessPlist();
int mdc;           // metadata cache size hint (left unchanged)
size_t ccelems;    // number of chunk-cache hash slots (left unchanged)
size_t ccnbytes;   // chunk-cache capacity in bytes
double w0;         // chunk preemption policy (left unchanged)
fprops.getCache(mdc, ccelems, ccnbytes, w0);

size_t chunksPerCol = 10000 / 1000;  // 10 chunks per column
ccnbytes = chunksPerCol * chunkDim[0] * chunkDim[1] * sizeof(float);  // 40e6 bytes

fprops.setCache(mdc, ccelems, ccnbytes, w0);
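
One thing I am unsure about: getAccessPlist presumably returns a copy
of the file’s access property list, so perhaps setCache on it has no
effect on the already-open file, and the settings need to be supplied
when the file is opened instead. Along these lines (file name made up):

H5::FileAccPropList fapl;
int mdc;
size_t ccelems, ccnbytes;
double w0;
fapl.getCache(mdc, ccelems, ccnbytes, w0);
ccnbytes = 10 * 1000 * 1000 * sizeof(float);  // one column: 10 chunks of 1000x1000 floats
fapl.setCache(mdc, ccelems, ccnbytes, w0);
H5::H5File file("matrix.h5", H5F_ACC_RDONLY,
                H5::FileCreatPropList::DEFAULT, fapl);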


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5 