Hi Chris,

When reading column-wise the cache contains only 10 chunks, so you will hardly notice any spike. Your current chunk shape favours row access; a chunk shape of 250x4000 might be better. It also has the advantage that the cache can be smaller.
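For illustration, a minimal untested sketch of how a dataset with 250x4000 chunks could be created with the HDF5 C++ API, sizing the cache for one column of chunks (the file name "matrix.h5" and dataset name "matrix" are placeholders; the numbers assume the 10000x200000 float matrix discussed below):

#include "H5Cpp.h"

int main()
{
    const hsize_t dims[2]  = {10000, 200000};  // rows x columns
    const hsize_t chunk[2] = {250, 4000};      // suggested chunk shape

    // A column touches 10000/250 = 40 chunks of 250*4000*4 bytes (4 MB)
    // each, so a 160 MB cache holds one column's worth of chunks.
    int    mdc;
    size_t ccelems, ccnbytes;
    double w0;
    H5::FileAccPropList fapl;
    fapl.getCache(mdc, ccelems, ccnbytes, w0);
    ccnbytes = (dims[0] / chunk[0]) * chunk[0] * chunk[1] * sizeof(float);
    fapl.setCache(mdc, ccelems, ccnbytes, w0);

    // Pass the access property list when creating (or opening) the file.
    H5::H5File file("matrix.h5", H5F_ACC_TRUNC,
                    H5::FileCreatPropList::DEFAULT, fapl);

    H5::DSetCreatPropList dcpl;
    dcpl.setChunk(2, chunk);
    H5::DataSpace space(2, dims);
    file.createDataSet("matrix", H5::PredType::NATIVE_FLOAT, space, dcpl);
    return 0;
}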
On Linux you could use strace to see which I/O requests are actually being issued. I have a test program that measures performance when reading a 3-dim dataset (x,y,z) along the various axes. Below are my test results (on a MacBook) for chunk shapes 1000x1000x1 and 250x4000x1 when accessing along x (column) and y (row). You can see that the time spent in user mode depends strongly on chunk shape and access pattern, while the system (I/O) time hardly changes. Also note that HDF5 performance degrades quite a lot when doing many small hyperslab accesses.

Cheers,
Ger

macdiepen2-4.test> time tHDF5 10000 200000 1 1000 1000 1 x
setting cache to 10 chunks (40000000 bytes) with 1009 slots
real    2m38.305s
user    0m3.972s
sys     0m3.483s

macdiepen2-4.test> time tHDF5 10000 200000 1 1000 1000 1 y
setting cache to 200 chunks (800000000 bytes) with 20011 slots
real    4m32.527s
user    1m8.046s
sys     0m4.787s

macdiepen2-4.test> time tHDF5 10000 200000 1 250 4000 1 x
setting cache to 40 chunks (160000000 bytes) with 4001 slots
real    2m14.578s
user    0m13.575s
sys     0m3.848s

macdiepen2-4.test> time tHDF5 10000 200000 1 250 4000 1 y
setting cache to 50 chunks (200000000 bytes) with 5003 slots
real    2m57.442s
user    0m46.541s
sys     0m4.043s

>>> "Jewell, Christopher" <[email protected]> 11/27/2015 1:49 AM >>>
Hi,

I am working with a 10000-row by 200000-column matrix of 4-byte floats. The matrix is written, unavoidably, in sequential row-major order, but needs to be read in sequential column-wise order. I chunk the matrix into 1000x1000 chunks as a compromise between write performance (row-wise) and read performance (column-wise).

When writing the file, I set the chunk cache big enough to hold an entire row's worth of chunks (i.e. 200000/1000 chunks multiplied by 4e6 bytes). My write times per row are of the order of 5 ms, and the algorithm pauses after every 1000 rows. Monitoring I/O at the filesystem level, I see spikes of disk activity during these pauses, with transfer rates approaching the maximum. I conclude that the chunk cache is effectively buffering 1000 rows of the matrix and flushing to disk only when all chunks have been written. So far so good: HDF5 is making my life easy :)

However, when reading the file, I reserve enough chunk cache to accommodate a column's worth of chunks (10000/1000 chunks multiplied by 4e6 bytes). My column read time is of the order of 10 ms, but I don't see pauses or spikes of disk activity as with the write. Instead, I get a steady trickle of disk activity that does not appear to be correlated with the chunk width as I was expecting. It therefore appears that the chunk cache is not being used.

1) Should I expect this behaviour?
2) Have I set up the chunk cache correctly (code below), and do I have to explicitly tell HDF5 to read data chunk-wise from a chunked-layout file?
3) How best to monitor cache flushing/pre-emption activity?

Thanks,

Chris

Sample C++ code:

// Cache
H5::FileAccPropList fprops = file.getAccessPlist();
int mdc;          // metadata cache elements
size_t ccelems;   // number of chunk-cache slots
size_t ccnbytes;  // total chunk-cache size in bytes
double w0;        // chunk preemption policy
fprops.getCache(mdc, ccelems, ccnbytes, w0);
size_t chunksPerCol = 10000 / 1000;
ccnbytes = chunksPerCol * chunkDim[0] * chunkDim[1] * sizeof(float);
fprops.setCache(mdc, ccelems, ccnbytes, w0);
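A caveat about the sample code above (a minimal untested sketch follows): H5::H5File::getAccessPlist() returns a copy of the file's access property list, so calling setCache on that copy does not change the cache of the already-open file. The cache parameters have to be in the property list when the file is opened, along these lines ("data.h5" and the numbers are placeholders matching the values above):

// Build the access property list first, then open the file with it.
int    mdc;
size_t ccelems, ccnbytes;
double w0;
H5::FileAccPropList fapl;
fapl.getCache(mdc, ccelems, ccnbytes, w0);
size_t chunksPerCol = 10000 / 1000;            // 10 chunks per column
ccnbytes = chunksPerCol * 1000 * 1000 * sizeof(float);  // 4e6 bytes per chunk
fapl.setCache(mdc, ccelems, ccnbytes, w0);

H5::H5File file("data.h5", H5F_ACC_RDONLY,
                H5::FileCreatPropList::DEFAULT, fapl);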
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
