Hi all,

We have a general data exploration GUI that allows users to slice (or take 
slabs) across any dimensions of a dataset. One of our files contains a 4D 
dataset from a 2D scan in which each scan point is an image; the data is 
chunked along the last dimension. Looking at 2D slices across the 3rd and 4th 
dimensions works fine.

However, slicing across the 2nd and 3rd dimensions causes memory usage to peak 
unusually high: high enough, in fact, to crash the JVM in our Java GUI when it 
runs out of heap memory. We can also reproduce the effect with h5py by 
comparing a single 2D access with line-by-line access to the dataset:

-----8<-----
import h5py

f = h5py.File('/dls/i22/data/2011/sw5604-1/i22-34808-Pilatus2M.h5', 'r')

print 'Version:', h5py.h5.get_libversion()
print 'Driver:', f.driver

d = f.get('/entry/instrument/detector/data')

pl = d.id.get_create_plist()

print 'Chunking:', pl.get_chunk()

from time import time
import numpy as np

s = d.shape
print 'Shape:', s

n = -time()
l = []
for i in range(s[1]):
        l.append(d[0,i,:,0])
b = np.vstack(l)
n += time()
print 'Line-by-line slice:', n

n = -time()
a = d[0,:,:,0]
n += time()
print 'Whole slice:', n

print 'All same:', np.all(a == b)

print 'Sum:', a.sum(), b.sum()
-----8<-----

This gives the following output from two consecutive runs:
-----8<-----
Version: (1L, 8L, 7L)
Driver: sec2
Chunking: (1, 1, 1, 1475)
Shape: (1, 120, 1679, 1475)
Line-by-line slice: 31.2174389362
Whole slice: 4.58652997017
All same: True
Sum: 219855 219855
[src77879@ws042 ~]$ python testh5.py
Version: (1L, 8L, 7L)
Driver: sec2
Chunking: (1, 1, 1, 1475)
Shape: (1, 120, 1679, 1475)
Line-by-line slice: 3.69943618774
Whole slice: 4.54985308647
All same: True
Sum: 219855 219855
-----8<-----

The second run shows that, once the file cache is warm, line-by-line access is 
quicker than a single whole-slab access (3.7 s against 4.5 s). Monitoring 
memory usage with Gnome's system monitor illustrates the problem (see the 
attached image). The first bump in user memory is from the run that warms up 
the file cache (the file lives on an NFS mount). The second bump has a very 
small leading shoulder, where the line-by-line slicing occurs; the main rise is 
caused by the whole slab access.
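
For anyone who wants numbers rather than a system monitor trace, peak memory 
can also be logged from inside the script. The following is only a minimal 
sketch, assuming a Linux box (where ru_maxrss is reported in kilobytes); the 
helper names peak_rss_kb and run_and_report are hypothetical and not part of 
h5py or of the script above.

```python
import resource


def peak_rss_kb():
    """Return the peak resident set size of this process so far.

    Assumption: Linux, where ru_maxrss is in kilobytes. Other platforms
    may use different units.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_and_report(label, func):
    """Call func() and print how far the peak RSS rose while it ran.

    ru_maxrss is a high-water mark, so it never decreases: run the step
    expected to use less memory first.
    """
    before = peak_rss_kb()
    result = func()
    after = peak_rss_kb()
    print('%s: peak RSS rose by %d kB (now %d kB)'
          % (label, after - before, after))
    return result
```

With d and s from the script above, the two accesses could be wrapped as 
run_and_report('Line-by-line', lambda: np.vstack([d[0,i,:,0] for i in 
range(s[1])])) followed by run_and_report('Whole slice', lambda: d[0,:,:,0]). 
The line-by-line step has to come first because the value reported is a 
high-water mark for the whole process.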

This was using h5py version 2.0.0 with HDF5 1.8.7 on a 32-bit RHEL 5 Core2 Duo 
box with 4 GB of RAM. The kernel is 2.6.18-274.el5PAE.

Given that we see this with the HL Java library and also h5py, I believe this 
is a low-level issue rather than a binding wrapper issue. Can any developers 
replicate this behaviour? If so, is it fixable?
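
To put the shape and chunking reported above in perspective, the arithmetic 
below counts how many chunks each selection touches. It is a sketch only: 
chunks_touched is a hypothetical helper, not an HDF5 or h5py call, and the 
4-byte element size is an assumption because the dataset's type is not given 
here. It says nothing about whether the library actually holds all of those 
chunks in memory at once.

```python
def chunks_touched(shape, chunks, selection):
    """Count the chunks intersected by a simple slab selection.

    selection has one entry per dimension: an int for a single index,
    or None for the full extent of that dimension.
    """
    count = 1
    for extent, chunk, sel in zip(shape, chunks, selection):
        if sel is None:
            count *= -(-extent // chunk)  # ceiling division
        # a single index lies in exactly one chunk along this dimension
    return count


shape = (1, 120, 1679, 1475)   # from the output above
chunks = (1, 1, 1, 1475)       # from the output above
ITEMSIZE = 4                   # ASSUMPTION: bytes per element, not stated

whole = chunks_touched(shape, chunks, (0, None, None, 0))  # d[0,:,:,0]
line = chunks_touched(shape, chunks, (0, 0, None, 0))      # d[0,i,:,0]

elements_per_chunk = 1
for c in chunks:
    elements_per_chunk *= c

print('Chunks touched by whole slice: %d' % whole)    # 201480
print('Chunks touched by one line: %d' % line)        # 1679
print('Elements requested by whole slice: %d' % (shape[1] * shape[2]))
print('Elements stored in the touched chunks: %d'
      % (whole * elements_per_chunk))                 # 297183000
print('Bytes in the touched chunks (assumed itemsize): %d'
      % (whole * elements_per_chunk * ITEMSIZE))
```

So the whole-slice access wants one element from each of 201480 chunks, and 
those chunks together hold 1475 times as many elements as were requested. Each 
line of the line-by-line loop touches only 1679 chunks at a time.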

Thanks in advance,
 Peter

--
Dr Peter Chang
T:+44 1235 778092; F:+44 1235 778468
Data Analyst Mathematical & Statistical Software Developer Diamond Light Source 
Ltd, Diamond House, Harwell Science & Innovation Campus, Didcot, Oxfordshire
OX11 0DE





<<attachment: memprofile.png>>
