Hi all,

We have a general data-exploration GUI that allows users to slice (or take slabs) across any dimensions of a dataset. One of our files contains a 4D dataset, a 2D scan where each scan point is an image, with the data chunked in the last dimension. Taking 2D slices across the 3rd and 4th dimensions works fine.
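For anyone wanting to reproduce the layout without access to our file, it can be mocked up as follows (a minimal sketch: the shape and chunking match the real file as reported in the output below; the filename, dtype and empty data are placeholders):
-----8<-----
import h5py

# Hypothetical stand-in for the real scan file: same shape and chunking
# as reported in the output below, but a made-up path and empty data.
f = h5py.File('synthetic-scan.h5', 'w')
f.create_dataset('/entry/instrument/detector/data',
                 shape=(1, 120, 1679, 1475),  # 2D scan of 2D images
                 chunks=(1, 1, 1, 1475),      # chunked along the last dim only
                 dtype='i4')                  # dtype is a guess
f.close()
-----8<-----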
However, slicing across the 2nd and 3rd dimensions causes memory usage to peak unusually high: high enough, in fact, to crash our Java GUI's JVM when it runs out of heap memory. We can demonstrate the same effect with h5py by comparing a single 2D slab access against line-by-line access to the dataset:
-----8<-----
import h5py
import numpy as np
from time import time

f = h5py.File('/dls/i22/data/2011/sw5604-1/i22-34808-Pilatus2M.h5', 'r')
print 'Version:', h5py.h5.get_libversion()
print 'Driver:', f.driver

d = f.get('/entry/instrument/detector/data')
pl = d.id.get_create_plist()
print 'Chunking:', pl.get_chunk()
s = d.shape
print 'Shape:', s

# Read the same 2D slice one line at a time
n = -time()
l = []
for i in range(s[1]):
    l.append(d[0, i, :, 0])
b = np.vstack(l)
n += time()
print 'Line-by-line slice:', n

# Read the whole 2D slab in a single access
n = -time()
a = d[0, :, :, 0]
n += time()
print 'Whole slice:', n

print 'All same:', np.all(a == b)
print 'Sum:', a.sum(), b.sum()
-----8<-----
Running the script twice gives this output (first run with a cold NFS cache, second with a warm one):
-----8<-----
Version: (1L, 8L, 7L)
Driver: sec2
Chunking: (1, 1, 1, 1475)
Shape: (1, 120, 1679, 1475)
Line-by-line slice: 31.2174389362
Whole slice: 4.58652997017
All same: True
Sum: 219855 219855
[src77879@ws042 ~]$ python testh5.py
Version: (1L, 8L, 7L)
Driver: sec2
Chunking: (1, 1, 1, 1475)
Shape: (1, 120, 1679, 1475)
Line-by-line slice: 3.69943618774
Whole slice: 4.54985308647
All same: True
Sum: 219855 219855
-----8<-----
This shows that, once the cache is warm, line-by-line access is actually quicker. Monitoring memory usage with GNOME's System Monitor illustrates the problem (see the attached image). The first bump in user memory is the initial run of the script, which warms up the file cache (the file lives on an NFS mount). The second bump has a very small leading shoulder where the line-by-line slicing occurs; the main rise is caused by the whole-slab access.
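Both access patterns touch the same chunks, so the difference is presumably in how much of that data the library holds at once during a single read. For scale, a rough count (assuming 4-byte elements, which is a guess on our part):
-----8<-----
# d[0,:,:,0] crosses every chunk along the 2nd and 3rd dimensions, and
# each (1, 1, 1, 1475) chunk must be read in full to pick out one element.
nchunks = 120 * 1679                # chunks touched by the slab
chunk_bytes = 1475 * 4              # bytes per chunk, assuming 4-byte elements
print 'Chunks touched:', nchunks                                    # 201480
print 'Raw data read: %.0f MB' % (nchunks * chunk_bytes / 2.0**20)  # ~1134 MB
print 'Slice returned: %.1f MB' % (120 * 1679 * 4 / 2.0**20)        # ~0.8 MB
-----8<-----
So over a gigabyte of raw chunk data has to pass through the library to produce a slice of under a megabyte; if a large fraction of that is buffered at once, it would account for a peak of this size on a 4 GB machine.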
This was using h5py 2.0.0 with HDF5 1.8.7 on a 32-bit RHEL 5 Core 2 Duo box with 4 GB of RAM; the kernel is 2.6.18-274.el5PAE.
Given that we see this with the HL Java library as well as with h5py, I believe this is a low-level library issue rather than a problem in the binding wrappers. Can any developers replicate this behaviour? If so, is it fixable?
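For what it's worth, one knob we can experiment with in the meantime is the raw-data chunk cache, set through h5py's low-level API before the file is opened. A rough sketch (the cache parameters below are unverified guesses, not a confirmed fix):
-----8<-----
import h5py

# Build a file-access property list with an enlarged raw-data chunk cache.
fapl = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
# (metadata slots (ignored in 1.8), chunk slots, cache bytes, preemption w0)
fapl.set_cache(0, 100003, 64 * 1024 * 1024, 0.75)
fid = h5py.h5f.open('/dls/i22/data/2011/sw5604-1/i22-34808-Pilatus2M.h5',
                    h5py.h5f.ACC_RDONLY, fapl=fapl)
f = h5py.File(fid)
-----8<-----
We have not yet verified whether this changes the peak; pointers to more appropriate cache settings would be welcome.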
Thanks in advance,
Peter
--
Dr Peter Chang
T: +44 1235 778092; F: +44 1235 778468
Data Analyst / Mathematical & Statistical Software Developer
Diamond Light Source Ltd, Diamond House, Harwell Science & Innovation Campus,
Didcot, Oxfordshire OX11 0DE
<<attachment: memprofile.png>>
