Hi,

See some comments embedded below...

From: Hdf-forum <[email protected]> on behalf of SOLTYS Radoslaw <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Tuesday, January 26, 2016 2:55 AM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Working with lots of HDF5 files

We’re looking into replacing our custom storage design for time-series data 
with HDF5, and we’re looking mainly at HDF5 version 1.10 for the SWMR 
capability, since we already do this with our custom storage.
To find the best layout, we drafted a few test cases and started from a 
tutorial code sample in C++, adjusting it to replicate our current database 
structure, which is one file per signal –

Hmm. I think a "file per signal" could be a poor choice. It depends on how 
"big" a signal is and whether your workflows can easily be re-tooled for a 
"many signals in one file" paradigm. But I would think you'd want to write many 
time series to the same HDF5 file, each as its own 'dataset', perhaps in its 
own 'group' within the file. You can create meaningful group/folder hierarchies 
*within* an HDF5 file (kinda like dirs in Linux or folders in Windows/OS X), 
which makes it very convenient to organize data; see the sketch below.
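
For illustration, here is a minimal sketch in C of that layout, with one group 
per signal and a dataset of samples inside it. The file name, group name and 
sizes are made-up placeholders, not anything from your setup:

    #include "hdf5.h"

    int main(void)
    {
        /* One file for all signals instead of one file per signal */
        hid_t file = H5Fcreate("signals.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        /* A group per signal, e.g. /signal_0001, /signal_0002, ... */
        hid_t grp = H5Gcreate2(file, "/signal_0001", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* 1-D dataset holding this signal's samples */
        hsize_t dims[1] = {1000};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(grp, "samples", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        H5Dclose(dset);
        H5Sclose(space);
        H5Gclose(grp);
        H5Fclose(file);
        return 0;
    }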

so we are creating new empty files in a loop – and there we already ran into 
problems:

-       the HDF5 garbage collector allocates lots of memory as soon as files 
are created – we tried to tune it with setGcReferences(), but without success;

Hmmm. Not sure the 'garbage collector' routines actually allocate anything. I 
think their purpose is to free up any unused stuff. Maybe you want to set 
freelist limits instead; a sketch is below. I use the C interface and so am 
familiar with these only via that interface: 
https://www.hdfgroup.org/HDF5/doc/RM/RM_H5.html
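
If it helps, a rough sketch in C of the two calls I have in mind; the limit 
values here are arbitrary illustrations, not tuned recommendations:

    #include "hdf5.h"

    int main(void)
    {
        /* Cap HDF5's internal free lists at 1 MiB each (illustrative values;
         * -1 means "no limit" for that particular list). */
        H5set_free_list_limits(1048576, 1048576,   /* regular free lists */
                               1048576, 1048576,   /* array free lists   */
                               1048576, 1048576);  /* block free lists   */

        /* Ask the library to release currently free-listed memory now. */
        H5garbage_collect();

        return 0;
    }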



-       having reached 2GB – the HDF5 create function throws the exception “no 
space available for allocation” (We’re running 64-bit Windows 8 with 16GB of 
RAM)

Are you running on a FAT32 filesystem there? Probably not, but it doesn't hurt 
to ask.

I’d have a few questions at this point:

-       Can we reduce the amount of memory used by the garbage collector? If 
yes - how?

(see above regarding freelist limits)


-       Taking a step back: is the HDF5 API designed to handle thousands of 
files in practice?

We often use it with this number of files. But, generally, the application has 
only a handful open at any one time. If you mean having *all* files open 
simultaneously, I think that could present problems. I've never tested it that 
way.


-       Or would it be better to have a single file with the same number of 
datasets in it? (We’re talking about a few thousand datasets, each with several 
million rows.)

Much better!
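
One practical note for that layout: to append rows to a dataset as new samples 
arrive, you'd typically create it chunked and extendible (unlimited max 
dimension). A rough sketch in C, with placeholder names and a chunk size you'd 
want to tune yourself:

    #include "hdf5.h"

    int main(void)
    {
        hid_t file = H5Fcreate("signals.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        /* Extendible 1-D dataset: starts empty, no upper bound on length */
        hsize_t dims[1]    = {0};
        hsize_t maxdims[1] = {H5S_UNLIMITED};
        hid_t   space      = H5Screate_simple(1, dims, maxdims);

        /* Chunking is required for extendible datasets; the chunk size is a
         * guess you would tune to your typical append size. */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        hsize_t chunk[1] = {4096};
        H5Pset_chunk(dcpl, 1, chunk);

        hid_t dset = H5Dcreate2(file, "/signal_0001", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        /* Append 100 new samples: grow the dataset, then write the new tail */
        double samples[100] = {0};
        hsize_t newsize[1] = {100};
        H5Dset_extent(dset, newsize);

        hid_t fspace = H5Dget_space(dset);
        hsize_t start[1] = {0}, count[1] = {100};
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

        hid_t mspace = H5Screate_simple(1, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, samples);

        H5Sclose(mspace);
        H5Sclose(fspace);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }

In the group-per-signal layout sketched earlier, each such dataset would simply 
live under its signal's group.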

Hope that helps.


Thanks for your kind support

