Hi David,
thanks for the quick reply!
Tweaking the chunk size does not do the trick: setting

hsize_t chunkDims[1] = {1000000};

increases the file size (roughly 150MB instead of 15MB), but the write time
stays almost the same.
Regarding the data types in my struct: each element is a variable-length
string.
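
In case it helps to see what I have in mind: below is a minimal, untested
sketch of buffering the rows in memory and writing them in blocks of 1000
instead of issuing one extend/select/write cycle per row. Only two of the
nine string members are shown (assumed here to be const char*), the block,
chunk sizes and file name are made up, and the dataset and member names
match the snippet further down. Would that be the recommended direction?

#include <deque>
#include <string>
#include <vector>
#include "H5Cpp.h"
using namespace H5;

// Reduced version of my struct: only two of the nine variable-length
// string members are shown here.
struct characteristic_t {
    const char* name;
    const char* longId;
};

int main()
{
    // Variable-length string type and in-memory compound type.
    StrType vlst(PredType::C_S1, H5T_VARIABLE);
    CompType rowType(sizeof(characteristic_t));
    rowType.insertMember("Name", HOFFSET(characteristic_t, name), vlst);
    rowType.insertMember("LongIdentifier",
                         HOFFSET(characteristic_t, longId), vlst);

    // Extendible, chunked 1-D dataset (chunk size picked arbitrarily).
    hsize_t dims[1]      = {0};
    hsize_t maxDims[1]   = {H5S_UNLIMITED};
    hsize_t chunkDims[1] = {4096};
    DSetCreatPropList prop;
    prop.setChunk(1, chunkDims);
    H5File file("characteristics.h5", H5F_ACC_TRUNC);
    DataSet charData = file.createDataSet("Characteristic", rowType,
                                          DataSpace(1, dims, maxDims), prop);

    // Buffer rows and flush them in blocks; the deque keeps the c_str()
    // pointers stable until the block has been written.
    std::deque<std::string> stringPool;
    std::vector<characteristic_t> buffer;
    const size_t rowsPerWrite = 1000;

    auto flush = [&]() {
        if (buffer.empty())
            return;
        hsize_t count[1] = {buffer.size()};
        hsize_t start[1] = {dims[0]};
        dims[0] += count[0];
        charData.extend(dims);                     // grow once per block

        DataSpace memSpace(1, count);
        DataSpace fileSpace = charData.getSpace();
        fileSpace.selectHyperslab(H5S_SELECT_SET, count, start);
        charData.write(buffer.data(), rowType, memSpace, fileSpace);

        buffer.clear();
        stringPool.clear();
    };

    // Dummy producer loop standing in for my real data source (~30000 items).
    for (int i = 0; i < 30000; ++i) {
        stringPool.push_back("name_" + std::to_string(i));
        const char* namePtr = stringPool.back().c_str();
        stringPool.push_back("id_" + std::to_string(i));
        const char* idPtr = stringPool.back().c_str();

        characteristic_t row = {namePtr, idPtr};
        buffer.push_back(row);
        if (buffer.size() == rowsPerWrite)
            flush();
    }
    flush();                                       // remaining rows
    return 0;
}

With 1000 rows per write this should cut the number of extend/select/write
cycles from ~30000 down to ~30.
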
Best regards,
Daniel
> Date: Mon, 20 Jul 2015 09:13:41 -0700
> From: [email protected]
> To: [email protected]
> Subject: Re: [Hdf-forum] Incremental writing of compound dataset slow
>
> Hi Daniel,
>
> It looks like you are writing chunks of size 100, where each struct is
> maybe 40 bytes? I'm not sure what all the types are in the struct - but
> if that is the case each chunk is about 4k. It is my understanding that
> each chunk equates to one system write to disk, and these are expensive.
> A good rule of thumb is to target 1MB chunks.
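> (At ~40 bytes per record, a 1MB target works out to roughly
> 1,048,576 / 40 ≈ 26,000 records per chunk, versus the current 100.)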
>
> best,
>
> David
> software engineer at SLAC
>
> On 07/19/15 06:26, Daniel Rimmelspacher wrote:
> > Dear hdf-forum,
> >
> > I am trying to write compound data to an extendible hdf-dataset. For
> > the code snippet below, I am writing ~30000 compound items one-by-one,
> > resulting in an approximately 15MB h5-file.
> >
> > For dumping this amount of data the hdf library requires roughly 15
> > seconds. This seems a little bit long to me. My guess is that
> > requesting the proper hyperslab for each new item wastes most of the
> > time.
> >
> > However, I am struggling a bit here, since I haven't managed to find
> > out more about this.
> >
> > I'd appreciate it if someone could have a quick look at the code below
> > and give me a hint.
> >
> > Thanks and regards,
> >
> > Daniel
> >
> > ////////////////////////////////////////////////////////////////////
> > // Header: definition of struct type characteristic_t
> > ////////////////////////////////////////////////////////////////////
> > ...
> >
> > /////////////////////////////////////////////////////////////////////
> > // This section initializes the dataset (once) for incremental read
> > /////////////////////////////////////////////////////////////////////
> > // initialize variable length string type
> > const StrType vlst(PredType::C_S1, H5T_VARIABLE);
> >
> > // Create the in-memory compound datatype
> > memspace = CompType(sizeof(characteristic_t));
> > H5Tinsert(memspace.getId(), "Name",
> >           HOFFSET(characteristic_t, name), vlst.getId());
> > H5Tinsert(memspace.getId(), "LongIdentifier",
> >           HOFFSET(characteristic_t, longId), vlst.getId());
> > H5Tinsert(memspace.getId(), "Type",
> >           HOFFSET(characteristic_t, type), vlst.getId());
> > H5Tinsert(memspace.getId(), "Address",
> >           HOFFSET(characteristic_t, address), vlst.getId());
> > H5Tinsert(memspace.getId(), "Deposit",
> >           HOFFSET(characteristic_t, deposit), vlst.getId());
> > H5Tinsert(memspace.getId(), "MaxDiff",
> >           HOFFSET(characteristic_t, maxDiff), vlst.getId());
> > H5Tinsert(memspace.getId(), "Conversion",
> >           HOFFSET(characteristic_t, conversion), vlst.getId());
> > H5Tinsert(memspace.getId(), "LowerLimit",
> >           HOFFSET(characteristic_t, lowerLimit), vlst.getId());
> > H5Tinsert(memspace.getId(), "UpperLimit",
> >           HOFFSET(characteristic_t, upperLimit), vlst.getId());
> >
> > // Prepare data set
> > dims[0] = 0;                          // initial size
> > hsize_t rank = 1;                     // data will be aligned in array style
> > hsize_t maxDims[1] = {H5S_UNLIMITED}; // dataset will be extendible
> > hsize_t chunkDims[1] = {100};         // some random chunksize
> > // set dataspace for dataset
> > DataSpace *dataspace = new DataSpace(rank, dims, maxDims);
> >
> > // Modify dataset creation property to enable chunking
> > DSetCreatPropList prop;
> > prop.setChunk(rank, chunkDims);
> >
> > // Create the chunked dataset. Note the use of pointer.
> > charData = file.createDataSet("Characteristic", memspace, *dataspace,
> >                               prop);
> >
> > // Init helper
> > hsize_t chunk[1] = {1};
> > chunkSpace = DataSpace(1, chunk, NULL);
> > filespace = DataSpace(charData.getSpace());
> >
> >
> >
> > /////////////////////////////////////////////////////////////////////
> > // This section will be called repeatedly in order to write the
> > // compound items iteratively
> > /////////////////////////////////////////////////////////////////////
> > // Create the new item.
> > characteristic_t s1[1];
> > s1[0].name = name;
> > s1[0].longId = Id;
> > s1[0].type = type;
> > s1[0].address = address;
> > s1[0].deposit = deposit;
> > s1[0].maxDiff = maxDiff;
> > s1[0].conversion = conversion;
> > s1[0].lowerLimit = lowerLimit;
> > s1[0].upperLimit = upperLimit;
> >
> > // Extend dataset
> > dims[0]++;
> > charData.extend(dims);
> >
> > // Compute the offset of the new element
> > hsize_t chunk[1] = {1};
> > hsize_t start[1] = {0};
> > start[0] = dims[0]-1;
> >
> > // Select a hyperslab in extended portion of the dataset.
> > filespace = charData.getSpace();
> > filespace.selectHyperslab(H5S_SELECT_SET, chunk, start);
> >
> > // Write data to the extended portion of the dataset.
> > charData.write(s1, memspace, chunkSpace, filespace);
> >
> >
> >
> >
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5