Hi David,
thanks for the quick reply!
Tweaking the chunk size does not do the trick: setting

hsize_t chunkDims[1] = {1000000};

increases the file size (roughly 150MB instead of 15MB), but the write time
stays almost the same.
Regarding the data types in my struct: each element is a variable-length
string.
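
In case it helps to see what I have in mind: below is a minimal, untested
sketch of buffering the rows in memory and writing them in blocks of 1000
instead of issuing one extend/select/write cycle per row. Only two of the
nine string members are shown (assumed here to be const char*), the block,
chunk sizes and file name are made up, and the dataset and member names
match the snippet further down. Would that be the recommended direction?

#include <deque>
#include <string>
#include <vector>
#include "H5Cpp.h"
using namespace H5;

// Reduced version of my struct: only two of the nine variable-length
// string members are shown here.
struct characteristic_t {
    const char* name;
    const char* longId;
};

int main()
{
    // Variable-length string type and in-memory compound type.
    StrType vlst(PredType::C_S1, H5T_VARIABLE);
    CompType rowType(sizeof(characteristic_t));
    rowType.insertMember("Name", HOFFSET(characteristic_t, name), vlst);
    rowType.insertMember("LongIdentifier",
                         HOFFSET(characteristic_t, longId), vlst);

    // Extendible, chunked 1-D dataset (chunk size picked arbitrarily).
    hsize_t dims[1]      = {0};
    hsize_t maxDims[1]   = {H5S_UNLIMITED};
    hsize_t chunkDims[1] = {4096};
    DSetCreatPropList prop;
    prop.setChunk(1, chunkDims);
    H5File file("characteristics.h5", H5F_ACC_TRUNC);
    DataSet charData = file.createDataSet("Characteristic", rowType,
                                          DataSpace(1, dims, maxDims), prop);

    // Buffer rows and flush them in blocks; the deque keeps the c_str()
    // pointers stable until the block has been written.
    std::deque<std::string> stringPool;
    std::vector<characteristic_t> buffer;
    const size_t rowsPerWrite = 1000;

    auto flush = [&]() {
        if (buffer.empty())
            return;
        hsize_t count[1] = {buffer.size()};
        hsize_t start[1] = {dims[0]};
        dims[0] += count[0];
        charData.extend(dims);                     // grow once per block

        DataSpace memSpace(1, count);
        DataSpace fileSpace = charData.getSpace();
        fileSpace.selectHyperslab(H5S_SELECT_SET, count, start);
        charData.write(buffer.data(), rowType, memSpace, fileSpace);

        buffer.clear();
        stringPool.clear();
    };

    // Dummy producer loop standing in for my real data source (~30000 items).
    for (int i = 0; i < 30000; ++i) {
        stringPool.push_back("name_" + std::to_string(i));
        const char* namePtr = stringPool.back().c_str();
        stringPool.push_back("id_" + std::to_string(i));
        const char* idPtr = stringPool.back().c_str();

        characteristic_t row = {namePtr, idPtr};
        buffer.push_back(row);
        if (buffer.size() == rowsPerWrite)
            flush();
    }
    flush();                                       // remaining rows
    return 0;
}

With 1000 rows per write this should cut the number of extend/select/write
cycles from ~30000 down to ~30.
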
Best regards,
Daniel
> Date: Mon, 20 Jul 2015 09:13:41 -0700
> From: [email protected]
> To: [email protected]
> Subject: Re: [Hdf-forum] Incremental writing of compound dataset slow
>
> Hi Daniel,
>
> It looks like you are writing chunks of size 100, where each struct is
> maybe 40 bytes? I'm not sure what all the types are in the struct - but
> if that is the case each chunk is about 4k. It is my understanding that
> each chunk equates to one system write to disk, and these are expensive.
> A good rule of thumb is to target 1MB chunks.
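> (At ~40 bytes per record, a 1MB target works out to roughly
> 1,048,576 / 40 ≈ 26,000 records per chunk, versus the current 100.)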
>
> best,
>
> David
> software engineer at SLAC
>
> On 07/19/15 06:26, Daniel Rimmelspacher wrote:
> > Dear hdf-forum,
> >
> > I am trying to write compound data to an extendible hdf-dataset. For
> > the code snippet below, I am writing ~30000 compound items one-by-one,
> > resulting in an approximately 15MB h5-file.
> >
> > For dumping this amount of data the hdf library requires roughly 15
> > seconds. This seems a little bit long to me. My guess is that
> > requesting the proper hyperslab for each new item wastes most of the
> > time.
> >
> > However, I am struggling a bit here, since I haven't managed to find
> > out more about this.
> >
> > I'd appreciate it if someone could have a quick look at the code below
> > and give me a hint.
> >
> > Thanks and regards,
> >
> > Daniel
> >
> > ////////////////////////////////////////////////////////////////////
> > // Header: definition of struct type characteristic_t
> > ////////////////////////////////////////////////////////////////////
> > ...
> >
> > /////////////////////////////////////////////////////////////////////
> > // This section initializes the dataset (once) for incremental read
> > /////////////////////////////////////////////////////////////////////
> > // initialize variable length string type
> > const StrType vlst(PredType::C_S1, H5T_VARIABLE);
> >
> > // Create the in-memory compound datatype
> > memspace = CompType(sizeof(characteristic_t));
> > H5Tinsert(memspace.getId(), "Name",
> >           HOFFSET(characteristic_t, name), vlst.getId());
> > H5Tinsert(memspace.getId(), "LongIdentifier",
> >           HOFFSET(characteristic_t, longId), vlst.getId());
> > H5Tinsert(memspace.getId(), "Type",
> >           HOFFSET(characteristic_t, type), vlst.getId());
> > H5Tinsert(memspace.getId(), "Address",
> >           HOFFSET(characteristic_t, address), vlst.getId());
> > H5Tinsert(memspace.getId(), "Deposit",
> >           HOFFSET(characteristic_t, deposit), vlst.getId());
> > H5Tinsert(memspace.getId(), "MaxDiff",
> >           HOFFSET(characteristic_t, maxDiff), vlst.getId());
> > H5Tinsert(memspace.getId(), "Conversion",
> >           HOFFSET(characteristic_t, conversion), vlst.getId());
> > H5Tinsert(memspace.getId(), "LowerLimit",
> >           HOFFSET(characteristic_t, lowerLimit), vlst.getId());
> > H5Tinsert(memspace.getId(), "UpperLimit",
> >           HOFFSET(characteristic_t, upperLimit), vlst.getId());
> >
> > // Prepare data set
> > dims[0] = 0;                          // initial size
> > hsize_t rank = 1;                     // data will be aligned in array style
> > hsize_t maxDims[1] = {H5S_UNLIMITED}; // dataset will be extendible
> > hsize_t chunkDims[1] = {100};         // some random chunksize
> > // set dataspace for dataset
> > DataSpace *dataspace = new DataSpace(rank, dims, maxDims);
> >
> > // Modify dataset creation property to enable chunking
> > DSetCreatPropList prop;
> > prop.setChunk(rank, chunkDims);
> >
> > // Create the chunked dataset. Note the use of pointer.
> > charData = file.createDataSet("Characteristic", memspace, *dataspace,
> >                               prop);
> >
> > // Init helper
> > hsize_t chunk[1] = {1};
> > chunkSpace = DataSpace(1, chunk, NULL);
> > filespace = DataSpace(charData.getSpace());
> >
> >
> >
> > /////////////////////////////////////////////////////////////////////
> > // This section will be called repeatedly in order to write the
> > // compound items iteratively
> > /////////////////////////////////////////////////////////////////////
> > // Create the new item.
> > characteristic_t s1[1];
> > s1[0].name = name;
> > s1[0].longId = Id;
> > s1[0].type = type;
> > s1[0].address = address;
> > s1[0].deposit = deposit;
> > s1[0].maxDiff = maxDiff;
> > s1[0].conversion = conversion;
> > s1[0].lowerLimit = lowerLimit;
> > s1[0].upperLimit = upperLimit;
> >
> > // Extend dataset
> > dims[0]++;
> > charData.extend(dims);
> >
> > // Compute the offset of the new element
> > hsize_t chunk[1] = {1};
> > hsize_t start[1] = {0};
> > start[0] = dims[0]-1;
> >
> > // Select a hyperslab in extended portion of the dataset.
> > filespace = charData.getSpace();
> > filespace.selectHyperslab(H5S_SELECT_SET, chunk, start);
> >
> > // Write data to the extended portion of the dataset.
> > charData.write(s1, memspace, chunkSpace, filespace);
> >
> >
> >
> >
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5