There is a third and maybe a fourth way to handle this…

3. Do the dynamic multi-dim array as you normally would, but when you turn 
around and write the beast to HDF5, unravel it into a temporary contiguous 
buffer just before H5Dwrite. Do the opposite just after H5Dread. That involves 
a data copy, but it can work just fine if the arrays are small. It's just a bit 
more work to write and read. This is similar to the previous respondent's 
suggestion to "do the indexing yourself", except that you don't change anything 
in *your* client code except the places where you interface to HDF5.
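
To make #3 concrete, here is a minimal sketch using the same H5Cpp API as the 
example further down in this thread. The helper names (write_flattened, 
read_unflattened) and the fixed 2D-array-of-double layout are my own 
illustration, not anything from an existing library:

    #include "H5Cpp.h"
    #include <vector>
    using namespace H5;

    // Gather the rows of a double** into one contiguous buffer, then write that.
    void write_flattened(DataSet& dataset, double** data, const hsize_t dims[2])
    {
        std::vector<double> buffer(dims[0] * dims[1]);
        for (hsize_t i = 0; i < dims[0]; ++i)
            for (hsize_t j = 0; j < dims[1]; ++j)
                buffer[i * dims[1] + j] = data[i][j];   // unravel row by row
        dataset.write(buffer.data(), PredType::NATIVE_DOUBLE);
    }

    // Reading is the mirror image: read contiguously, then scatter back out.
    void read_unflattened(DataSet& dataset, double** data, const hsize_t dims[2])
    {
        std::vector<double> buffer(dims[0] * dims[1]);
        dataset.read(buffer.data(), PredType::NATIVE_DOUBLE);
        for (hsize_t i = 0; i < dims[0]; ++i)
            for (hsize_t j = 0; j < dims[1]; ++j)
                data[i][j] = buffer[i * dims[1] + j];
    }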

4. You may be able to do something more elegant using either HDF5 datatypes 
with custom type-conversion routines, or HDF5 filters.

My first thought is a filter, but it would be a bit of a kludge too. You define 
a custom filter (see 
https://www.hdfgroup.org/HDF5/doc/RM/RM_H5Z.html#Compression-Register) and you 
*ensure* that the chunk size you specify is large enough to at least cover the 
top-level array of pointers in your arrays. That might be a somewhat large 
chunk size, but so what. Then, *assuming* HDF5 always sends chunks to the 
filter moving through memory starting with the pointer it was handed in the 
H5Dwrite call, upon the first entry to your filter you would "see" the 
top-level set of pointers. You would have to cache those away for safekeeping 
inside the filter somehow. Then, with each successive chunk request that comes 
through the filter, you would use the cached pointer structure to find the 
actual chunk being processed in memory and pass that chunk as the output of the 
filter. This is kinda sorta like a "streaming copy". You never have more than a 
single chunk's worth of your array copied at any moment, so it's better than #3 
(which is a full copy of the array), but it's also a bit kludgey. And I haven't 
given any thought to how you would do the read-back either; I'm just assuming 
it's possible.

If you go the datatype route, then you would define a custom datatype (probably 
for each instance of such an object) and also register your own data-conversion 
routine (see https://www.hdfgroup.org/HDF5/doc/RM/RM_H5T.html#Datatype-Register) 
for it. It would work somewhat similarly, I think, and might even be a better 
way to go than a filter. However, I've never worked with that aspect of HDF5.
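
For what it's worth, here is roughly what just the registration boilerplate for 
such a filter would look like. The filter name and ID are hypothetical, and the 
gather/scatter bodies are left as stubs because they rest entirely on the 
chunk-ordering assumption above:

    #include "hdf5.h"

    // Hypothetical filter ID; user-defined filters use values >= 256.
    const H5Z_filter_t GATHER_FILTER_ID = 333;

    // Skeleton of the filter callback. On the write path, the first call would
    // cache the top-level pointer table it sees in *buf; later calls would swap
    // *buf for the real chunk located through that cached table.
    static size_t gather_filter(unsigned flags, size_t cd_nelmts,
                                const unsigned cd_values[],
                                size_t nbytes, size_t* buf_size, void** buf)
    {
        (void)cd_nelmts; (void)cd_values; (void)buf_size; (void)buf; // stubs
        if (flags & H5Z_FLAG_REVERSE) {
            // read path: scatter the chunk back out through the cached pointers
        } else {
            // write path: gather the real chunk data into *buf
        }
        return nbytes;  // bytes of valid data in *buf; returning 0 means failure
    }

    static const H5Z_class2_t gather_filter_class = {
        H5Z_CLASS_T_VERS,   // version of the H5Z_class_t struct
        GATHER_FILTER_ID,   // filter id number
        1, 1,               // encoder and decoder present
        "pointer gather",   // filter name, for error messages
        nullptr,            // can_apply callback (not needed here)
        nullptr,            // set_local callback (not needed here)
        gather_filter       // the filter function itself
    };

    // Somewhere during setup:
    // herr_t err = H5Zregister(&gather_filter_class);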

Hope that helps.

Mark



From: Hdf-forum <[email protected]> on behalf of 
huebbe <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Monday, May 9, 2016 6:13 AM
To: HDF Users Discussion List <[email protected]>
Subject: Re: [Hdf-forum] Dynamically allocated multidimensional arrays C++

Of course, you get garbage output: You are storing the array of pointers 
instead of the data,
along with whatever garbage happens to be after those pointers in memory.

Trouble is, C++ simply can't do true multidimensional arrays of dynamic size.
It's not part of the language. So you basically have two options:

1. Do the indexing yourself. Declare your multidimensional array as a 1D array, 
and access its elements
    via `data[i*dims[1] + j]`. This is a nuisance, but still feasible (there is 
a sketch of this right after this list).

2. Use C. C99 allows true multidimensional arrays of dynamic size. So, in C, 
you can just write

        /* needs <stdlib.h> for malloc; dims[0] and dims[1] are runtime values */
        double (*data)[dims[1]] = malloc(dims[0] * sizeof(*data));
        for ( size_t i = 0; i < dims[0]; ++i )
            for ( size_t j = 0; j < dims[1]; ++j )
                data[i][j] = i + j;

    This will lay out your data in memory the way HDF5 expects it, but it's not 
legal C++ code of any standard.
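
Here is the sketch of option 1 promised above: a complete C++ version, with one 
flat allocation indexed by hand, written out through the same H5Cpp API used in 
the example below. The file and dataset names are just placeholders:

    #include "H5Cpp.h"
    #include <vector>
    using namespace H5;

    int main()
    {
        hsize_t dims[2] = {4, 6};

        // One flat, contiguous allocation -- already the layout HDF5 expects.
        std::vector<double> data(dims[0] * dims[1]);
        for (hsize_t i = 0; i < dims[0]; ++i)
            for (hsize_t j = 0; j < dims[1]; ++j)
                data[i * dims[1] + j] = i + j;   // manual 2D -> 1D indexing

        H5File file("flat.h5", H5F_ACC_TRUNC);
        DataSpace dataspace(2, dims);
        DataSet dataset = file.createDataSet("test", PredType::IEEE_F64LE, dataspace);
        dataset.write(data.data(), PredType::NATIVE_DOUBLE);
        return 0;
    }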

Of course, you can also use your pointer array, and read/write the data line by 
line. Or you can allocate
your data as a 1D array and alias it with a pointer array to be able to access 
it via `data[i][j]`.
But either way, it gets dirty.
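
A minimal sketch of that last aliasing trick (the names are mine): the 
row-pointer table gives client code the `data[i][j]` syntax, while 
`flat.data()` is the contiguous buffer you actually hand to HDF5.

    #include <cstddef>
    #include <vector>

    int main()
    {
        const std::size_t rows = 4, cols = 6;

        std::vector<double> flat(rows * cols);   // the real, contiguous storage
        std::vector<double*> data(rows);         // row pointers aliasing into it
        for (std::size_t i = 0; i < rows; ++i)
            data[i] = flat.data() + i * cols;

        data[2][3] = 5.0;   // convenient 2D access through the alias

        // Pass flat.data() -- NOT data.data() -- to H5Dwrite / DataSet::write.
        return 0;
    }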


Cheers,
Nathanael Hübbe



On 05/06/2016 10:29 PM, Steven Walton wrote:
So I am noticing some interesting behavior and am wondering if there is a way 
around this.
I am able to assign a rank-1 array dynamically and write it to an HDF5 file, 
but I do not seem to be able to do this with higher-order arrays. I would like 
to be able to write a PPx array to h5 and retain the data integrity. More 
specifically, I am trying to create an easy-to-use vector-to-array library 
<https://github.com/stevenwalton/H5Easy> that can handle multidimensional data 
(it works with rank 1).
Let me give some examples. I will also show the typenames of the arrays.
Works:

double *a = new double[numPts];  // typename: Pd
double a[numPts];                // typename: A#pts_d
double a[num1][num2];            // typename: Anum1_Anum2_d

What doesn't work:

double **a = new double*[num1];
for ( size_t i = 0; i < num1; ++i )
    a[i] = new double[num2];
// typename: PPd
Testing the saved arrays with h5dump (and by loading and reading them 
directly), I find that if I have typename PPx (not necessarily double), I get 
garbage stored. Here is an example code and the output from h5dump showing the 
behavior.
------------------------------------------------------------
compiled with h5c++ -std=c++11
------------------------------------------------------------
#include "H5Cpp.h"
using namespace H5;
#define FILE "multi.h5"
int main()
{
   hsize_t dims[2];
   herr_t status;
   H5File file(FILE, H5F_ACC_TRUNC);
   dims[0] = 4;
   dims[1] = 6;
   double **data = new double*[dims[0]];
   for ( size_t i = 0; i < dims[0]; ++i )
     data[i] = new double[dims[1]];
   for ( size_t i = 0; i < dims[0]; ++i )
     for ( size_t j = 0; j < dims[1]; ++j )
       data[i][j] = i + j;
   DataSpace dataspace = DataSpace(2,dims);
   DataSet dataset( file.createDataSet( "test", PredType::IEEE_F64LE, dataspace 
) );
   dataset.write(data, PredType::IEEE_F64LE);
   dataset.close();
   dataspace.close();
   file.close();

   return 0;
}
------------------------------------------------------------
h5dump
------------------------------------------------------------
HDF5 "multi.h5" {
GROUP "/" {
    DATASET "test" {
       DATATYPE  H5T_IEEE_F64LE
       DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
       DATA {
       (0,0): 1.86018e-316, 1.86018e-316, 1.86018e-316, 1.86019e-316, 0,
       (0,5): 3.21143e-322,
       (1,0): 0, 1, 2, 3, 4, 5,
       (2,0): 0, 3.21143e-322, 1, 2, 3, 4,
       (3,0): 5, 6, 0, 3.21143e-322, 2, 3
       }
    }
}
}
------------------------------------------------------------------
As can be seen, the (0,0) row is absolute garbage (except for the last value, 
which is the first number of the actual array), and (0,5) is out of bounds and 
holds garbage data. (1,0) has always contained real data (though it should be 
located at (0,0)). So this seems like some kind of addressing problem.
Is this a bug in the h5 libraries, such that they can read and write Pd data as 
well as Ax0_...Axn_t data but not P...Pt data? Or is this for some reason 
intentional? As using new is a fairly standard way to allocate arrays, making 
P...Pt-type data common, I have a hard time seeing this as intentional. In the 
meantime, is anyone aware of a workaround? The data I am taking in will be 
dynamically allocated, so I do not see a way to get Ax_...-type data.
Thank you,
Steven
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]<mailto:[email protected]>
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5


--
Please be aware that the enemies of your civil rights and your freedom
are on CC of all unencrypted communication. Protect yourself.


