I am writing a Python script that writes a table to an HDF5 file.  After some 
quick googling, the pandas library seemed like an easy way to accomplish 
this.  The code is below.  The function is called in a loop and receives the 
data in sections, since all of the data cannot be held in memory at once 
(hence the 'first_time' flag):

def write_to_hdf(data, filename, first_time):
    from pandas import DataFrame
    data_frame = DataFrame.from_dict(data)

    # save to hdf5: the first call creates (or overwrites) the file,
    # later calls append to the existing 'data' table
    if first_time:
        data_frame.to_hdf(filename, key='data', mode='w', format='table',
                          append=True)
    else:
        data_frame.to_hdf(filename, key='data', append=True)

    # allow data frame to be garbage collected
    del data_frame

This seems to work fine.  However, upon inspecting the HDF5 file, I saw some 
things that I didn't expect.  Having never worked with HDF5 tables before, I 
expected to see a dataset named 'data' with a compound type that contained a 
member for each field in my data frame.  My example table has 13,403 rows 
and three columns:  TIME, $EP, and $SYSID.  The HDF5 file looks like this when 
using h5disp from Matlab:

>> h5disp('C:\Data\hdf-export.h5')
HDF5 hdf-export.h5
Group '/'
    Attributes:
        'TITLE':  ''
        'CLASS':  'GROUP'
        'VERSION':  '1.0'
        'PYTABLES_FORMAT_VERSION':  '2.1'
    Group '/data'
        Attributes:
            'TITLE':  ''
            'CLASS':  'GROUP'
            'VERSION':  '1.0'
            'pandas_type':  'frame_table'
            'pandas_version':  '0.10.1'
            'table_type':  'appendable_frame'
            'index_cols':  '(lp1
(I0
S'index'
p2
tp3
a.'
            'values_cols':  '(lp1
S'values_block_0'
p2
aS'values_block_1'
p3
a.'
            'non_index_axes':  '(lp1
(I1
(lp2
S'$EP'
p3
aS'$SYSID'
p4
aS'TIME'
p5
atp6
a.'
            'data_columns':  '(lp1
.'
            'nan_rep':  'nan'
            'encoding':  'N.'
            'levels':  1
            'info':  '(dp1
I1
(dp2
S'type'
p3
S'Index'
p4
sS'names'
p5
(lp6
NassS'index'
p7
(dp8
s.'
        Dataset 'table'
            Size:  13403
            MaxSize:  Inf
            Datatype:   H5T_COMPOUND
                Member 'index':  H5T_STD_I64LE (int64)
                Member 'values_block_0':  H5T_ARRAY
                    Size: 1
                    Base Type:  H5T_IEEE_F64LE (double)
                Member 'values_block_1':  H5T_ARRAY
                    Size: 2
                    Base Type:  H5T_STD_I64LE (int64)
            ChunkSize:  2048
            Filters:  none
            Attributes:
                'CLASS':  'TABLE'
                'VERSION':  '2.7'
                'TITLE':  ''
                'FIELD_0_NAME':  'index'
                'FIELD_1_NAME':  'values_block_0'
                'FIELD_2_NAME':  'values_block_1'
                'FIELD_0_FILL':  0
                'FIELD_1_FILL':  0.000000
                'FIELD_2_FILL':  0
                'index_kind':  'integer'
                'values_block_0_kind':  '(lp1
S'TIME'
p2
a.'
                'values_block_0_dtype':  'float64'
                'values_block_1_kind':  '(lp1
S'$EP'
p2
aS'$SYSID'
p3
a.'
                'values_block_1_dtype':  'int64'
                'NROWS':  13403
        Group '/data/_i_table'
            Attributes:
                'TITLE':  'Indexes container for table /data/table'
                'CLASS':  'TINDEX'
                'VERSION':  '1.0'
            Group '/data/_i_table/index'
                Attributes:
                    'TITLE':  'Index for index column'
                    'CLASS':  'INDEX'
                    'VERSION':  '2.1'
                    'FILTERS':  65793
                    'superblocksize':  262144
                    'blocksize':  131072
                    'slicesize':  131072
                    'chunksize':  1024
                    'optlevel':  6
                    'reduction':  1
                    'DIRTY':  0
                Dataset 'abounds'
                    Size:  0
                    MaxSize:  Inf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  8192
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'EARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Start bounds'
                        'EXTDIM':  0
                Dataset 'bounds'
                    Size:  127x0
                    MaxSize:  127xInf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  127x1
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'CACHEARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Boundary Values'
                        'EXTDIM':  0
                Dataset 'indices'
                    Size:  131072x0
                    MaxSize:  131072xInf
                    Datatype:   H5T_STD_U32LE (uint32)
                    ChunkSize:  1024x1
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'INDEXARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Number of chunk in table'
                        'EXTDIM':  0
                Dataset 'indicesLR'
                    Size:  131072
                    MaxSize:  131072
                    Datatype:   H5T_STD_U32LE (uint32)
                    ChunkSize:  1024
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'LASTROWARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Last Row indices'
                        'nelements':  13403
                Dataset 'mbounds'
                    Size:  0
                    MaxSize:  Inf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  8192
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'EARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Median bounds'
                        'EXTDIM':  0
                Dataset 'mranges'
                    Size:  0
                    MaxSize:  Inf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  8192
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'EARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Median ranges'
                        'EXTDIM':  0
                Dataset 'ranges'
                    Size:  2x0
                    MaxSize:  2xInf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  2x4096
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'CACHEARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Range Values'
                        'EXTDIM':  0
                Dataset 'sorted'
                    Size:  131072x0
                    MaxSize:  131072xInf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  1024x1
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'INDEXARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Sorted Values'
                        'EXTDIM':  0
                Dataset 'sortedLR'
                    Size:  131201
                    MaxSize:  131201
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  1024
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'LASTROWARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'Last Row sorted values + bounds'
                        'nelements':  13403
                Dataset 'zbounds'
                    Size:  0
                    MaxSize:  Inf
                    Datatype:   H5T_STD_I64LE (int64)
                    ChunkSize:  8192
                    Filters:  shuffle, deflate(1)
                    Attributes:
                        'CLASS':  'EARRAY'
                        'VERSION':  '1.1'
                        'TITLE':  'End bounds'
                        'EXTDIM':  0

I see that /data/table has two array members that hold my data values.  
However, they are not named after the fields in my data frame.  I need to be 
able to read the resulting HDF5 file from Matlab.  I also need to read this 
data with the HDF5 Java object API for a separate application that I 
maintain.  I don't see a way to even determine what the field names in my 
original dataset are.  I see them embedded in larger strings within some 
attributes, but nothing straightforward.  In the HDF5 C API, I see H5TB 
functions such as H5TBread_fields_name, which seem like they would do this.  
I don't see an equivalent API in Java, or anything in Matlab's documentation.  
(I'm using Matlab R2012b.)
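
For reference, the attribute strings in the listing above look like Python 
pickles in the text format (protocol 0).  Assuming that reading is right, the 
sketch below decodes three of them with the standard pickle module and builds 
a map from column name to block member and position.  The byte strings are 
copied from the h5disp output; decode_attr and block_layout are hypothetical 
helper names, and the idea that the order of names in each 'kind' attribute 
matches the element order inside the array member is an assumption I have not 
verified.

```python
import pickle

# Attribute values copied from the h5disp listing. Treating them as
# protocol-0 pickles is an assumption based on how they look.
NON_INDEX_AXES = b"(lp1\n(I1\n(lp2\nS'$EP'\np3\naS'$SYSID'\np4\naS'TIME'\np5\natp6\na."
VALUES_BLOCK_0_KIND = b"(lp1\nS'TIME'\np2\na."
VALUES_BLOCK_1_KIND = b"(lp1\nS'$EP'\np2\naS'$SYSID'\np3\na."


def decode_attr(raw):
    """Unpickle one attribute value. Only use this on files you trust,
    because unpickling untrusted data can execute arbitrary code."""
    return pickle.loads(raw)


def block_layout(block_kinds):
    """Map each column name to (block member name, position in block).

    Assumes the order of names in each 'kind' attribute is the order of
    the elements inside the corresponding array member.
    """
    layout = {}
    for block_name, raw in block_kinds.items():
        for position, column in enumerate(decode_attr(raw)):
            layout[column] = (block_name, position)
    return layout


axes = decode_attr(NON_INDEX_AXES)
layout = block_layout({
    "values_block_0": VALUES_BLOCK_0_KIND,
    "values_block_1": VALUES_BLOCK_1_KIND,
})
print(axes)
print(layout)
```

This only recovers the names on the Python side.  A Matlab or Java reader 
would still need its own way to parse these strings, or the names would have 
to be stored in a simpler form when the file is written.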

Any help with reading this table from the HDF5 file correctly in Matlab and/or 
with the Java object API is appreciated.

Thank you.