Hi there,
I just want to point out that pandas uses the PyTables format (
http://pandas.pydata.org/pandas-docs/stable/io.html#io-hdf5), which uses
HDF5 as a container but imposes its own specific layout on the data. That
layout is non-trivial to read in other programs, particularly MATLAB, which
only implements a subset of the HDF5 functionality.
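
To illustrate the layout point: the multi-line attribute strings in the
h5disp output quoted below (for example 'non_index_axes') appear to be
Python pickle text (protocol 0), which is why the column names only show up
embedded in a larger string. A minimal sketch, assuming the attribute bytes
are exactly what h5disp displayed; decode_pandas_attr is a hypothetical
helper name, not part of pandas or PyTables:

```python
import pickle

# Value of the 'non_index_axes' attribute, copied from the h5disp output.
# Assumption: the stored bytes are exactly what h5disp displayed.
RAW_NON_INDEX_AXES = (
    b"(lp1\n(I1\n(lp2\nS'$EP'\np3\naS'$SYSID'\np4\naS'TIME'\np5\natp6\na."
)


def decode_pandas_attr(raw):
    """Unpickle an attribute value that was stored as pickle text.

    Only use this on files you trust: unpickling can execute code.
    """
    return pickle.loads(raw)


axes = decode_pandas_attr(RAW_NON_INDEX_AXES)
print(axes)  # a list holding (axis number, list of column names)
```

This only works from Python; MATLAB and Java have no built-in way to decode
such strings, which is a large part of why the file is awkward to read there.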

I highly recommend the excellent h5py project (http://www.h5py.org/), which
provides a simple, direct way to read and write datasets in Python that can
then be accessed from other programs. Its create_dataset() function probably
does what you want.
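
A minimal sketch of that approach, assuming the data arrives in sections as
NumPy structured arrays. The append_rows helper, the ROW_DTYPE layout, and
the field names EP and SYSID (without the leading $) are illustrative
assumptions, not taken from the original script:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical record layout mirroring the three columns in the question.
ROW_DTYPE = np.dtype([("TIME", "<f8"), ("EP", "<i8"), ("SYSID", "<i8")])


def append_rows(filename, rows, first_time):
    """Append a structured array to a resizable compound dataset 'data'."""
    mode = "w" if first_time else "a"
    with h5py.File(filename, mode) as f:
        if first_time:
            # maxshape=(None,) makes the dataset extendable along axis 0.
            f.create_dataset("data", shape=(0,), maxshape=(None,),
                             dtype=ROW_DTYPE, chunks=(1024,))
        dset = f["data"]
        start = dset.shape[0]
        dset.resize((start + len(rows),))
        dset[start:] = rows


# Write two sections, then read back the field names and row count.
path = os.path.join(tempfile.mkdtemp(), "hdf-export.h5")
chunk1 = np.array([(0.0, 1, 7), (0.5, 2, 7)], dtype=ROW_DTYPE)
chunk2 = np.array([(1.0, 3, 7)], dtype=ROW_DTYPE)
append_rows(path, chunk1, first_time=True)
append_rows(path, chunk2, first_time=False)

with h5py.File(path, "r") as f:
    names = f["data"].dtype.names
    n_rows = f["data"].shape[0]
print(names, n_rows)
```

The result is a single dataset with a compound type whose members carry the
column names directly, rather than the blocked values_block_N members that
pandas writes.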

I use pandas for data analysis, but h5py for I/O, for this reason.

Cheers,
Martijn

On 18 December 2015 at 18:12, Jaworski, Sarah S <[email protected]>
wrote:

> I am writing a python script to write a table to hdf5 file.  Based off
> some quick googling, using the pandas library seemed like an easy way to
> accomplish this.  The code is as follows.  The method is called in a loop,
> sending data to it in sections since all the data cannot be stored in
> memory at the same time (hence, the ‘first_time’ flag):
>
>
>
> def write_to_hdf(data, filename, first_time):
>     from pandas import DataFrame
>
>     data_frame = DataFrame.from_dict(data)
>
>     # save to hdf5: the first call creates the file in table format,
>     # later calls append to the existing table
>     if first_time:
>         data_frame.to_hdf(filename, key='data', mode='w', format='table',
>                           append=True)
>     else:
>         data_frame.to_hdf(filename, key='data', append=True)
>
>     # allow data frame to be garbage collected
>     del data_frame
>
>
>
> This seems to work fine.  However, upon inspecting the HDF5 file, I saw
> some things that I didn’t expect.  Having never worked with HDF5 tables
> before, I expected to see a dataset named ‘data’ with a compound type that
> contained a member for each field in my data frame.  My example table
> has 13,403 rows and three columns:  TIME, $EP, and $SYSID.  The HDF5 file
> looks like this when using h5disp from Matlab:
>
>
>
> >> h5disp('C:\Data\hdf-export.h5')
> HDF5 hdf-export.h5
> Group '/'
>     Attributes:
>         'TITLE':  ''
>         'CLASS':  'GROUP'
>         'VERSION':  '1.0'
>         'PYTABLES_FORMAT_VERSION':  '2.1'
>     Group '/data'
>         Attributes:
>             'TITLE':  ''
>             'CLASS':  'GROUP'
>             'VERSION':  '1.0'
>             'pandas_type':  'frame_table'
>             'pandas_version':  '0.10.1'
>             'table_type':  'appendable_frame'
>             'index_cols':  '(lp1
> (I0
> S'index'
> p2
> tp3
> a.'
>             'values_cols':  '(lp1
> S'values_block_0'
> p2
> aS'values_block_1'
> p3
> a.'
>             'non_index_axes':  '(lp1
> (I1
> (lp2
> S'$EP'
> p3
> aS'$SYSID'
> p4
> aS'TIME'
> p5
> atp6
> a.'
>             'data_columns':  '(lp1
> .'
>             'nan_rep':  'nan'
>             'encoding':  'N.'
>             'levels':  1
>             'info':  '(dp1
> I1
> (dp2
> S'type'
> p3
> S'Index'
> p4
> sS'names'
> p5
> (lp6
> NassS'index'
> p7
> (dp8
> s.'
>         Dataset 'table'
>             Size:  13403
>             MaxSize:  Inf
>             Datatype:   H5T_COMPOUND
>                 Member 'index':  H5T_STD_I64LE (int64)
>                 Member 'values_block_0':  H5T_ARRAY
>                     Size: 1
>                     Base Type:  H5T_IEEE_F64LE (double)
>                 Member 'values_block_1':  H5T_ARRAY
>                     Size: 2
>                     Base Type:  H5T_STD_I64LE (int64)
>             ChunkSize:  2048
>             Filters:  none
>             Attributes:
>                 'CLASS':  'TABLE'
>                 'VERSION':  '2.7'
>                 'TITLE':  ''
>                 'FIELD_0_NAME':  'index'
>                 'FIELD_1_NAME':  'values_block_0'
>                 'FIELD_2_NAME':  'values_block_1'
>                 'FIELD_0_FILL':  0
>                 'FIELD_1_FILL':  0.000000
>                 'FIELD_2_FILL':  0
>                 'index_kind':  'integer'
>                 'values_block_0_kind':  '(lp1
> S'TIME'
> p2
> a.'
>                 'values_block_0_dtype':  'float64'
>                 'values_block_1_kind':  '(lp1
> S'$EP'
> p2
> aS'$SYSID'
> p3
> a.'
>                 'values_block_1_dtype':  'int64'
>                 'NROWS':  13403
>         Group '/data/_i_table'
>             Attributes:
>                 'TITLE':  'Indexes container for table /data/table'
>                 'CLASS':  'TINDEX'
>                 'VERSION':  '1.0'
>             Group '/data/_i_table/index'
>                 Attributes:
>                     'TITLE':  'Index for index column'
>                     'CLASS':  'INDEX'
>                     'VERSION':  '2.1'
>                     'FILTERS':  65793
>                     'superblocksize':  262144
>                     'blocksize':  131072
>                     'slicesize':  131072
>                     'chunksize':  1024
>                     'optlevel':  6
>                     'reduction':  1
>                     'DIRTY':  0
>                 Dataset 'abounds'
>                     Size:  0
>                     MaxSize:  Inf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  8192
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'EARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Start bounds'
>                         'EXTDIM':  0
>                 Dataset 'bounds'
>                     Size:  127x0
>                     MaxSize:  127xInf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  127x1
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'CACHEARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Boundary Values'
>                         'EXTDIM':  0
>                 Dataset 'indices'
>                     Size:  131072x0
>                     MaxSize:  131072xInf
>                     Datatype:   H5T_STD_U32LE (uint32)
>                     ChunkSize:  1024x1
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'INDEXARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Number of chunk in table'
>                         'EXTDIM':  0
>                 Dataset 'indicesLR'
>                     Size:  131072
>                     MaxSize:  131072
>                     Datatype:   H5T_STD_U32LE (uint32)
>                     ChunkSize:  1024
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'LASTROWARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Last Row indices'
>                         'nelements':  13403
>                 Dataset 'mbounds'
>                     Size:  0
>                     MaxSize:  Inf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  8192
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'EARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Median bounds'
>                         'EXTDIM':  0
>                 Dataset 'mranges'
>                     Size:  0
>                     MaxSize:  Inf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  8192
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'EARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Median ranges'
>                         'EXTDIM':  0
>                 Dataset 'ranges'
>                     Size:  2x0
>                     MaxSize:  2xInf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  2x4096
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'CACHEARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Range Values'
>                         'EXTDIM':  0
>                 Dataset 'sorted'
>                     Size:  131072x0
>                     MaxSize:  131072xInf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  1024x1
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'INDEXARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Sorted Values'
>                         'EXTDIM':  0
>                 Dataset 'sortedLR'
>                     Size:  131201
>                     MaxSize:  131201
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  1024
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'LASTROWARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'Last Row sorted values + bounds'
>                         'nelements':  13403
>                 Dataset 'zbounds'
>                     Size:  0
>                     MaxSize:  Inf
>                     Datatype:   H5T_STD_I64LE (int64)
>                     ChunkSize:  8192
>                     Filters:  shuffle, deflate(1)
>                     Attributes:
>                         'CLASS':  'EARRAY'
>                         'VERSION':  '1.1'
>                         'TITLE':  'End bounds'
>                         'EXTDIM':  0
>
>
>
> I see that /data/table has two arrays that hold my data values.  However,
> they are not named after the fields in my data frame.  I need to be able to
> read the resulting HDF5 file from Matlab.  I also need to be able to use
> the HDF5 Java object API to read this data for a separate application that
> I maintain.  I don’t see a way to even figure out what the fieldnames in my
> original dataset are.  I see them embedded in some attributes within a
> larger string, but nothing straightforward.  In the HDF C API, I see H5TB
> methods like H5TBread_fields_name, which seem like they would do this.  I
> don’t see an equivalent API in Java.  I also don’t see anything in Matlab’s
> documentation.  (I’m using Matlab R2012b.)
>
>
>
> Any help in trying to read this table from the HDF5 correctly in Matlab
> and/or from the Java object API is appreciated.
>
>
>
> Thank you.
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5
>
