Hi Sarah,

Pandas uses the so-called 'fixed' format (http://pandas.pydata.org/pandas-docs/stable/io.html#fixed-format) by default, which, although it is valid HDF5, creates quite a complex structure. I suggest you try the 'table' format (http://pandas.pydata.org/pandas-docs/stable/io.html#table-format) instead. Also, you won't need PyTables indexes (a way to accelerate queries in HDF5 tables) for MATLAB, so it is better to disable them.
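As a rough sketch of how this advice could be applied to a chunked writer like yours (not tested against your data): index=False skips building the PyTables index, and data_columns=True, which is an extra assumption on top of the advice above, asks pandas to store each column as its own compound member named after the column instead of grouping columns into values_block_N arrays. The column names in the usage comment are placeholders.

```python
import pandas as pd


def write_to_hdf(data, filename, first_time):
    """Append one chunk of rows (a dict of equal-length lists) to an HDF5 table."""
    data_frame = pd.DataFrame.from_dict(data)
    data_frame.to_hdf(
        filename,
        key="data",
        mode="w" if first_time else "a",  # start a fresh file on the first chunk
        format="table",                   # appendable PyTables table
        append=True,
        index=False,                      # do not build the PyTables index
        data_columns=True,                # assumption: one named member per column
    )


# Usage (placeholder column names):
# write_to_hdf({"TIME": [0.0, 1.0], "EP": [1, 2]}, "export.h5", first_time=True)
# write_to_hdf({"TIME": [2.0], "EP": [3]}, "export.h5", first_time=False)
```

Running h5ls or h5disp on the result should show whether the compound members now carry the column names. I have not checked how column names containing '$' behave here.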
Here is an example that creates a pure HDF5 table (compound type dataset) that you should be able to read with MATLAB (apparently compound datatypes are supported there: http://es.mathworks.com/help/matlab/import_export/importing-hierarchical-data-format-hdf5-files.html ):

"""
# prova.py file
import pandas as pd

pd.set_option('io.hdf.default_format', 'table')

with pd.HDFStore('store3.h5', index=False) as store:
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    store.append('df', df, index=False)
    print(repr(store))
"""

$ python prova.py
<class 'pandas.io.pytables.HDFStore'>
File path: store3.h5
/df            frame_table  (typ->appendable,nrows->4,ncols->2,indexers->[index])

$ h5ls -rd store3.h5
/                        Group
/df                      Group
/df/table                Dataset {2/Inf}
    Data:
        (0) {0, [1,2]}, {1, [3,4]}

Hope this helps,
Francesc

2015-12-18 18:12 GMT+01:00 Jaworski, Sarah S <[email protected]>:

> I am writing a python script to write a table to hdf5 file. Based off
> some quick googling, using the pandas library seemed like an easy way to
> accomplish this. The code is as follows. The method is called in a loop,
> sending data to it in sections since all the data cannot be stored in
> memory at the same time (hence, the ‘first_time’ flag):
>
> def write_to_hdf(data, filename, first_time):
>     from pandas import DataFrame
>     data_frame = DataFrame.from_dict(data)
>
>     # save to hdf5
>     if first_time == True:
>         data_frame.to_hdf(filename, 'data', mode='w', format='table', append=True)
>     else:
>         data_frame.to_hdf(filename, 'data', append=True)
>
>     # allow data frame to be garbage collected
>     del data_frame
>
> This seems to work fine. However, upon inspecting the HDF5 file, I saw
> some things that I didn’t expect. Having never worked with HDF5 tables
> before, I expected to see a dataset named ‘data’ with a compound type that
> contained a member for each field in my data frame. My example table
> has 13,403 rows and three columns: TIME, $EP, and $SYSID.
> The HDF5 file looks like this when using h5disp from Matlab:
>
> >> h5disp('C:\Data\hdf-export.h5')
> HDF5 hdf-export.h5
> Group '/'
>     Attributes:
>         'TITLE': ''
>         'CLASS': 'GROUP'
>         'VERSION': '1.0'
>         'PYTABLES_FORMAT_VERSION': '2.1'
>     Group '/data'
>         Attributes:
>             'TITLE': ''
>             'CLASS': 'GROUP'
>             'VERSION': '1.0'
>             'pandas_type': 'frame_table'
>             'pandas_version': '0.10.1'
>             'table_type': 'appendable_frame'
>             'index_cols': '(lp1
> (I0
> S'index'
> p2
> tp3
> a.'
>             'values_cols': '(lp1
> S'values_block_0'
> p2
> aS'values_block_1'
> p3
> a.'
>             'non_index_axes': '(lp1
> (I1
> (lp2
> S'$EP'
> p3
> aS'$SYSID'
> p4
> aS'TIME'
> p5
> atp6
> a.'
>             'data_columns': '(lp1
> .'
>             'nan_rep': 'nan'
>             'encoding': 'N.'
>             'levels': 1
>             'info': '(dp1
> I1
> (dp2
> S'type'
> p3
> S'Index'
> p4
> sS'names'
> p5
> (lp6
> NassS'index'
> p7
> (dp8
> s.'
>         Dataset 'table'
>             Size: 13403
>             MaxSize: Inf
>             Datatype: H5T_COMPOUND
>                 Member 'index': H5T_STD_I64LE (int64)
>                 Member 'values_block_0': H5T_ARRAY
>                     Size: 1
>                     Base Type: H5T_IEEE_F64LE (double)
>                 Member 'values_block_1': H5T_ARRAY
>                     Size: 2
>                     Base Type: H5T_STD_I64LE (int64)
>             ChunkSize: 2048
>             Filters: none
>             Attributes:
>                 'CLASS': 'TABLE'
>                 'VERSION': '2.7'
>                 'TITLE': ''
>                 'FIELD_0_NAME': 'index'
>                 'FIELD_1_NAME': 'values_block_0'
>                 'FIELD_2_NAME': 'values_block_1'
>                 'FIELD_0_FILL': 0
>                 'FIELD_1_FILL': 0.000000
>                 'FIELD_2_FILL': 0
>                 'index_kind': 'integer'
>                 'values_block_0_kind': '(lp1
> S'TIME'
> p2
> a.'
>                 'values_block_0_dtype': 'float64'
>                 'values_block_1_kind': '(lp1
> S'$EP'
> p2
> aS'$SYSID'
> p3
> a.'
>                 'values_block_1_dtype': 'int64'
>                 'NROWS': 13403
>         Group '/data/_i_table'
>             Attributes:
>                 'TITLE': 'Indexes container for table /data/table'
>                 'CLASS': 'TINDEX'
>                 'VERSION': '1.0'
>             Group '/data/_i_table/index'
>                 Attributes:
>                     'TITLE': 'Index for index column'
>                     'CLASS': 'INDEX'
>                     'VERSION': '2.1'
>                     'FILTERS': 65793
>                     'superblocksize': 262144
>                     'blocksize': 131072
>                     'slicesize': 131072
>                     'chunksize': 1024
>                     'optlevel': 6
>                     'reduction': 1
>                     'DIRTY': 0
>                 Dataset 'abounds'
>                     Size: 0
>                     MaxSize: Inf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 8192
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'EARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Start bounds'
>                         'EXTDIM': 0
>                 Dataset 'bounds'
>                     Size: 127x0
>                     MaxSize: 127xInf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 127x1
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'CACHEARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Boundary Values'
>                         'EXTDIM': 0
>                 Dataset 'indices'
>                     Size: 131072x0
>                     MaxSize: 131072xInf
>                     Datatype: H5T_STD_U32LE (uint32)
>                     ChunkSize: 1024x1
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'INDEXARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Number of chunk in table'
>                         'EXTDIM': 0
>                 Dataset 'indicesLR'
>                     Size: 131072
>                     MaxSize: 131072
>                     Datatype: H5T_STD_U32LE (uint32)
>                     ChunkSize: 1024
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'LASTROWARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Last Row indices'
>                         'nelements': 13403
>                 Dataset 'mbounds'
>                     Size: 0
>                     MaxSize: Inf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 8192
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'EARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Median bounds'
>                         'EXTDIM': 0
>                 Dataset 'mranges'
>                     Size: 0
>                     MaxSize: Inf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 8192
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'EARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Median ranges'
>                         'EXTDIM': 0
>                 Dataset 'ranges'
>                     Size: 2x0
>                     MaxSize: 2xInf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 2x4096
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'CACHEARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Range Values'
>                         'EXTDIM': 0
>                 Dataset 'sorted'
>                     Size: 131072x0
>                     MaxSize: 131072xInf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 1024x1
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'INDEXARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Sorted Values'
>                         'EXTDIM': 0
>                 Dataset 'sortedLR'
>                     Size: 131201
>                     MaxSize: 131201
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 1024
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'LASTROWARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'Last Row sorted values + bounds'
>                         'nelements': 13403
>                 Dataset 'zbounds'
>                     Size: 0
>                     MaxSize: Inf
>                     Datatype: H5T_STD_I64LE (int64)
>                     ChunkSize: 8192
>                     Filters: shuffle, deflate(1)
>                     Attributes:
>                         'CLASS': 'EARRAY'
>                         'VERSION': '1.1'
>                         'TITLE': 'End bounds'
>                         'EXTDIM': 0
>
> I see that /data/table has two arrays that hold my data values. However,
> they are not named after the fields in my data frame. I need to be able to
> read the resulting HDF5 file from Matlab. I also need to be able to use
> the HDF5 Java object API to read this data for a separate application that
> I maintain. I don’t see a way to even figure out what the fieldnames in my
> original dataset are. I see them embedded in some attributes within a
> larger string, but nothing straightforward. In the HDF C API, I see H5TB
> methods like H5TBread_fields_name, which seem like they would do this. I
> don’t see an equivalent API in Java. I also don’t see anything in Matlab’s
> documentation. (I’m using Matlab R2012b.)
>
> Any help in trying to read this table from the HDF5 correctly in Matlab
> and/or from the Java object API is appreciated.
>
> Thank you.
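On the field names embedded in those attributes: the values that start with '(lp1' in the h5disp dump look like Python pickle strings (protocol 0), so, as a sketch that relies on pandas internals and is not an official interface, they can be decoded with the standard pickle module. The byte strings below are copied from the dump, with the line breaks written as \n; decode_pandas_attr is a hypothetical helper name.

```python
import pickle

# Attribute values copied from the h5disp dump (the newlines are part of the pickle).
NON_INDEX_AXES = b"(lp1\n(I1\n(lp2\nS'$EP'\np3\naS'$SYSID'\np4\naS'TIME'\np5\natp6\na."
BLOCK_0_KIND = b"(lp1\nS'TIME'\np2\na."
BLOCK_1_KIND = b"(lp1\nS'$EP'\np2\naS'$SYSID'\np3\na."


def decode_pandas_attr(raw: bytes):
    """Hypothetical helper: unpickle one pandas attribute string."""
    # pickle.loads can execute arbitrary code, so use it on trusted files only.
    return pickle.loads(raw)


# Map each compound member to the data frame columns stored inside it.
members = {
    "values_block_0": decode_pandas_attr(BLOCK_0_KIND),
    "values_block_1": decode_pandas_attr(BLOCK_1_KIND),
}
# 'non_index_axes' holds a list of (axis number, labels) pairs.
all_columns = decode_pandas_attr(NON_INDEX_AXES)[0][1]

print(members)
print(all_columns)
```

For the dump above this prints {'values_block_0': ['TIME'], 'values_block_1': ['$EP', '$SYSID']} and ['$EP', '$SYSID', 'TIME']. These attributes are internal to pandas, so their layout may differ between versions.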
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5

--
Francesc Alted
