Oops, for that to work you also need `data_columns=True`. With that, you
don't need to specify the 'table' format either. Here is a working
example:
"""# prova.py file
import pandas as pd
with pd.HDFStore('store3.h5', mode='w') as store:
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
store.append('df', df, data_columns=True, index=False)
print(repr(store))
"""
$ python prova.py
<class 'pandas.io.pytables.HDFStore'>
File path: store3.h5
/df frame_table
(typ->appendable,nrows->2,ncols->2,indexers->[index],dc->[A,B])
$ h5ls -rd store3.h5
/ Group
/df Group
/df/table Dataset {2/Inf}
Data:
(0) {0, 1, 2}, {1, 3, 4}
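In case you want to double-check the layout outside of pandas, here is a
minimal read-back sketch (this uses h5py, which is not part of the thread
above; it is only an assumption for illustration). It shows that, with
data_columns=True, the column names 'A' and 'B' become named members of the
compound type, which is what a MATLAB or Java HDF5 reader will see:

"""# check.py file (hypothetical helper, not from the thread)
import h5py  # assumption: h5py is installed

with h5py.File('store3.h5', 'r') as f:
    table = f['/df/table']     # the plain HDF5 compound dataset written above
    print(table.dtype.names)   # ('index', 'A', 'B') -- one named field per column
    print(table[:])            # rows come back as a NumPy structured array
"""

The same data_columns=True / index=False keywords should also work when passed
through DataFrame.to_hdf(..., format='table', append=True), if you prefer the
one-liner style of the original write_to_hdf() helper.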
Cheers,
Francesc
2015-12-21 10:16 GMT+01:00 Francesc Alted <[email protected]>:
> Hi Sarah,
>
> Pandas uses the so-called 'fixed' format (
> http://pandas.pydata.org/pandas-docs/stable/io.html#fixed-format) by
> default, which, although it is HDF5, creates quite a complex structure.
> I suggest you try the 'table' format (
> http://pandas.pydata.org/pandas-docs/stable/io.html#table-format)
> instead. Also, you won't need PyTables indexes (a way to accelerate
> queries on HDF5 tables) from MATLAB, so it is better to disable them.
>
> Here is an example that creates a pure HDF5 table (compound-type
> dataset) that you should be able to read with MATLAB (compound
> datatypes are apparently supported there:
> http://es.mathworks.com/help/matlab/import_export/importing-hierarchical-data-format-hdf5-files.html
> ):
>
> """# prova.py file
> import pandas as pd
>
> pd.set_option('io.hdf.default_format', 'table')
>
> with pd.HDFStore('store3.h5', index=False) as store:
> df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
> store.append('df', df, index=False)
> print(repr(store))
> """
>
> $ python prova.py
> <class 'pandas.io.pytables.HDFStore'>
> File path: store3.h5
> /df frame_table
> (typ->appendable,nrows->4,ncols->2,indexers->[index])
>
> $ h5ls -rd store3.h5
> / Group
> /df Group
> /df/table Dataset {2/Inf}
> Data:
> (0) {0, [1,2]}, {1, [3,4]}
>
> Hope this helps,
>
> Francesc
>
> 2015-12-18 18:12 GMT+01:00 Jaworski, Sarah S <[email protected]>:
>
>> I am writing a Python script that writes a table to an HDF5 file. Based
>> on some quick googling, the pandas library seemed like an easy way to
>> accomplish this. The code is as follows. The method is called in a loop,
>> sending data to it in sections, since all the data cannot be held in
>> memory at the same time (hence the ‘first_time’ flag):
>>
>> def write_to_hdf(data, filename, first_time):
>>     from pandas import DataFrame
>>     data_frame = DataFrame.from_dict(data)
>>
>>     # save to hdf5
>>     if first_time == True:
>>         data_frame.to_hdf(filename, 'data', mode='w', format='table',
>>                           append=True)
>>     else:
>>         data_frame.to_hdf(filename, 'data', append=True)
>>
>>     # allow data frame to be garbage collected
>>     del data_frame
>>
>> This seems to work fine. However, upon inspecting the HDF5 file, I saw
>> some things that I didn’t expect. Having never worked with HDF5 tables
>> before, I expected to see a dataset named ‘data’ with a compound type that
>> contained a member for each field in my data frame. My example table has
>> 13,403 rows and three columns: TIME, $EP, and $SYSID. The HDF5 file looks
>> like this when inspected with h5disp from MATLAB:
>>
>> >> h5disp('C:\Data\hdf-export.h5')
>> HDF5 hdf-export.h5
>> Group '/'
>> Attributes:
>> 'TITLE': ''
>> 'CLASS': 'GROUP'
>> 'VERSION': '1.0'
>> 'PYTABLES_FORMAT_VERSION': '2.1'
>> Group '/data'
>> Attributes:
>> 'TITLE': ''
>> 'CLASS': 'GROUP'
>> 'VERSION': '1.0'
>> 'pandas_type': 'frame_table'
>> 'pandas_version': '0.10.1'
>> 'table_type': 'appendable_frame'
>> 'index_cols': '(lp1
>> (I0
>> S'index'
>> p2
>> tp3
>> a.'
>> 'values_cols': '(lp1
>> S'values_block_0'
>> p2
>> aS'values_block_1'
>> p3
>> a.'
>> 'non_index_axes': '(lp1
>> (I1
>> (lp2
>> S'$EP'
>> p3
>> aS'$SYSID'
>> p4
>> aS'TIME'
>> p5
>> atp6
>> a.'
>> 'data_columns': '(lp1
>> .'
>> 'nan_rep': 'nan'
>> 'encoding': 'N.'
>> 'levels': 1
>> 'info': '(dp1
>> I1
>> (dp2
>> S'type'
>> p3
>> S'Index'
>> p4
>> sS'names'
>> p5
>> (lp6
>> NassS'index'
>> p7
>> (dp8
>> s.'
>>
>> Dataset 'table'
>> Size: 13403
>> MaxSize: Inf
>> Datatype: H5T_COMPOUND
>> Member 'index': H5T_STD_I64LE (int64)
>> Member 'values_block_0': H5T_ARRAY
>> Size: 1
>> Base Type: H5T_IEEE_F64LE (double)
>> Member 'values_block_1': H5T_ARRAY
>> Size: 2
>> Base Type: H5T_STD_I64LE (int64)
>> ChunkSize: 2048
>> Filters: none
>> Attributes:
>> 'CLASS': 'TABLE'
>> 'VERSION': '2.7'
>> 'TITLE': ''
>> 'FIELD_0_NAME': 'index'
>> 'FIELD_1_NAME': 'values_block_0'
>> 'FIELD_2_NAME': 'values_block_1'
>> 'FIELD_0_FILL': 0
>> 'FIELD_1_FILL': 0.000000
>> 'FIELD_2_FILL': 0
>> 'index_kind': 'integer'
>> 'values_block_0_kind': '(lp1
>> S'TIME'
>> p2
>> a.'
>> 'values_block_0_dtype': 'float64'
>> 'values_block_1_kind': '(lp1
>> S'$EP'
>> p2
>> aS'$SYSID'
>> p3
>> a.'
>> 'values_block_1_dtype': 'int64'
>> 'NROWS': 13403
>>
>> Group '/data/_i_table'
>> Attributes:
>> 'TITLE': 'Indexes container for table /data/table'
>> 'CLASS': 'TINDEX'
>> 'VERSION': '1.0'
>> Group '/data/_i_table/index'
>> Attributes:
>> 'TITLE': 'Index for index column'
>> 'CLASS': 'INDEX'
>> 'VERSION': '2.1'
>> 'FILTERS': 65793
>> 'superblocksize': 262144
>> 'blocksize': 131072
>> 'slicesize': 131072
>> 'chunksize': 1024
>> 'optlevel': 6
>> 'reduction': 1
>> 'DIRTY': 0
>> Dataset 'abounds'
>> Size: 0
>> MaxSize: Inf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 8192
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'EARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Start bounds'
>> 'EXTDIM': 0
>> Dataset 'bounds'
>> Size: 127x0
>> MaxSize: 127xInf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 127x1
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'CACHEARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Boundary Values'
>> 'EXTDIM': 0
>> Dataset 'indices'
>> Size: 131072x0
>> MaxSize: 131072xInf
>> Datatype: H5T_STD_U32LE (uint32)
>> ChunkSize: 1024x1
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'INDEXARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Number of chunk in table'
>> 'EXTDIM': 0
>> Dataset 'indicesLR'
>> Size: 131072
>> MaxSize: 131072
>> Datatype: H5T_STD_U32LE (uint32)
>> ChunkSize: 1024
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'LASTROWARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Last Row indices'
>> 'nelements': 13403
>> Dataset 'mbounds'
>> Size: 0
>> MaxSize: Inf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 8192
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'EARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Median bounds'
>> 'EXTDIM': 0
>> Dataset 'mranges'
>> Size: 0
>> MaxSize: Inf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 8192
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'EARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Median ranges'
>> 'EXTDIM': 0
>> Dataset 'ranges'
>> Size: 2x0
>> MaxSize: 2xInf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 2x4096
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'CACHEARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Range Values'
>> 'EXTDIM': 0
>> Dataset 'sorted'
>> Size: 131072x0
>> MaxSize: 131072xInf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 1024x1
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'INDEXARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Sorted Values'
>> 'EXTDIM': 0
>> Dataset 'sortedLR'
>> Size: 131201
>> MaxSize: 131201
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 1024
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'LASTROWARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'Last Row sorted values + bounds'
>> 'nelements': 13403
>> Dataset 'zbounds'
>> Size: 0
>> MaxSize: Inf
>> Datatype: H5T_STD_I64LE (int64)
>> ChunkSize: 8192
>> Filters: shuffle, deflate(1)
>> Attributes:
>> 'CLASS': 'EARRAY'
>> 'VERSION': '1.1'
>> 'TITLE': 'End bounds'
>> 'EXTDIM': 0
>>
>> I see that /data/table has two array members that hold my data values.
>> However, they are not named after the fields in my data frame. I need to
>> be able to read the resulting HDF5 file from MATLAB. I also need to be
>> able to use the HDF5 Java object API to read this data for a separate
>> application that I maintain. I don’t even see a way to figure out what the
>> field names in my original dataset are; I see them embedded within larger
>> strings in some attributes, but nothing straightforward. In the HDF5 C
>> API, I see H5TB functions like H5TBread_fields_name, which seem like they
>> would do this, but I don’t see an equivalent API in Java. I also don’t see
>> anything in MATLAB’s documentation. (I’m using MATLAB R2012b.)
>>
>>
>>
>> Any help in trying to read this table correctly from the HDF5 file in
>> MATLAB and/or from the Java object API is appreciated.
>>
>>
>>
>> Thank you.
>>
>>
>
>
>
> --
> Francesc Alted
>
--
Francesc Alted
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5