Hi,
I have a Python script that essentially converts a CSV file to an HDF5 file
using PyTables, but I have a few problems that I haven't been able to solve.
1. Column Order - I don't know the columns that I need to write until
runtime, so creating an extension of the IsDescription class was a non-starter.
Therefore, to define my columns, I am passing a dictionary that maps each
column name to a Col instance to the create_table method:
    import tables
    from collections import OrderedDict

    h5_file = tables.open_file(filename, mode='w', title='Test File')
    group = h5_file.create_group('/', 'data', 'Data Group')
    # Map each column name to its Col instance, in the desired order.
    column_dict = OrderedDict()
    for key in column_names:
        column_dict[key] = create_col(key)
    table = h5_file.create_table(group, 'table', column_dict, 'Table')
create_col is simply a method that returns Int32Col(), Float64Col(), etc.,
depending on some information about the column. That is working fine.
However, the columns in the table that are created are not in the order that I
want. I used OrderedDict to ensure that the columns are in the dictionary in
insertion order, but the table doesn't reflect this. Any ideas on how to
control the column order if I can't extend IsDescription to create my data type?
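One approach that may work for the column order, sketched here under some assumptions: the Col constructors accept a pos argument, and PyTables appears to order columns by pos rather than by dictionary order. The helper build_description, the column names and the file name below are hypothetical and only illustrate the idea; create_col from the snippet above would need to pass pos through in the same way.

```python
import os
import tempfile
from collections import OrderedDict

import tables


def build_description(column_specs):
    """Build a table description from ordered (name, Col class) pairs.

    Hypothetical helper: each column gets an explicit ``pos`` so that the
    table layout follows the order of ``column_specs``.
    """
    description = OrderedDict()
    for position, (name, col_class) in enumerate(column_specs):
        description[name] = col_class(pos=position)
    return description


# Illustrative column list, deliberately not in alphabetical order.
specs = [
    ("zeta", tables.Int32Col),
    ("alpha", tables.Float64Col),
    ("mid", tables.Int32Col),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ordered_columns.h5")
    with tables.open_file(path, mode="w", title="Test File") as h5_file:
        group = h5_file.create_group("/", "data", "Data Group")
        table = h5_file.create_table(
            group, "table", build_description(specs), "Table"
        )
        # colnames lists the top-level columns in table order.
        column_order = list(table.colnames)
```

Checking table.colnames after creating the table is a quick way to confirm that the order matches what was requested.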
2. Variable length strings - Strings work fine when I give them a maximum
size. This was fine to get something up and running, but the strings really
need to be variable length. Is there a way to have VLString columns within a
table? I see examples of VLStringAtom being passed as a type to
h5file.create_array, but I don't see similar examples for table columns and
there isn't a Col class for this type. Any help is appreciated.
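A possible workaround for the variable-length strings, sketched under the assumption that table columns have to stay fixed-width: keep the strings in a separate VLArray built with VLStringAtom, and rely on row i of the VLArray belonging to row i of the table. The node names (table, notes), the record_id column and the sample records are made up for illustration.

```python
import os
import tempfile

import tables

# Hypothetical records: a fixed-width integer plus free-length text.
records = [
    (1, "short"),
    (2, "a considerably longer piece of text"),
    (3, "mid-length"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vl_strings.h5")
    with tables.open_file(path, mode="w") as h5_file:
        group = h5_file.create_group("/", "data")
        table = h5_file.create_table(
            group, "table", {"record_id": tables.Int32Col(pos=0)}
        )
        # Companion array: entry i holds the string for table row i.
        notes = h5_file.create_vlarray(
            group, "notes", atom=tables.VLStringAtom()
        )

        row = table.row
        for record_id, text in records:
            row["record_id"] = record_id
            row.append()
            # VLStringAtom stores byte strings, so encode explicitly.
            notes.append(text.encode("utf-8"))
        table.flush()

        # Read both nodes back and pair them up by row index.
        ids = table.col("record_id").tolist()
        texts = [raw.decode("utf-8") for raw in notes.read()]
        round_trip = list(zip(ids, texts))
```

The cost of this layout is that the table and the VLArray must always be appended to together, otherwise the row indices drift apart.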
3. "Blanks" in my csv file - The csv files I'm converting contain null or
blank values. If you imagine loading the file in Excel or a similar program,
some cells will be blank. So, even if column X is an Int32Col, there may be
blanks. How would I handle this using PyTables? I suppose I can substitute
some value for blank cells, but I would like to avoid that if possible.
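For the blanks, one option that avoids overloading a real value is sketched below, assuming no native null is available for integer columns: float columns store NaN for a blank, and each integer column gets a companion BoolCol that flags blank cells. The column names (x, x_is_null, y), the placeholder 0 and the sample CSV text are assumptions made for the example.

```python
import csv
import io
import os
import tempfile

import tables

# Sample CSV with one blank in the integer column and one in the float column.
csv_text = "x,y\n1,2.5\n,3.0\n7,\n"

description = {
    "x": tables.Int32Col(pos=0),
    "x_is_null": tables.BoolCol(pos=1),  # True where the x cell was blank
    "y": tables.Float64Col(pos=2),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "blanks.h5")
    with tables.open_file(path, mode="w") as h5_file:
        table = h5_file.create_table("/", "table", description)
        row = table.row
        for record in csv.DictReader(io.StringIO(csv_text)):
            x_blank = record["x"].strip() == ""
            y_blank = record["y"].strip() == ""
            # The integer needs a placeholder; the mask column says to ignore it.
            row["x"] = 0 if x_blank else int(record["x"])
            row["x_is_null"] = x_blank
            # Floats can carry the blank themselves as NaN.
            row["y"] = float("nan") if y_blank else float(record["y"])
            row.append()
        table.flush()

        x_values = table.col("x").tolist()
        x_null = table.col("x_is_null").tolist()
        y_values = table.col("y").tolist()
```

Readers of the file then have to consult x_is_null before trusting a value in x, which is the main trade-off compared with a sentinel value.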
Help on any of these items is greatly appreciated. I know that using h5py
(would I have to use the low-level API?) instead of PyTables would probably
solve these problems, but I am trying to avoid that since PyTables has
otherwise been so easy to use.
Thanks in advance,
Sarah