2015-12-22 21:47 GMT+01:00 Jaworski, Sarah S <[email protected]>:
> Hi,
>
> I have a python script that essentially converts a csv file to python, but
> I have a few problems that I haven’t been able to solve.
>
> 1. Column Order – I don’t know the columns that I need to write
> until runtime, so creating an extension of the IsDescription class was a
> non-starter. Therefore, to define my columns, I am passing a dictionary
> that maps column name to column class to the create_table method:
>
>     h5_file = tables.open_file(filename, mode='w', title='Test File')
>     group = h5_file.create_group('/', 'data', 'Data Group')
>     column_dict = OrderedDict()
>     for key in column_names:
>         column_dict[key] = create_col(key)
>
>     table = h5_file.create_table(group, 'table', column_dict, 'Table')
>
> create_col is simply a method that returns Int32Col(), Float64Col(), etc.,
> depending on some information about the column. That is working fine.
> However, the columns in the table that are created are not in the order
> that I want. I used OrderedDict to ensure that the columns are in the
> dictionary in insertion order, but the table doesn’t reflect this. Any
> ideas on how to control the column order if I can’t extend IsDescription to
> create my data type?

Yes. The `pos` parameter of the Col classes (e.g. `Int32Col(pos=0)`) is
your friend here. See an example at:
http://www.pytables.org/usersguide/libref/structured_storage.html#table-methods-writing

> 2. Variable length strings – Strings work fine when I give them a
> maximum size. This was fine to get something up and running, but the
> strings really need to be variable length. Is there a way to have VLString
> columns within a table? I see examples of VLStringAtom being passed as a
> type to h5file.create_array, but I don’t see similar examples for table
> columns and there isn’t a Col class for this type. Any help is appreciated.
No, PyTables does not support variable-length strings in Table instances
(datasets with a compound datatype, in HDF5 parlance). The main reason is
the additional performance overhead that handling variable-length data
would require. For cases where you absolutely need them, the general advice
is to keep a Table plus one or more separate VLArray instances whose rows
follow the same order as the table rows. Then it is just a matter of
retrieving items from the VLArray instances as needed.

> 3. “Blanks” in my csv file – The csv files I’m converting contain
> null or blank values. If you imagine loading the file in Excel or a
> similar program, some cells will be blank. So, even if column X is an
> Int32Col, there may be blanks. How would I handle this using PyTables? I
> suppose I can substitute some value for blank cells, but I would like to
> avoid that if possible.

There are different approaches for this. For example, you can use the IEEE
NaN (Not a Number) representation, but this requires the column to be a
float type. Another approach is to reserve a special Int32 value that will
never match any of your input values (something like -2**31) to represent a
'NaN'. Handling these special values would be the responsibility of your
code, as neither PyTables nor (I think) HDF5 does anything special with them.

-- 
Francesc Alted
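For question 1 (column order), a minimal sketch, assuming PyTables 3.x: each column object is created with an explicit `pos`, which the PyTables `Col` documentation describes as the way to override the default alphanumerical column order. The `create_col(kind, position)` helper, the `(name, kind)` specs and the column names are hypothetical stand-ins for the poster's runtime information; the in-memory core driver is used only so the example writes nothing to disk.

```python
from collections import OrderedDict

import tables


def create_col(kind, position):
    """Hypothetical stand-in for the poster's create_col(): map a kind
    string to a PyTables column object with an explicit position."""
    if kind == "int":
        return tables.Int32Col(pos=position)
    if kind == "float":
        return tables.Float64Col(pos=position)
    # Fixed-size 16-byte strings; Table columns cannot be variable-length.
    return tables.StringCol(16, pos=position)


def create_ordered_table(h5_file, column_specs):
    """column_specs: (name, kind) pairs, already in the desired order."""
    group = h5_file.create_group("/", "data", "Data Group")
    column_dict = OrderedDict()
    for position, (name, kind) in enumerate(column_specs):
        # pos, not dict insertion order, decides where the column lands.
        column_dict[name] = create_col(kind, position)
    return h5_file.create_table(group, "table", column_dict, "Table")


# Deliberately non-alphabetical column names, known only at runtime.
specs = [("zeta", "int"), ("alpha", "float"), ("mid", "str")]

# In-memory HDF5 file (core driver, no backing store).
with tables.open_file("order_example.h5", mode="w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5_file:
    table = create_ordered_table(h5_file, specs)
    column_order = list(table.colnames)

print(column_order)
```

The design choice is simply to derive `pos` from the same loop index that walks the runtime column list, so no IsDescription subclass is needed.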
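For question 2, the "Table plus a parallel VLArray" advice might look roughly like this sketch, assuming PyTables 3.x on Python 3, where `VLStringAtom` stores `bytes` (PyTables also has `VLUnicodeAtom` for `str`). The `Record` description, the node names and the sample rows are illustrative assumptions; the only invariant is that row i of the table and row i of the VLArray always refer to the same record.

```python
import tables


class Record(tables.IsDescription):
    """Fixed-size part of each record; the free text lives in a VLArray."""
    record_id = tables.Int32Col(pos=0)


# Hypothetical input rows: (id, free-text value of arbitrary length).
rows = [(1, "short"), (2, "a considerably longer free-text value"),
        (3, "naïve café")]

with tables.open_file("vlstring_example.h5", mode="w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5_file:
    table = h5_file.create_table("/", "records", Record, "Fixed-size columns")
    # VLStringAtom stores one variable-length byte string per VLArray row.
    notes = h5_file.create_vlarray("/", "notes", tables.VLStringAtom(),
                                   "Free text, one row per table row")
    for record_id, text in rows:
        # Append to both nodes together so their row order stays aligned.
        table.append([(record_id,)])
        notes.append(text.encode("utf-8"))
    table.flush()

    # Row i of the table and row i of the VLArray describe the same record.
    joined = [(int(table[i]["record_id"]), notes[i].decode("utf-8"))
              for i in range(table.nrows)]

print(joined)
```

Appending one row at a time keeps the alignment obvious; for large CSV files, batching the table appends would likely be faster.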
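For question 3, a sketch of the two approaches from the reply: IEEE NaN for a float column and a reserved sentinel (-2**31, the smallest int32) for an Int32 column. It assumes PyTables 3.x and NumPy; the CSV content, column names and `parse_*` helpers are hypothetical. As the reply says, interpreting the markers on read-back is up to the application.

```python
import csv
import io

import numpy as np
import tables

# Sentinel meaning "blank" in Int32 columns: -2**31, assumed never to
# occur in real data.
INT32_MISSING = np.iinfo(np.int32).min

# Hypothetical CSV with a blank int cell (row 2) and a blank float cell (row 3).
csv_text = "quantity,ratio\n3,0.5\n,0.25\n7,\n"


def parse_int(cell):
    return INT32_MISSING if cell.strip() == "" else int(cell)


def parse_float(cell):
    # Float columns can represent blanks with IEEE NaN directly.
    return float("nan") if cell.strip() == "" else float(cell)


records = [(parse_int(row["quantity"]), parse_float(row["ratio"]))
           for row in csv.DictReader(io.StringIO(csv_text))]

description = {"quantity": tables.Int32Col(pos=0),
               "ratio": tables.Float64Col(pos=1)}

with tables.open_file("blanks_example.h5", mode="w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5_file:
    table = h5_file.create_table("/", "data", description)
    table.append(records)
    table.flush()
    # col() reads a whole column into a NumPy array.
    quantities = table.col("quantity")
    ratios = table.col("ratio")

# The application, not PyTables, decides what the markers mean.
valid_quantities = quantities[quantities != INT32_MISSING]
missing_ratios = int(np.isnan(ratios).sum())
print(valid_quantities.tolist(), missing_ratios)
```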
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
