Hi Nitin,

Yes, you can easily append more rows to HDF5 files generated by pandas using
the HDFStore.append() method (as shown in the documentation and in my examples).
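
A minimal sketch of such an append, assuming pandas with PyTables installed. The file name, the "data" key and the column names are hypothetical, chosen only for illustration, and the values are random:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file location and key, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

first = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
# Give the new rows index labels that continue after the first batch.
more = pd.DataFrame(np.random.rand(2, 3), columns=["a", "b", "c"],
                    index=range(5, 7))

with pd.HDFStore(path, mode="w") as store:
    # The first append() creates the table (stored in "table" format).
    store.append("data", first)
    # Later calls add their rows at the end of the existing table.
    store.append("data", more)
    nrows = store.get_storer("data").nrows

print(nrows)
```

Note that append() only adds rows at the end of the table; the columns of the new rows need to match those already stored.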

Regarding visualization, pandas uses its own format on top of HDF5 to store
dataframes, which is why a standard HDF5 viewer (like HDFView) does not
show the table (i.e. compound type) that you might expect.  For this, it is
better to use pandas itself to read the HDF5 dataset (or parts of it) and then
visualize the resulting dataframe with one of the many existing tools that
interact well with pandas:

http://pandas.pydata.org/pandas-docs/stable/ecosystem.html#visualization
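
Reading the dataset, or only parts of it, back into a dataframe could look roughly like the sketch below. It assumes pandas with PyTables installed and a file written in "table" format with queryable data columns; the file name, key and column names are hypothetical:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file location and key, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
df = pd.DataFrame(np.random.rand(100, 3), columns=["a", "b", "c"])

# format="table" plus data_columns=True makes the columns queryable on disk.
df.to_hdf(path, key="data", mode="w", format="table", data_columns=True)

# Read only a range of rows...
head = pd.read_hdf(path, "data", start=0, stop=10)
# ...or only the rows matching a condition, restricted to some columns.
subset = pd.read_hdf(path, "data", where="a > 0.5", columns=["a", "b"])

print(head.shape)
print(subset.describe())
```

The resulting dataframes can then be handed to whichever plotting tool you choose; for instance, subset.plot() uses matplotlib if it is installed.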

Take your time to decide which tool works best for your case.  Meanwhile, you
can have a glance at the kind of plots that plotly can produce from HDF5 files
written by pandas:

https://plot.ly/python/pytables

In general, if you want to proceed down the pandas path, you may want to
ask on the pandas mailing list, where far more people will be able to help
you.


Francesc Alted

________________________________
From: Hdf-forum <[email protected]> on behalf of nitin 
chandra <[email protected]>
Sent: Wednesday, February 1, 2017 8:04:58 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] CSV data into HDF5 data structure and files

Hi Francesc,

I tried your example as is; I have not had time to modify it or try
something new.

I ran

$ python csv_demo.py

and it created a CSV file with 10 columns, populated with random numbers.

The demo.h5 file was created, and I used HDFView 2.9 to view its
contents.

It contains a directory, table, and a data table, also named table.

In the data table there are 2 columns:

index   |   value_block_0

empty   | no value
no data | but 10 commas

So that I can relate your guidance to my actual problem, please find
attached 2 sample files.
Also, note the first row in the attached CSVs: it was created to
initialise the starting point of the data sequence. Would it be good
practice to have it in the h5 tables as well? The last column holds string
values, and I need them.

ALIGN data goes into file1 and GRADE data into file2, so I am looking
for a write function to write into the respective tables and a read
function to read from them.

Once the data is in the H5 file, can I insert a new row between
existing rows, or append one at the end of the file? Which editor or
method should I use to do that?

Thank you,

Nitin

On 30 January 2017 at 23:01, nitin chandra <[email protected]> wrote:
> Thank you Francesc,
>
> Please give me 2-3 days to try your example, do some reading, and run
> tests based on the link you mentioned.
>
> I shall repost soon.
>
> Thank you
>
> Nitin
>
> On 30 January 2017 at 17:14, Francesc Altet <[email protected]> wrote:
>> Hi Nitin,
>>
>>
>> I think before getting into details, you need to look into how to
>> efficiently read and write data from CSV files into HDF5 in Python.  For
>> this, pandas is a great library to use.  My advice is to have a look at the
>> excellent documentation on the pandas website:
>>
>>
>> http://pandas.pydata.org/pandas-docs/stable/io.html
>>
>>
>> In particular, you want to use `pandas.read_csv()`, which is one of the
>> fastest ways to read CSV files that I am aware of.  Also, for storing the
>> data in HDF5, `pandas.HDFStore()` comes in handy because it can generate HDF5
>> files out of pandas DataFrames.  In addition, to avoid loading all
>> the data into a DataFrame in memory, you want to use the `chunksize` keyword,
>> which lets you read the CSV files in chunks before storing them.
>>
>>
>> I have prepared an example for you (attached) so that you can have a look at
>> how to use all of this (it is simpler than it may seem).  Here is the
>> output on my machine:
>>
>>
>> $ python csv_demo.py
>> CSV creation time: 1.491 (67.092 Krow/s)
>> CSV reading time: 0.134 (748.360 Krow/s)
>> HDF5 store time: 0.322 (310.228 Krow/s)
>> HDF5 read time: 0.006 (15622.990 Krow/s)
>>
>>
>> so, once the data is stored in HDF5, the read times will be much faster than
>> using CSV (as expected).
>>
>>
>> HTH,
>>
>>
>> Francesc
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5