Thank you Francesc,

Please give me 2-3 days to try your example, do some reading, and run
tests based on the link you mentioned.

I shall repost soon.

Thank you

Nitin

On 30 January 2017 at 17:14, Francesc Altet <[email protected]> wrote:
> Hi Nitin,
>
>
> I think before getting into details, you need to look into how to
> efficiently read and write data from CSV files into HDF5 in Python.  For
> this, pandas is a great library to use.  My advice is to have a look at the
> excellent documentation on the pandas website:
>
>
> http://pandas.pydata.org/pandas-docs/stable/io.html
>
>
> In particular, you want to use `pandas.read_csv()`, which is one of the
> fastest ways to read CSV files that I am aware of.  Also, for storing the
> data in HDF5, `pandas.HDFStore()` comes in handy because it can generate HDF5
> files out of pandas DataFrames.  In addition, in order to avoid loading all
> the data into a DataFrame in memory, you want to use the `chunksize` keyword,
> which allows you to read the CSV files in chunks before storing them.
>
>
> I have prepared an example for you (attached) so that you can have a look at
> how to use all of this (it is simpler than it may seem).  Here is the
> output on my machine:
>
>
> $ python csv_demo.py
> CSV creation time: 1.491 (67.092 Krow/s)
> CSV reading time: 0.134 (748.360 Krow/s)
> HDF5 store time: 0.322 (310.228 Krow/s)
> HDF5 read time: 0.006 (15622.990 Krow/s)
>
>
> so, once the data is stored in HDF5, the read times will be much faster than
> using CSV (as expected).
>
>
> HTH,
>
>
> Francesc
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5
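
For readers without the attachment, here is a minimal sketch of the chunked
CSV-to-HDF5 approach described in the quoted message.  It is not the attached
csv_demo.py; the function names `csv_to_hdf5` and `read_hdf5`, the key "data",
and the default chunk size are assumptions made for illustration.  It relies on
`pandas.HDFStore`, which needs the optional PyTables package to be installed.

```python
import pandas as pd


def csv_to_hdf5(csv_path, h5_path, key="data", chunksize=100_000):
    """Copy a CSV file into an HDF5 table chunk by chunk; return rows written.

    Hypothetical helper for illustration, not taken from csv_demo.py.
    """
    total_rows = 0
    # HDFStore requires the optional PyTables package ("tables").
    with pd.HDFStore(h5_path, mode="w") as store:
        # With chunksize set, read_csv yields DataFrames of at most
        # `chunksize` rows instead of loading the whole file at once.
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            # append() stores data in "table" format, so every chunk
            # extends the same on-disk table under `key`.
            store.append(key, chunk)
            total_rows += len(chunk)
    return total_rows


def read_hdf5(h5_path, key="data"):
    """Load the stored table back into a single DataFrame."""
    return pd.read_hdf(h5_path, key)
```

Only one chunk is held in memory at a time while writing.  If the CSV has
string columns, later chunks with longer strings than the first chunk may
need the `min_itemsize` argument of `append()`.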
