Thank you, Francesc. Please give me 2-3 days to try your example, do some reading, and run some tests based on the link you mentioned.
I shall repost soon.

Thank you,
Nitin

On 30 January 2017 at 17:14, Francesc Altet <[email protected]> wrote:
> Hi Nitin,
>
> I think before getting into details, you need to look into how to
> efficiently read and write data from CSV files into HDF5 in Python. For
> this, pandas is a great library to use. My advice is to have a look at the
> excellent documentation on the pandas website:
>
> http://pandas.pydata.org/pandas-docs/stable/io.html
>
> In particular, you want to use `pandas.read_csv()`, which is one of the
> fastest ways to read CSV files that I am aware of. Also, for storing the
> data in HDF5, `pandas.HDFStore()` comes in handy because it can generate
> HDF5 files out of pandas DataFrames. In addition, in order to avoid loading
> all the data into a DataFrame in memory, you want to use the `chunksize`
> keyword, which lets you read the CSV files in chunks before storing them.
>
> I have prepared an example for you (attached) so that you can have a look at
> how to use all of this (it is simpler than it may seem). Here is the
> output on my machine:
>
> $ python csv_demo.py
> CSV creation time: 1.491 (67.092 Krow/s)
> CSV reading time: 0.134 (748.360 Krow/s)
> HDF5 store time: 0.322 (310.228 Krow/s)
> HDF5 read time: 0.006 (15622.990 Krow/s)
>
> So, once the data is stored in HDF5, read times are much faster than
> with CSV (as expected).
>
> HTH,
>
> Francesc
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5
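The attached csv_demo.py is not included in the archived message, so what follows is only a minimal sketch of the approach Francesc describes, not his script. It reads a CSV with `pandas.read_csv(..., chunksize=...)` and appends each chunk to a `pandas.HDFStore` table, so the whole file never has to fit in one DataFrame. It assumes pandas, NumPy and PyTables (which `HDFStore` needs) are installed; the helper names (`make_csv`, `csv_to_hdf5`, `read_hdf5`), column names, row count, chunk size and the key `"data"` are all hypothetical.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_csv(path, nrows=100_000):
    """Write a synthetic CSV (hypothetical data: one int column, two float columns)."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "a": np.arange(nrows, dtype=np.int64),
        "b": rng.random(nrows),
        "c": rng.random(nrows),
    })
    df.to_csv(path, index=False)
    return nrows


def csv_to_hdf5(csv_path, h5_path, key="data", chunksize=10_000):
    """Stream the CSV into HDF5 chunk by chunk instead of loading it all at once."""
    with pd.HDFStore(h5_path, mode="w") as store:
        # read_csv with chunksize yields DataFrames of at most `chunksize` rows;
        # HDFStore.append writes each one to a table-format node under `key`.
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            store.append(key, chunk)


def read_hdf5(h5_path, key="data"):
    """Read the stored table back into a single DataFrame."""
    with pd.HDFStore(h5_path, mode="r") as store:
        return store.select(key)


def main(nrows=100_000):
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "demo.csv")
        h5_path = os.path.join(tmp, "demo.h5")

        t0 = time.perf_counter()
        make_csv(csv_path, nrows)
        t1 = time.perf_counter()
        csv_to_hdf5(csv_path, h5_path)  # CSV reading and HDF5 storing combined
        t2 = time.perf_counter()
        df = read_hdf5(h5_path)
        t3 = time.perf_counter()

        timings = [("CSV creation", t1 - t0),
                   ("CSV -> HDF5 store", t2 - t1),
                   ("HDF5 read", t3 - t2)]
        for label, dt in timings:
            print(f"{label} time: {dt:.3f} ({nrows / dt / 1e3:.3f} Krow/s)")
        assert len(df) == nrows


if __name__ == "__main__":
    main()
```

Unlike the output quoted above, this sketch times CSV reading and HDF5 storing as a single step, and the numbers it prints will depend on the machine. Appending in table format is what allows chunk-by-chunk writes; a smaller `chunksize` lowers peak memory at the cost of more write calls.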
