On 18.07.2013 08:45, Pushkar Raj Pande wrote:
> Both loadtxt and genfromtxt read the entire data into memory, which is
> not desirable. Is there a way to achieve streaming writes?
> 
> Thanks,
> Pushkar
> 
> 
> On Wed, Jul 17, 2013 at 7:04 PM, Pushkar Raj Pande
> <topgun...@gmail.com> wrote:
> 
>     Thanks Antonio and Anthony. I will give this a try.
> 
>     -Pushkar
> 
> 
>     On Wed, Jul 17, 2013 at 2:59 PM,
>     <pytables-users-requ...@lists.sourceforge.net> wrote:
> 
>         Date: Wed, 17 Jul 2013 16:59:16 -0500
>         From: Anthony Scopatz <scop...@gmail.com>
>         Subject: Re: [Pytables-users] Pytables bulk loading data
>         To: Discussion list for PyTables
>                 <pytables-users@lists.sourceforge.net>
> 
>         Hi Pushkar,
> 
>         I agree with Antonio.  You should load your data with NumPy
>         functions and then write back out to PyTables.  This is the
>         fastest way to do things.
> 
>         Be Well
>         Anthony
> 
> 
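For what it's worth, here is a rough sketch of that NumPy-then-PyTables
route (untested; the file name, column names and dtypes are only
placeholders standing in for the real ~100-column layout):

    import numpy as np
    import tables as tb

    # Placeholder structured dtype standing in for the real columns.
    dtype = np.dtype([('id', 'i8'), ('price', 'f8'), ('symbol', 'S8')])

    # Parse the whole CSV into a structured array in one call.
    data = np.genfromtxt('input.csv', delimiter=',', skip_header=1,
                         dtype=dtype)

    with tb.open_file('out.h5', mode='w') as h5:
        # A structured dtype can be used directly as the table description.
        table = h5.create_table('/', 'records', description=dtype)
        table.append(data)   # one bulk append instead of a per-row loop

The per-row Python work then happens inside genfromtxt/append rather than
in your own loop, which is where the factor-of-50 presumably goes away.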
>         On Wed, Jul 17, 2013 at 2:12 PM, Antonio Valentino
>         <antonio.valent...@tiscali.it> wrote:
> 
>         > Hi Pushkar,
>         >
>         > On 17/07/2013 19:28, Pushkar Raj Pande wrote:
>         > > Hi all,
>         > >
>         > > I am trying to figure out the best way to bulk load data
>         > > into PyTables. This question may have been answered already,
>         > > but I couldn't find what I was looking for.
>         > >
>         > > The source data is in the form of CSV, which may require
>         > > parsing, type checking and setting default values if it
>         > > doesn't conform to the type of the column. There are over
>         > > 100 columns in a record. Doing this in a loop in Python for
>         > > each row of the record is very slow compared to just
>         > > fetching the rows from one PyTables file and writing them
>         > > to another; the difference is almost a factor of ~50.
>         > >
>         > > I believe that if I load the data using a C procedure that
>         > > does the parsing and builds the records to write into
>         > > PyTables, I can get close to the speed of just copying and
>         > > writing the rows from one PyTables file to another. But
>         > > maybe something simpler and better already exists. Can
>         > > someone please advise? If a C procedure is what I should
>         > > write, can someone point me to some examples or snippets
>         > > that I can refer to in order to put this together?
>         > >
>         > > Thanks,
>         > > Pushkar
>         > >
>         >
>         > numpy has some tools for loading data from csv files, like
>         > loadtxt [1], genfromtxt [2] and other variants.
>         >
>         > None of them works for you?
>         >
>         > [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
>         > [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
>         >
>         >
>         > cheers
>         >
>         > --
>         > Antonio Valentino
>         >
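And for the parsing / type checking / default values part of the original
question, genfromtxt itself can cover most of that per column; a small
sketch, with made-up column names, dtypes and defaults:

    import numpy as np

    dtype = [('id', 'i8'), ('price', 'f8'), ('flag', 'i1')]

    # Coerce each column to an explicit dtype and substitute per-column
    # defaults for missing fields, instead of doing that row by row in a
    # Python loop.
    data = np.genfromtxt('input.csv', delimiter=',', skip_header=1,
                         dtype=dtype,
                         filling_values={'price': 0.0, 'flag': 0})

    print(data.dtype, data.shape)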

You could use pandas_ and its read_table function. It has nrows and
skiprows parameters with which you can easily do your own 'streaming'.

.. _pandas: http://pandas.pydata.org/
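
A minimal sketch of that, untested, assuming the file is named input.csv,
the columns are all numeric (string columns would first need converting
to fixed-width bytes), and the table layout can be taken from the first
chunk:

    import pandas as pd
    import tables as tb

    CHUNK = 100000          # rows per slice; tune to your memory budget
    start = 0

    with tb.open_file('out.h5', mode='w') as h5:
        table = None
        while True:
            # skiprows/nrows pull one slice of the CSV at a time; line 0
            # is kept as the header, lines 1..start were already loaded.
            df = pd.read_table('input.csv', sep=',', header=0,
                               skiprows=range(1, start + 1), nrows=CHUNK)
            if len(df) == 0:
                break
            recs = df.to_records(index=False)
            if table is None:
                # the first chunk defines the on-disk table description
                table = h5.create_table('/', 'data',
                                        description=recs.dtype)
            table.append(recs)
            start += len(df)

Note that each pass re-reads the file from the top just to skip the lines
already loaded; with a reasonably recent pandas, the chunksize argument of
read_table gives you an iterator over the same slices without the re-scan.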

-- Andreas
