Re: [Pytables-users] Pytables bulk loading data

2013-07-18 Thread Pushkar Raj Pande
Thanks. I will try it out and post any findings.

Pushkar

On Thu, Jul 18, 2013 at 12:36 AM, Andreas Hilboll  wrote:

> >
>
> You could use pandas_ and the read_table function. There, you have nrows
> and skiprows parameters with which you can easily do your own 'streaming'.
>
> .. _pandas: http://pandas.pydata.org/
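Andreas's `nrows`/`skiprows` idea can also be imitated without pandas. Below is a minimal stdlib sketch of that windowing pattern. `read_window` and `stream_windows` are hypothetical helpers, not part of pandas. Each call reopens the file and re-reads the skipped prefix, so the total reading work grows with every window. pandas' `chunksize` option, which returns an iterator over chunks, avoids this by keeping the file open between chunks.

```python
import csv
from itertools import islice


def read_window(path, skiprows, nrows, delimiter=","):
    """Return at most `nrows` parsed rows that follow the first `skiprows` rows.

    Hypothetical helper imitating the skiprows/nrows pair of pandas.read_table:
    only rows inside the window are kept; skipped rows are read and discarded.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        return list(islice(reader, skiprows, skiprows + nrows))


def stream_windows(path, nrows, delimiter=","):
    """Yield consecutive windows of at most `nrows` rows until the file runs out."""
    skiprows = 0
    while True:
        window = read_window(path, skiprows, nrows, delimiter)
        if not window:
            break
        yield window
        skiprows += len(window)
```

Each window is a list of string lists, small enough to convert and write before the next one is read.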



On Thu, Jul 18, 2013 at 1:00 AM, Antonio Valentino <antonio.valent...@tiscali.it> wrote:

> Hi Pushkar,
>
> On 18/07/2013 08:45, Pushkar Raj Pande wrote:
> > Both loadtxt and genfromtxt read the entire data set into memory, which is
> > not desirable. Is there a way to achieve streaming writes?
> >
>
> OK, fromfile [1] can probably help you cook up something that works
> without loading the entire file into memory (and without too many
> iterations over the file).
>
> Anyway, I strongly recommend that you not perform read/write cycles on
> single lines; rather, define a reasonable block size (number of rows)
> and process the file in chunks.
>
> If you find a reasonably simple solution, it would be nice to include it
> in our documentation as an example or a "recipe" [2].
>
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html#numpy.fromfile
> [2] http://pytables.github.io/latest/cookbook/index.html
>
> best regards
>
> antonio
>
>
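Antonio's advice to process the file in blocks of rows, rather than one read/write cycle per line, can be sketched as follows. This is a minimal pure-Python sketch, not a PyTables recipe. `make_converter` and `load_in_blocks` are hypothetical helpers, and `write_block` stands in for whatever receives each block. With PyTables it could be a table's `append` method, which should accept a list of tuples. The per-field conversion (type checking with a per-column default, as in the original question) still runs in Python. The saving comes from issuing one write per block instead of one per row.

```python
import csv
from itertools import islice


def make_converter(cast, default):
    """Return a function that applies `cast` to a field, or `default` if that fails."""
    def convert(field):
        try:
            return cast(field)
        except ValueError:
            return default
    return convert


def load_in_blocks(lines, converters, write_block, block_size=10_000):
    """Parse CSV `lines` block by block and pass each converted block to `write_block`.

    `converters` holds one converter per column. Short rows are padded with
    empty fields so missing values fall back to the column default.
    Returns the number of rows written.
    """
    reader = csv.reader(lines)
    ncols = len(converters)
    total = 0
    while True:
        raw = list(islice(reader, block_size))
        if not raw:
            return total
        block = []
        for row in raw:
            padded = row + [""] * (ncols - len(row))
            block.append(tuple(conv(field) for conv, field in zip(converters, padded)))
        write_block(block)  # one write per block, not per row
        total += len(block)
```

For example, `load_in_blocks(open("data.csv", newline=""), converters, table.append)` would stream a (hypothetical) `data.csv` into a table, holding only `block_size` rows in memory at a time.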
--
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8

2013-07-17 Thread Pushkar Raj Pande
Both loadtxt and genfromtxt read the entire data set into memory, which is
not desirable. Is there a way to achieve streaming writes?

Thanks,
Pushkar
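One way to stream with NumPy itself is to give `np.genfromtxt` a bounded slice of lines instead of the whole file. It accepts a list of strings as input, so memory use is limited to one chunk at a time. This is a minimal sketch that assumes a purely numeric structured dtype. `genfromtxt_chunks` is a hypothetical helper, and the column names in the usage are made up.

```python
from itertools import islice

import numpy as np


def genfromtxt_chunks(f, dtype, chunk_rows, **kwargs):
    """Yield structured arrays of at most `chunk_rows` rows parsed from text file `f`.

    Only one chunk of lines is held in memory at a time; extra keyword
    arguments (e.g. delimiter, filling_values) are passed to np.genfromtxt.
    """
    while True:
        lines = list(islice(f, chunk_rows))
        if not lines:
            break
        # genfromtxt accepts a list of strings; atleast_1d keeps a
        # single-row chunk one-dimensional instead of 0-d.
        yield np.atleast_1d(np.genfromtxt(lines, dtype=dtype, **kwargs))


# Usage (hypothetical file and columns):
# with open("data.csv") as f:
#     for chunk in genfromtxt_chunks(f, [("id", "i4"), ("value", "f8")], 10_000, delimiter=","):
#         table.append(chunk)
```

Each yielded array is already a structured array, so it can be handed to a PyTables table in one call per chunk.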


On Wed, Jul 17, 2013 at 7:04 PM, Pushkar Raj Pande wrote:

> Thanks Antonio and Anthony. I will give this a try.
>
> -Pushkar
>


Re: [Pytables-users] Pytables-users Digest, Vol 86, Issue 8

2013-07-17 Thread Pushkar Raj Pande
Thanks Antonio and Anthony. I will give this a try.

-Pushkar


On Wed, Jul 17, 2013 at 2:59 PM, <pytables-users-requ...@lists.sourceforge.net> wrote:

> Date: Wed, 17 Jul 2013 16:59:16 -0500
> From: Anthony Scopatz 
> Subject: Re: [Pytables-users] Pytables bulk loading data
> To: Discussion list for PyTables
>
> Hi Pushkar,
>
> I agree with Antonio.  You should load your data with NumPy functions and
> then write back out to PyTables.  This is the fastest way to do things.
>
> Be Well
> Anthony
>
>
> On Wed, Jul 17, 2013 at 2:12 PM, Antonio Valentino <antonio.valent...@tiscali.it> wrote:
>
> > Hi Pushkar,
> >
> > On 17/07/2013 19:28, Pushkar Raj Pande wrote:
> > > Hi all,
> > >
> > > I am trying to figure out the best way to bulk load data into PyTables.
> > > This question may have been answered already, but I couldn't find what I
> > > was looking for.
> > >
> > > The source data is in the form of CSV, which may require parsing, type
> > > checking and setting default values if a field doesn't conform to the
> > > type of its column. There are over 100 columns in a record. Doing this in
> > > a Python loop for each row is very slow compared to just fetching the
> > > rows from one PyTables file and writing them to another: the difference
> > > is almost a factor of 50.
> > >
> > > I believe that if I load the data using a C procedure that does the
> > > parsing and builds the records to write into PyTables, I can get close
> > > to the speed of just copying rows from one PyTables file to another. But
> > > maybe there is something simpler and better that already exists. Can
> > > someone please advise? And if a C procedure is what I should write, can
> > > someone point me to some examples or snippets that I can refer to in
> > > putting this together?
> > >
> > > Thanks,
> > > Pushkar
> > >
> >
> >
> > numpy has some tools for loading data from CSV files, like loadtxt [1],
> > genfromtxt [2] and other variants.
> >
> > Doesn't any of them work for you?
> >
> > [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
> > [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
> >
> >
> > cheers
> >
> > --
> > Antonio Valentino
> >
>
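Anthony's suggestion (parse with NumPy, then write to PyTables) boils down to building one structured array per block and appending it in a single call. This is a minimal sketch. `RECORD_DTYPE` and its three columns are hypothetical stand-ins for the real 100-column layout. `table` is assumed to be a PyTables `Table`, although any object with an `append` method works for testing.

```python
import numpy as np

# Hypothetical column layout; the real table has over 100 columns.
RECORD_DTYPE = np.dtype([("id", "i8"), ("price", "f8"), ("flag", "?")])


def append_block(table, rows, dtype=RECORD_DTYPE):
    """Pack a block of row tuples into one structured array and append it in one call.

    `table` is assumed to be a PyTables Table (or anything with an `append`
    method) whose description matches `dtype`.
    """
    block = np.array(rows, dtype=dtype)  # one conversion for the whole block
    table.append(block)
    return block
```

With PyTables, a matching table could presumably be created by passing the same dtype as the description, e.g. `h5.create_table("/", "data", RECORD_DTYPE)`, with `table.flush()` called after the last block.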


[Pytables-users] Pytables bulk loading data

2013-07-17 Thread Pushkar Raj Pande
Hi all,

I am trying to figure out the best way to bulk load data into PyTables.
This question may have been answered already, but I couldn't find what I was
looking for.

The source data is in the form of CSV, which may require parsing, type
checking and setting default values if a field doesn't conform to the type of
its column. There are over 100 columns in a record. Doing this in a Python
loop for each row is very slow compared to just fetching the rows from one
PyTables file and writing them to another: the difference is almost a factor
of 50.

I believe that if I load the data using a C procedure that does the parsing
and builds the records to write into PyTables, I can get close to the speed of
just copying rows from one PyTables file to another. But maybe there is
something simpler and better that already exists. Can someone please advise?
And if a C procedure is what I should write, can someone point me to some
examples or snippets that I can refer to in putting this together?

Thanks,
Pushkar