[Pytables-users] Compression and Indexing in PyTables

2013-07-28 Thread David Reed
Hi there,  I was wondering if there are any nice tutorials that show the
different compression options such as zlib, bzo, etc. and how to actually
use them with my tables.

There seems to be a lot of good information describing the performance
increase under the Optimization Tips section, but I don't see any clear way
of actually doing this.

Maybe I'm missing something.

Thanks for the help.

-Dave
--
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Compression and Indexing in PyTables

2013-07-28 Thread Andreas Hilboll
Hi David,

On 28.07.2013 15:24, David Reed wrote:
> Hi there,  I was wondering if there are any nice tutorials that show the
> different compression options such as zlib, bzo, etc. and how to
> actually use them with my tables.
> 
> There seems to be a lot of good information describing the performance
> increase under the Optimization Tips section, but I don't see any clear
> way of actually doing this.  
> 
> Maybe I'm missing something.  

Maybe you're missing this:

   http://pandas.pydata.org/pandas-docs/stable/io.html#compression

The HDFStore constructor has a "complib" kwarg which you can use to set
the compression library. Also look at "complevel" to set the compression
level.
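
A minimal sketch of what that looks like, assuming pandas and PyTables
are both installed. The file name, key and column names are made up for
illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data; the column names are invented for this sketch.
df = pd.DataFrame({"a": np.arange(1000), "b": np.random.rand(1000)})

path = os.path.join(tempfile.mkdtemp(), "store.h5")

# complib picks the compressor; complevel runs from 0 (off) to 9 (most effort).
with pd.HDFStore(path, mode="w", complib="zlib", complevel=9) as store:
    store.put("df", df)

# Reading back needs no compression arguments; HDF5 decompresses transparently.
with pd.HDFStore(path, mode="r") as store:
    restored = store["df"]
```

Which compressors other than zlib are usable depends on how the
underlying PyTables was built.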

-- Andreas.



Re: [Pytables-users] Compression and Indexing in PyTables

2013-07-28 Thread David Reed
Maybe I wasn't aware of this, but has pandas completely wrapped PyTables,
or is PyTables something I should still be using for storing and accessing
scientific data, with pandas merely providing an access point to it?




Re: [Pytables-users] Compression and Indexing in PyTables

2013-07-28 Thread Francesc Alted
On 7/28/13 9:24 AM, David Reed wrote:
> Hi there,  I was wondering if there are any nice tutorials that show the
> different compression options such as zlib, bzo, etc. and how to 
> actually use them with my tables.
>
> There seems to be a lot of good information describing the performance 
> increase under the Optimization Tips section, but I don't see any 
> clear way of actually doing this.
>
> Maybe I'm missing something.

Well, the compression options are part of the more general Filters
helper class:

http://pytables.github.io/usersguide/libref/helper_classes.html#the-filters-class

This stems from the fact that in HDF5 a compressor is just another
data filter.
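
As a rough sketch of how a Filters instance is used (PyTables 3.x
naming assumed; the record layout, node names and file name are
invented for illustration):

```python
import os
import tempfile

import numpy as np
import tables


class Reading(tables.IsDescription):
    # Hypothetical record layout, invented for this sketch.
    sensor_id = tables.Int32Col()
    value = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "compressed.h5")

# A Filters instance bundles the compressor choice and its level.
# "zlib" is always available; "lzo", "bzip2" and "blosc" depend on the build.
filters = tables.Filters(complevel=5, complib="zlib", shuffle=True)

with tables.open_file(path, mode="w") as h5:
    # The filters are passed when the table is created.
    table = h5.create_table("/", "readings", Reading, filters=filters)
    row = table.row
    for i in range(1000):
        row["sensor_id"] = i % 10
        row["value"] = float(i)
        row.append()
    table.flush()

    # Chunked arrays (CArray, EArray) take the same argument.
    carray = h5.create_carray(
        "/", "image", atom=tables.Float64Atom(), shape=(100, 100), filters=filters
    )
    carray[:] = np.zeros((100, 100))

    # Each leaf remembers the filters it was created with.
    table_complib = table.filters.complib
    table_complevel = table.filters.complevel
    n_rows = table.nrows
```

Passing filters= to tables.open_file instead should make it the default
for every leaf created in that file, which saves repeating it per node.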

-- 
Francesc Alted




Re: [Pytables-users] Compression and Indexing in PyTables

2013-07-28 Thread Francesc Alted
On 7/28/13 10:21 AM, David Reed wrote:
> maybe I wasn't aware of this, but has PANDAS completely wrapped 
> PyTables, or is PyTables something I should still be using for storing 
> and accessing scientific data, and PANDAS has an access point to it?

Yeah, more the latter than the former.  PyTables is a standalone
library, but pandas uses it as one of its storage backends.

-- 
Francesc Alted




Re: [Pytables-users] Compression and Indexing in PyTables

2013-07-28 Thread Francesc Alted
More input from Jeff Reback.

Hey Jeff, I see that you try to post here from time to time, but your 
messages bounce because the address that you use as sender is not 
subscribed.  Please make sure that you post from a subscribed address.  
Thanks!

Francesc

On 7/28/13 11:35 AM, pytables-users-boun...@lists.sourceforge.net wrote:
> The attached message has been automatically discarded.


Re: [Pytables-users] Compression and Indexing in PyTables.eml

Subject: Re: [Pytables-users] Compression and Indexing in PyTables
From: Jeff Reback
Date: 7/28/13 11:35 AM
To: Discussion list for PyTables

pandas stores its data using PyTables and embeds extra metadata in the
node attributes so that the data can be deserialized back into the
original pandas structure.
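
One way to see that metadata, sketched under the assumption that
pandas and PyTables are both installed: write a DataFrame with pandas,
then open the same file with plain PyTables and list the attributes on
the group. The file name and key are made up, and the exact set of
attributes varies between pandas versions.

```python
import os
import tempfile

import pandas as pd
import tables

path = os.path.join(tempfile.mkdtemp(), "frame.h5")

# Write with pandas ...
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
df.to_hdf(path, key="df", mode="w")

# ... then inspect the result with plain PyTables.
with tables.open_file(path, mode="r") as h5:
    group = h5.get_node("/df")
    # User attributes that pandas attached to the group.
    attr_names = group._v_attrs._f_list("user")
    # pandas records which structure the group holds.
    pandas_type = str(group._v_attrs.pandas_type)

print(attr_names)
print(pandas_type)
```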




-- 
Francesc Alted




[Pytables-users] Tables vs Arrays

2013-07-28 Thread David Reed
I'm really trying to become more productive using PyTables, but am
struggling with what I should be using.  What's the difference between a
table and an array?


Re: [Pytables-users] Tables vs Arrays

2013-07-28 Thread Anthony Scopatz
On Sun, Jul 28, 2013 at 8:38 PM, David Reed  wrote:

> I'm really trying to become more productive using PyTables, but am
> struggling with what I should be using.  What's the difference between a
> table and an array?
>

Hi David,

The difference between Arrays and Tables is conceptually the same as the
difference between numpy arrays and numpy structured arrays.  The plain old
[Aa]rray is a contiguous block of a single data type.  Tables and
structured arrays have a more complex data type that is composed of a
contiguous sequence of other data types (i.e. the fields / columns).  Which
data structure you use really depends a lot on the type of problem you are
trying to solve and what kinds of questions you want to answer with that
data structure.
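
The NumPy side of that analogy can be sketched as follows; the field
names are invented for illustration.

```python
import numpy as np

# Plain array: every element has the same dtype.
plain = np.zeros((3, 2), dtype=np.float64)

# Structured array: every element is a record made of named fields.
# The field names here are hypothetical.
records = np.zeros(
    3, dtype=[("name", "S8"), ("mass", np.float64), ("count", np.int32)]
)
records["mass"] = [1.5, 2.5, 3.5]

# A field is addressed by name and behaves like a column.
field_names = records.dtype.names
mass_column = records["mass"]
```

A PyTables Array corresponds to the first object and a Table to the
second, with the fields playing the role of columns.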

That said, the implementation of Tables is far more similar to EArrays than
Arrays.  So a lot of the performance trade-offs that you see are similar.
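
A small sketch of that similarity, assuming the PyTables 3.x API: both
the EArray and the Table are grown with append(), while the Array is
written once with a fixed shape. Node names, field names and the file
name are made up.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "containers.h5")

# Hypothetical record layout for the table.
record_dtype = np.dtype([("mass", np.float64), ("count", np.int32)])

with tables.open_file(path, mode="w") as h5:
    # Array: fixed shape, written in one go.
    fixed = h5.create_array("/", "fixed", np.arange(6.0).reshape(3, 2))

    # EArray: the dimension declared with length 0 is the one that can grow.
    growable = h5.create_earray(
        "/", "growable", atom=tables.Float64Atom(), shape=(0, 2)
    )
    growable.append(np.ones((3, 2)))
    growable.append(np.ones((2, 2)))

    # Table: rows of records, appended in batches in the same way.
    table = h5.create_table("/", "records", description=record_dtype)
    table.append(np.zeros(4, dtype=record_dtype))
    table.append(np.zeros(1, dtype=record_dtype))
    table.flush()

    fixed_shape = fixed.shape
    earray_rows = growable.nrows
    table_rows = table.nrows
```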

You should watch my "HDF5 is for Lovers" talk for more generic advice [1].
I hope this helps!

Be Well
Anthony

1. http://www.youtube.com/watch?v=Nzx0HAd3FiI




Re: [Pytables-users] Tables vs Arrays

2013-07-28 Thread Francesc Alted
On 7/28/13 9:58 PM, Anthony Scopatz wrote:
> On Sun, Jul 28, 2013 at 8:38 PM, David Reed wrote:
>
>> I'm really trying to become more productive using PyTables, but am
>> struggling with what I should be using.  What's the difference
>> between a table and an array?
>
> Hi David,
>
> The difference between Arrays and Tables is conceptually the same as
> the difference between numpy arrays and numpy structured arrays.  The
> plain old [Aa]rray is a contiguous block of a single data type.
> Tables and structured arrays have a more complex data type that is
> composed of a contiguous sequence of other data types (i.e. the fields /
> columns).  Which data structure you use really depends a lot on the
> type of problem you are trying to solve and what kinds of questions
> you want to answer with that data structure.
>
> That said, the implementation of Tables is far more similar to EArrays
> than Arrays.  So a lot of the performance trade-offs that you see are
> similar.

Besides this, another interesting difference is that Tables allow
queries to be performed in a similar way to relational databases (but
using a more NumPy-esque syntax).  Here are some examples:

http://pytables.github.io/cookbook/hints_for_sql_users.html?highlight=query#selecting-data

and you can index columns too:

http://pytables.github.io/cookbook/hints_for_sql_users.html?highlight=query#creating-an-index

so that you can accelerate queries involving indexed columns.
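
For a quick impression, here is a sketch of a query plus an index,
assuming the PyTables 3.x API. The table layout, column names and file
name are invented for illustration.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "query.h5")

# Hypothetical record layout.
record_dtype = np.dtype([("sensor_id", np.int32), ("value", np.float64)])

data = np.zeros(1000, dtype=record_dtype)
data["sensor_id"] = np.arange(1000) % 10
data["value"] = np.arange(1000, dtype=np.float64)

with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "readings", description=record_dtype)
    table.append(data)
    table.flush()

    # Index one column so that conditions on it can be answered faster.
    table.cols.sensor_id.create_index()

    # The condition is a string over column names, much like a WHERE clause.
    hits = table.read_where("(sensor_id == 3) & (value > 500.0)")

    n_hits = len(hits)
    is_indexed = table.cols.sensor_id.is_indexed
```

read_where() returns the matching rows as a NumPy structured array;
table.where() takes the same condition string and iterates over rows
instead, which avoids loading every match at once.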

-- 
Francesc Alted

