[Pytables-users] Numpy Arrays to Structure Array or Table

2013-08-07 Thread David Reed
Hi there,

I have some generic functions that take time series data with 2 numpy array
arguments, time and value, and return 2 numpy arrays of time and value.

I would like to place these arrays into a Numpy structured array or
directly into a new pytables table with fields, time and value.

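Now I've found I could do this:

import numpy as np

t, v = some_func(t, v)

# Allocate a structured array, then fill each field.
A = np.empty(len(t), dtype=[('time', np.float64), ('value', np.float64)])

A['time'] = t
A['value'] = v

# hfile is an open PyTables file and grp an existing group.
hfile.createTable(grp, 'signal', description=A)
hfile.flush()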

But this seems rather clunky and inefficient.  Any suggestions to make this
repackaging a little smoother?
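
One possibly smoother route, offered as a sketch rather than a definitive
answer: NumPy's np.rec.fromarrays builds the record array from the two
arrays in a single call. The some_func body and the sample data below are
placeholders, not part of the original question. The arrays are still
copied once, so this is tidier rather than necessarily faster.

```python
import numpy as np


def some_func(t, v):
    """Hypothetical stand-in for the poster's time-series function."""
    return t, v * 2.0


# Made-up sample data, only for illustration.
t = np.linspace(0.0, 1.0, 5)
v = np.arange(5, dtype=np.float64)
t, v = some_func(t, v)

# Build the record array in one call instead of allocating an empty
# array and filling each field separately.
A = np.rec.fromarrays(
    [t, v],
    dtype=[('time', np.float64), ('value', np.float64)],
)
```

The resulting A has the same fields as before, so it could be handed to
the createTable call in the same way as in the snippet above.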
--
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] suitable for storing data like k-v style?

2013-08-07 Thread Anthony Scopatz
Hi Jason,

A key-value store pattern is definitely supported.  Be forewarned, though,
that groups are implemented using B-trees, not hash tables, so lookups are
O(log n) rather than O(1). However, with data of your size most of the
access time will be spent in the leaf nodes and not in getting the group.
I'd say try it out and see.
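
To get a rough feel for what a logarithmic lookup costs at this scale,
here is a small sketch that uses binary search over a sorted list as a
stand-in. It is not how HDF5 implements groups (real B-trees have a much
larger fan-out); the key names and the one-million figure are only taken
from the question for illustration.

```python
import bisect
import math

# Hypothetical key set: one million zero-padded user ids, kept sorted.
keys = ["uid%07d" % i for i in range(1_000_000)]


def lookup(sorted_keys, key):
    """Return the index of key via binary search, or -1 if absent."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return -1


# Upper bound on the number of halving steps for a binary search.
max_steps = math.ceil(math.log2(len(keys)))
```

Under these assumptions a lookup among a million keys needs on the order
of twenty comparisons, which is small next to reading tens of thousands
of rows from the leaf itself.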

Be Well
Anthony

On Wed, Aug 7, 2013 at 11:33 AM, Xianli Xu  wrote:

> Hi all,
>
> I'm developing a data processing service and evaluating PyTables. Since
> HDF5 supports hierarchical data like a tree of folders, can I use such a
> tree-like structure as a K-V store, possibly storing millions of tables or
> arrays under one group, and randomly access any one of them in O(1) time?
> E.g.
>
> root/
>   user_log/
>     uid1 -> table / array (tens of thousands of rows / elements,
>             ETL'ed user log info in int format)
>     uid2 -> table / array
>     uid3 -> table / array
>     uid4 -> table / array
>     uid5 -> table / array
>     …… (perhaps a million users)
>
> Just wondering how the hierarchical structure is implemented and whether
> such a usage pattern is supported? If not, is there any running or better
> way to store this type of information? We chose PyTables because the data
> is stored at higher density, loads faster, and has no ACID / concurrency
> overhead, so a traditional DB or a NoSQL DB is not an option for us.
>
> Thanks,
> Jason


Re: [Pytables-users] should I use pytables?

2013-08-07 Thread Chao YUE
Thanks Anthony, I think I will give it a try; apparently at some stage I
would like to flush the data to disk :p

cheers,

Chao

On Wed, Aug 7, 2013 at 6:44 PM, Anthony Scopatz  wrote:

> On Wed, Aug 7, 2013 at 5:44 AM, Chao YUE  wrote:
>
>> Dear all,
>>
>> I have hierarchical nested Python dictionaries in which the end of each
>> branch is either a pandas DataFrame, an np.ndarray, a list, or a plain
>> scalar.
>>
>> Let's say the different levels of keys are:
>>
>> 1st level: ['top1', 'top2', 'top3']
>> 2nd level: ['mid1','mid2','mid3']
>> 3rd level: ['bot1','bot2','bot3','bot4']
>>
>> I think I am looking for some data structure that allows easy retrieval
>> of the data at different levels as dictionaries (I cannot think of
>> something better yet).
>>
>> For example, data.ix['top1',:,'bot1'] would have keys only at the middle
>> level.
>>
>> I had a quick look at the PyTables documentation but am not sure: should
>> I use PyTables for this purpose?
>>
>
> Hello Chao,
>
> If you are only ever going to use this data structure in memory, you
> shouldn't use PyTables.  If you are going to persist this information to
> disk, then PyTables is a great choice!  Every dictionary will become a
> group and every leaf data structure will become an Array or a Table.
>
> Be Well
> Anthony
>
>
>>
>> thanks a lot for any idea.
>>
>> cheers,
>>
>> Chao
>>
>> --
>>
>> ***
>> Chao YUE
>> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
>> UMR 1572 CEA-CNRS-UVSQ
>> Batiment 712 - Pe 119
>> 91191 GIF Sur YVETTE Cedex
>> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
>>
>> 
>>
>>


-- 
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16



Re: [Pytables-users] suitable for storing data like k-v style?

2013-08-07 Thread Xianli Xu
Oops, sorry, it seems the auto-correction of my email client created some
typos for me :P
Here are the corrections:

On 8 Aug, 2013, at 2:33 AM, Xianli Xu  wrote:

> Hi all, 
> 
> I'm developing a data processing service and evaluating PyTables. Since HDF5
> supports hierarchical data like a tree of folders, can I use such a tree-like
> structure as a K-V store, possibly storing millions of tables or arrays
> under one group, and randomly access any one of them in O(1) time? E.g.
>
> root/
>   user_log/
>     uid1 -> table / array (tens of thousands of rows / elements,
>             ETL'ed user log info in int format)
>     uid2 -> table / array
>     uid3 -> table / array
>     uid4 -> table / array
>     uid5 -> table / array
>     …… (perhaps a million users)
>
> Just wondering how the hierarchical structure is implemented and whether such
> a usage pattern is supported? If not, is there any running or better way to
> store this type of information? We chose PyTables because the data is stored at

running -> tuning

> higher density, loads faster, and has no ACID / concurrency overhead, so a
> traditional DB or a NoSQL DB is not an option for us.
> 
> Thanks,
> Jason




[Pytables-users] suitable for storing data like k-v style?

2013-08-07 Thread Xianli Xu
Hi all, 

I'm developing a data processing service and evaluating PyTables. Since HDF5
supports hierarchical data like a tree of folders, can I use such a tree-like
structure as a K-V store, possibly storing millions of tables or arrays under
one group, and randomly access any one of them in O(1) time? E.g.

root/
  user_log/
    uid1 -> table / array (tens of thousands of rows / elements,
            ETL'ed user log info in int format)
    uid2 -> table / array
    uid3 -> table / array
    uid4 -> table / array
    uid5 -> table / array
    …… (perhaps a million users)

Just wondering how the hierarchical structure is implemented and whether such
a usage pattern is supported? If not, is there any running or better way to
store this type of information? We chose PyTables because the data is stored
at higher density, loads faster, and has no ACID / concurrency overhead, so a
traditional DB or a NoSQL DB is not an option for us.

Thanks,
Jason


Re: [Pytables-users] searching for group names

2013-08-07 Thread Anthony Scopatz
On Wed, Aug 7, 2013 at 4:39 AM, Gabriel J.L. Beckers <
pytables-u...@gbeckers.nl> wrote:

> Hi,
>
> I don't know if this is related in any way to Gergo's problem, but I
> get slow responses when querying which children a group contains, if
> that group contains large leaves. I am using PyTables 2.5 and HDF5 1.8.9
> on 64-bit Linux.
>
> Specifically, I found that the _g_get_objinfo method (which is
> used by other methods that I use) is slow when used on a large leaf.
> The slowness is proportional to the size of the leaf. It is almost as
> if some process is actually reading the data instead of just the info on
> the type of data. I notice this because my data is on an external
> USB3 disk. To give you an idea: that method takes almost 80 seconds to
> return the string 'Leaf' when used on a 5 Gb EArray. That should
> roughly correspond to reading the complete disk-based array. The info
> is cached somehow, because if I run the method a second time in the
> same Python session it is very fast.
>
> If I copy my HDF5 file to my SSD, things are much faster, but
> running the method still takes 2 seconds or so on a 5 Gb leaf.
>
> Is this expected behavior and should I just avoid this method in my
> applications, or is something wrong?
>

Hi Gabriel,

Are you using compression on this EArray?  This method is basically a thin
wrapper over some HDF5 functions. I think that the data that you are asking
for (inadvertently, maybe) is just expensive to get.

Be Well
Anthony


>
> Best, Gabriel
>
> Anthony Scopatz wrote:
>
> > On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő  wrote:
> >
> >> Hello,
> >>
> >>
> >> We develop a measurement evaluation tool, and we'd like to use
> >> pytables/hdf5 as a middle layer for signal accessing.
> >>
> >> We have to deal with the silly structure of the recorder device
> >> measurement format.
> >>
> >>
> >>
> >> The signals can be accessed via two identifiers:
> >>
> >> * device name: - >> message>--
> >>
> >> * signal name
> >>
> >>
> >>
> >> The first identifier says the source information of the signal, which
> >> can be quite long.
> >>
> >> Therefore I grouped the device name into two layers:
> >>
> >> /
> >>
> >> /...
> >>
> >> /
> >>
> >>
> >>
> >> So if you have the same message from two channels, then you will get
> >> /foo-device-name
> >>   /channel-1
> >>     /bar
> >>     /baz
> >>   /channel-2
> >>     /bar
> >>     /baz
> >>
> >>
> >>
> >> Besides signal loading, we have to search for a signal name as fast as
> >> possible, and return the shortest unique part of the device name
> >> together with the signal name.
> >>
> >> Using the structure above, iterating over the group names is quite
> >> slow, so I built a table of device and signal names.
> >>
> >> As far as I know, PyTables queries do not support string searching
> >> (e.g. startswith, *foo[0-9]ch*, etc.), so fetching this table leads us
> >> to a pure Python loop, which is slow again.
> >>
> >> Therefore I build a Python dictionary from the table, which provides
> >> fast iteration compared with the table, but the init time increased
> >> from 100 ms to 3-4 sec (we have more than 40 000 signals).
> >>
> >>
> >>
> >> Do you have any advice on how to search for group names in HDF5 with
> >> PyTables in an efficient way?
> >>
> >
> > Hi Gergo,
> >
> > Searching through group names, like accessing all HDF5 metadata, is slow.
> >  For group names this is because rather than searching through a list you
> > are traversing a B-tree, IIRC.  So you have to use the couple of tricks
> > that you used: 1) have another Table / Array of all table names, 2) read
> > this in once to a native Python data structure (dict here).
> >
> > However, 4 sec to read in this table seems excessive for data of this size.
> >  You are probably not reading this in properly.  You should be using:
> >
> > raw_grps = f.root.grp_names[:]
> >
> > or similar.
> >
> > Maybe other people have some other ideas.
> >
> > Be Well
> > Anthony
> >
> >
> >>
> >> ps: I would be most happy with a glob interface.
> >>
> >>
> >>
> >> Thanks for your advice in advance,
> >>
> >> gergo
> >>
> >>
> >>

Re: [Pytables-users] should I use pytables?

2013-08-07 Thread Anthony Scopatz
On Wed, Aug 7, 2013 at 5:44 AM, Chao YUE  wrote:

> Dear all,
>
> I have hierarchical nested Python dictionaries in which the end of each
> branch is either a pandas DataFrame, an np.ndarray, a list, or a plain
> scalar.
>
> Let's say the different levels of keys are:
>
> 1st level: ['top1', 'top2', 'top3']
> 2nd level: ['mid1','mid2','mid3']
> 3rd level: ['bot1','bot2','bot3','bot4']
>
> I think I am looking for some data structure that allows easy retrieval
> of the data at different levels as dictionaries (I cannot think of
> something better yet).
>
> For example, data.ix['top1',:,'bot1'] would have keys only at the middle
> level.
>
> I had a quick look at the PyTables documentation but am not sure: should
> I use PyTables for this purpose?
>

Hello Chao,

If you are only ever going to use this data structure in memory, you
shouldn't use PyTables.  If you are going to persist this information to
disk, then PyTables is a great choice!  Every dictionary will become a group
and every leaf data structure will become an Array or a Table.
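
A minimal sketch of that mapping, in plain Python with no PyTables calls:
each nested dictionary corresponds to one group level and each non-dict
value to a leaf. The helper names iter_leaves and select, and the sample
data, are made up for illustration; only the key names come from the
question.

```python
def iter_leaves(tree, prefix=""):
    """Yield (path, leaf) pairs; each dict level maps to one group level."""
    for key, value in tree.items():
        path = prefix + "/" + key
        if isinstance(value, dict):
            # A nested dictionary would become a group.
            yield from iter_leaves(value, path)
        else:
            # Anything else would become a leaf (Array or Table).
            yield path, value


def select(tree, top, bot):
    """Emulate data.ix[top, :, bot]: keep only the middle-level keys."""
    return {mid: subtree[bot] for mid, subtree in tree[top].items()}


# Hypothetical data: every leaf is a one-element list.
data = {
    top: {mid: {bot: [0] for bot in ("bot1", "bot2")}
          for mid in ("mid1", "mid2", "mid3")}
    for top in ("top1", "top2")
}

paths = [path for path, _ in iter_leaves(data)]
middle = select(data, "top1", "bot1")
```

The paths list holds strings such as /top1/mid1/bot1, which is the shape
a node path would take once the dictionaries are written out as groups.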

Be Well
Anthony


>
> thanks a lot for any idea.
>
> cheers,
>
> Chao
>
> --
>
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
>
> 
>
>


[Pytables-users] should I use pytables?

2013-08-07 Thread Chao YUE
Dear all,

I have hierarchical nested Python dictionaries in which the end of each
branch is either a pandas DataFrame, an np.ndarray, a list, or a plain
scalar.

Let's say the different levels of keys are:

1st level: ['top1', 'top2', 'top3']
2nd level: ['mid1','mid2','mid3']
3rd level: ['bot1','bot2','bot3','bot4']

I think I am looking for some data structure that allows easy retrieval of
the data at different levels as dictionaries (I cannot think of something
better yet).

For example, data.ix['top1',:,'bot1'] would have keys only at the middle
level.

I had a quick look at the PyTables documentation but am not sure: should I
use PyTables for this purpose?

Thanks a lot for any ideas.

cheers,

Chao

-- 
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16



Re: [Pytables-users] searching for group names

2013-08-07 Thread Gabriel J.L. Beckers
Hi,

I don't know if this is related in any way to Gergo's problem, but I
get slow responses when querying which children a group contains, if
that group contains large leaves. I am using PyTables 2.5 and HDF5 1.8.9
on 64-bit Linux.

Specifically, I found that the _g_get_objinfo method (which is
used by other methods that I use) is slow when used on a large leaf.
The slowness is proportional to the size of the leaf. It is almost as
if some process is actually reading the data instead of just the info on
the type of data. I notice this because my data is on an external
USB3 disk. To give you an idea: that method takes almost 80 seconds to
return the string 'Leaf' when used on a 5 Gb EArray. That should
roughly correspond to reading the complete disk-based array. The info
is cached somehow, because if I run the method a second time in the
same Python session it is very fast.

If I copy my HDF5 file to my SSD, things are much faster, but
running the method still takes 2 seconds or so on a 5 Gb leaf.
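
As a quick check of the "reading the complete array" estimate, the two
timings can be turned into implied read rates. This is only arithmetic on
the numbers above, reading "5 Gb" as 5 gigabytes of 10**9 bytes each
(an assumption), and it says nothing about what HDF5 actually does.

```python
# Assumption: "5 Gb" means 5 gigabytes, 10**9 bytes each.
size_bytes = 5 * 10**9


def implied_rate_mb_s(size, seconds):
    """Throughput in MB/s if the whole leaf were read in that time."""
    return size / seconds / 10**6


usb_rate = implied_rate_mb_s(size_bytes, 80)  # external USB3 disk
ssd_rate = implied_rate_mb_s(size_bytes, 2)   # internal SSD
```

The USB3 figure works out to 62.5 MB/s, a plausible sustained rate for an
external disk, while the SSD figure works out to 2500 MB/s, so the two
cases may not be doing the same amount of real disk reading.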

Is this expected behavior and should I just avoid this method in my  
applications, or is something wrong?

Best, Gabriel

Anthony Scopatz wrote:

> On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő  wrote:
>
>> Hello,
>>
>>
>> We develop a measurement evaluation tool, and we'd like to use
>> pytables/hdf5 as a middle layer for signal accessing.
>>
>> We have to deal with the silly structure of the recorder device
>> measurement format.
>>
>>
>>
>> The signals can be accessed via two identifiers:
>>
>> * device name: -> message>--
>>
>> * signal name
>>
>>
>>
>> The first identifier says the source information of the signal, which
>> can be quite long.
>>
>> Therefore I grouped the device name into two layers:
>>
>> /
>>
>> /...
>>
>> /
>>
>>
>>
>> So if you have the same message from two channels, then you will get
>> /foo-device-name
>>   /channel-1
>>     /bar
>>     /baz
>>   /channel-2
>>     /bar
>>     /baz
>>
>>
>>
>> Besides signal loading, we have to search for a signal name as fast as
>> possible, and return the shortest unique part of the device name
>> together with the signal name.
>>
>> Using the structure above, iterating over the group names is quite
>> slow, so I built a table of device and signal names.
>>
>> As far as I know, PyTables queries do not support string searching
>> (e.g. startswith, *foo[0-9]ch*, etc.), so fetching this table leads us
>> to a pure Python loop, which is slow again.
>>
>> Therefore I build a Python dictionary from the table, which provides
>> fast iteration compared with the table, but the init time increased
>> from 100 ms to 3-4 sec (we have more than 40 000 signals).
>>
>>
>>
>> Do you have any advice on how to search for group names in HDF5 with
>> PyTables in an efficient way?
>>
>
> Hi Gergo,
>
> Searching through group names, like accessing all HDF5 metadata, is slow.
>  For group names this is because rather than searching through a list you
> are traversing a B-tree, IIRC.  So you have to use the couple of tricks
> that you used: 1) have another Table / Array of all table names, 2) read
> this in once to a native Python data structure (dict here).
>
> However, 4 sec to read in this table seems excessive for data of this size.
>  You are probably not reading this in properly.  You should be using:
>
> raw_grps = f.root.grp_names[:]
>
> or similar.
>
> Maybe other people have some other ideas.
>
> Be Well
> Anthony
>
>
>>
>> ps: I would be most happy with a glob interface.
>>
>>
>>
>> Thanks for your advice in advance,
>>
>> gergo
>>
>>



