Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-23 Thread Ivan Vilata i Balaguer
Vincent Nijs (on 2007-07-22 at 10:21:18 -0500) wrote::

> [...]
> I would assume the NULLs could be treated as missing values(?). I don't know
> about having different types in one column, however.

Maybe a masked array would do the trick, with NULL values masked out.
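For instance, something along these lines (a minimal sketch using today's numpy.ma; the column values are made up):

```python
import numpy as np
import numpy.ma as ma

# Hypothetical column as fetched through a DB API, with None marking NULLs.
raw = [1.5, None, 3.0, None, 7.25]

# Mask the NULL positions; statistics then skip them automatically.
col = ma.masked_invalid([np.nan if v is None else v for v in raw])

print(col.count(), col.mean())  # count() gives the number of non-NULL entries
```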

::

Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
   Cárabos Coop. V.  V  V   Enjoy Data
  ""


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-22 Thread Vincent Nijs
FYI

I asked a question about the load and save speed of recarrays using pickle
vs. pysqlite on the pysqlite list and got the response linked below. It doesn't
look like sqlite can do much better than what I found.

http://lists.initd.org/pipermail/pysqlite/2007-July/001085.html

I also passed on Francesc's idea to use numpy containers in relational
database wrappers such as pysqlite. This is apparently not possible because, in a
relational database, "you don't know the type of the values in advance.
Some values might be NULL, and you might even have different types for
the same column."

http://lists.initd.org/pipermail/pysqlite/2007-July/001087.html

I would assume the NULLs could be treated as missing values(?). I don't know
about having different types in one column, however.

Vincent

On 7/20/07 10:53 AM, "Francesc Altet" <[EMAIL PROTECTED]> wrote:

> Vincent,
> 
> On Friday 20 July 2007 15:35, Vincent Nijs wrote:
>> Still curious however ... does no one on this list use (and like) sqlite?
> 
> [...]




Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Francesc Altet
On Friday 20 July 2007 20:16, Christopher Barker wrote:
> Another small note:
>
> I'm pretty sure sqlite stores everything as strings. This just plain has
> to be slower than storing the raw binary representation (and may mean
> slight differences in fp values on the round-trip). HDF is designed
> for this sort of thing, sqlite is not.

Yeah, that was the case with sqlite 2.  However, starting with sqlite 3, the
developers provided the ability to store integer and real numbers in a more
compact format [1].  Sqlite 3 is the version included in Python 2.5 (the
Python version that Vincent was benchmarking), so this shouldn't make a big
difference compared with other relational databases.

[1] http://www.sqlite.org/datatype3.html
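The round-trip behaviour is easy to check with the sqlite3 module from the standard library (in-memory database, made-up values): numbers come back as native Python ints and floats, not strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER, r REAL, s TEXT)")
conn.execute("INSERT INTO t VALUES (?, ?, ?)", (42, 3.14, "abc"))

i, r, s = conn.execute("SELECT i, r, s FROM t").fetchone()
conn.close()

# sqlite3 preserves the storage classes on the round-trip:
print(type(i).__name__, type(r).__name__, type(s).__name__)  # int float str
```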

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Christopher Barker
Another small note:

I'm pretty sure sqlite stores everything as strings. This just plain has 
to be slower than storing the raw binary representation (and may mean 
slight differences in fp values on the round-trip). HDF is designed 
for this sort of thing, sqlite is not.


-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Francesc Altet
Vincent,

On Friday 20 July 2007 15:35, Vincent Nijs wrote:
> Still curious however ... does no one on this list use (and like) sqlite?

First of all, while I'm not a heavy user of relational databases, I've used
them as references for benchmarking purposes.  Based on my own benchmarking
experience, I'd say that, for writing, relational databases take a lot of
safety measures to ensure that all the data written to disk is safe and that
the data relationships don't get broken, and that takes time (a lot of time,
in fact).  I'm not sure whether some of these safety measures can be relaxed,
but even if some relational databases allowed this, my feeling (though I
could be wrong) is that you won't be able to reach cPickle/PyTables speed
(cPickle and PyTables don't observe such safety measures because they are
not designed for these tasks).

In this sense, the best writing speed that I was able to achieve with
Postgres (I don't know whether sqlite supports this) is by pretending that
your data comes from a file stream and using the "cursor.copy_from()" method.
Using this approach I was able to accelerate the injection speed by about 10x
(if I remember well), but even with this, PyTables can be another 10x faster.
You can see an example of its usage in the Postgres backend [1] used for the
benchmarks comparing PyTables and Postgres speeds.
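A rough sketch of that trick (the table name and DSN are hypothetical; the stream-building half runs anywhere, while the psycopg2 calls are left commented out since they need a live server):

```python
from io import StringIO
import numpy as np

# Hypothetical recarray standing in for the data to be injected.
ra = np.rec.fromrecords([(1, 2.5), (2, 3.5)], names="id,val")

# Serialize it as the tab-separated text stream that COPY expects.
buf = StringIO()
for row in ra:
    buf.write("\t".join(str(v) for v in row) + "\n")
buf.seek(0)

# With a live connection, this would bulk-load the whole stream:
#   import psycopg2
#   conn = psycopg2.connect("dbname=test")   # hypothetical DSN
#   cur = conn.cursor()
#   cur.copy_from(buf, "mytable")            # hypothetical table name
#   conn.commit()
```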

Regarding reading speed, my diggings [2] seem to indicate that the bottleneck
here is not related to safety, but to the relational databases' Pythonic APIs
needing to wrap *every* element retrieved from the database in a Python
container (int, float, string...).  By contrast, PyTables creates an empty
recarray as the container to hold all the retrieved data, which is very fast
compared with the former approach.  To quantify this effect as a function of
the size of the retrieved dataset, see figure 14 of [3] (as you can see, the
larger the retrieved dataset, the larger the difference in speed).
Incidentally, and as is said there, I'm hoping that NumPy containers will
eventually be discovered by relational database wrapper makers, so these
wrapping times would be removed completely, but I'm currently not aware of
any package taking this approach.
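The effect can be sketched with sqlite3 (illustrative schema): the DB API first boxes every cell as a Python object, and only then can everything be repacked into one typed container, whereas PyTables fills the recarray directly.

```python
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i / 2.0) for i in range(1000)])

# Step 1: the DB API wraps all 2000 cells in Python ints/floats...
rows = conn.execute("SELECT a, b FROM t").fetchall()
conn.close()

# Step 2: ...and only now can they be packed into one typed container.
dt = np.dtype([("a", np.int64), ("b", np.float64)])
arr = np.array(rows, dtype=dt)

print(arr.shape, arr["b"][4])
```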

[1] http://www.pytables.org/trac/browser/trunk/bench/postgres_backend.py
[2] http://thread.gmane.org/gmane.comp.python.numeric.general/9704
[3] http://www.carabos.com/docs/OPSI-indexes.pdf

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Gael Varoquaux
On Fri, Jul 20, 2007 at 08:35:51AM -0500, Vincent Nijs wrote:
> Sounds very interesting! Would you mind sharing an example (with code if
> possible) of how you organize your experimental data in pytables. I have
> been thinking about how I might organize my data in pytables and would luv
> to hear how an experienced user does that.

I can show you the processing code. The experiment I have close to me is
run by Matlab; the one that is fully controlled by Python is a continent
away.

Actually, I am really lazy, so I am just going to brutally copy the IO
module.

Something that may be interesting is that the data is saved by the
experiment control framework on a computer (called Krubcontrol); this
data can then be retrieved using the "fetch_files" Python command, which
puts it on the server and logs it into a database-like hash table. When
we want to retrieve the data we have a special object, krubdata, which
uses some fancy indexing to retrieve by date or by keyword.

I am sorry I am not providing the code that writes the hdf5 files; it
is an incredibly useless mess, trust me. I wouldn't be able to factor
the output code out of the 5K Matlab lines. Hopefully you'll be able to
get an idea of the structure of the hdf5 files by looking at the code
that does the loading. I haven't worked with this data for a while, so I
can't tell you much more.

Some of the Python code might be useful to others, especially the hashing
and retrieving part. The reason why I didn't use a relational DB is that
I simply don't trust them enough for my precious data.

Gaël
"""
Krub.load

Routines to load the data saved by the experiment and build useful
structures out of it.

Author: Gael Varoquaux <[EMAIL PROTECTED]>
Copyright: Laboratoire Charles Fabry de l'Institut d'Optique
License: BSD-like

"""
# Avoid division problems
from __future__ import division

# To load hdf5
import tables
# Do not display any warnings (FIXME: this is too strict)
tables.warnings.filterwarnings('ignore')

# regular expressions
import re

import os, sys, shutil
import datetime

# Module for object persistence
import shelve

# provide globbing
from glob import glob

from numpy import array

# FIXME: This will pose problem when pytables transit to numpy.
from numarray.strings import CharArray

# FIXME: This is to much hardcoded
data_root = "/home/manip/data"
db_file_name = "/home/manip/analysis/krubDB.db"

def load_h5(file_name):
""" Loads an hdf5 file and returns a dict with the hdf5 data in it.
"""
file = tables.openFile(file_name)
out_dict = {}
for key, value in file.leaves.iteritems():
if isinstance(value, tables.UnImplemented):
continue
try:
value = value.read()
try:
if isinstance(value, CharArray):
value = value.tolist()
except Exception, inst:
print "Couldn't convert %s to a list" % key
print inst
if len(value) == 1:
value = value[0]
out_dict[key[1:]] = value
except Exception, inst:
print "couldn't load %s" % key
print inst
file.close()
return(out_dict)

def load_Krub(file_name):
""" Loads a file created by cameraview and returns a dict with the
data restructured in a more pleasant way. 
"""
data = load_h5(file_name)
# Store the params in a dict
try:
params = {}
for name, value in zip(data['SCparamsnames'],
 data['SCparams']):
params[name] = value
data.update(params)
data['params'] = params
data.pop('SCparams')
data.pop('SCparamsnames')
except  Exception, inst:
print "couldn't convert params to a dict: "
print inst
return data

def load_seq(file_list):
""" Loads a sequence of hdf5 files created by cameraview and returns
a list of dicts with the data.
"""
return [ load_Krub(file_name) for file_name in file_list ]

def build_param_table(file_list):
""" Scans the given list of files and returns a dictionary of 
dictionaries discribing the files, and the experimental parameters.
"""
out_dict = {}
for file_name in file_list:
data = load_Krub(file_name)
if 'params' in data:
params = data['params']
else:
params = {}
params['filename'] = file_name
if 'sequencename' in data: params['sequencename'] = data['sequencename']
if 'fitfunction' in data: params['fitfunction'] = data['fitfunction']
if 'loopposition' in data: params['loopposition'] = data['loopposition']
if 'roi' in data: params['roi'] = data['roi']
# Check that the filename has the timestamp
if re.match(r".*\d\d_\d\d_\d\d", file_name[:-3]):
params['time'] = int( file_name[-11:-9] + 
file_name[-8:-6] +
   

Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Vincent Nijs
Thanks Francesc!

That does work much better:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.0
HDF5 version:  1.6.5
NumPy version: 1.0.4.dev3852
Zlib version:  1.2.3
BZIP2 version: 1.0.2 (30-Dec-2001)
Python version:2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)]
Platform:  darwin-Power Macintosh
Byte-ordering: big
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Test saving recarray using cPickle: 1.620880 sec/pass
Test saving recarray with pytables: 2.074591 sec/pass
Test saving recarray with pytables (with zlib): 14.320498 sec/pass


Test loading recarray using cPickle: 1.023015 sec/pass
Test loading recarray with pytables: 0.882411 sec/pass
Test loading recarray with pytables (with zlib): 3.692698 sec/pass


On 7/20/07 6:17 AM, "Francesc Altet" <[EMAIL PROTECTED]> wrote:

> [...]

-- 
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
2001 Sheridan Road, Evanston, IL 60208-2001
Phone: +1-847-491-4574 Fax: +1-847-491-2498
E-mail: [EMAIL PROTECTED]
Skype: vincentnijs





Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Timothy Hochberg

On 7/20/07, Vincent Nijs <[EMAIL PROTECTED]> wrote:


> Still curious however ... does no one on this list use (and like) sqlite?
> 
> Could anyone suggest any other list where I might find users of python and
> sqlite (and numpy)?

You could try the db-sig. You can get to the archives, and I imagine
subscribe to it, from:

http://www.python.org/community/sigs/current/

I don't know if that'll be helpful for you, but I imagine that they know
something about python + sqlite.




--
.  [EMAIL PROTECTED]


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Vincent Nijs
Gael,

Sounds very interesting! Would you mind sharing an example (with code if
possible) of how you organize your experimental data in pytables. I have
been thinking about how I might organize my data in pytables and would luv
to hear how an experienced user does that.

Given the speed differences it looks like pytables is going to be a better
solution for my needs.

Still curious however ... does no one on this list use (and like) sqlite?

Could anyone suggest any other list where I might find users of python and
sqlite (and numpy)?

Thanks,

Vincent


On 7/20/07 1:16 AM, "Gael Varoquaux" <[EMAIL PROTECTED]> wrote:

> [...]




Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Francesc Altet
On Friday 20 July 2007 04:42, Vincent Nijs wrote:
> I am interested in using sqlite (or pytables) to store data for scientific
> research. I wrote the attached test program to save and load a simulated
> 11x500,000 recarray. Average save and load times are given below (timeit
> with 20 repetitions). The save time for sqlite is not really fair because I
> have to delete the data table each time before I create the new one. It is
> still pretty slow in comparison. Loading the recarray from sqlite is
> significantly slower than pytables or cPickle. I am hoping there may be
> more efficient ways to save and load recarrays from/to sqlite than what I
> am now doing. Note that I infer the variable names and types from the data
> rather than specifying them manually.
>
> I'd luv to hear from people using sqlite, pytables, and cPickle about their
> experiences.
>
> saving recarray with cPickle:   1.448568 sec/pass
> saving recarray with pytable:  3.437228 sec/pass
> saving recarray with sqlite: 193.286204 sec/pass
>
> loading recarray using cPickle:0.471365 sec/pass
> loading recarray with pytable: 0.692838 sec/pass
> loading recarray with sqlite:15.977018 sec/pass

For a fairer comparison, and for large amounts of data, you should inform
PyTables about the expected number of rows (see [1]) that you will end up
feeding into the tables, so that it can choose the best chunksize for I/O
purposes.
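In code, the hint is just one extra keyword argument (a sketch against the PyTables 2.x API of the time; file and table names are made up, and the PyTables calls are commented out so the shape of the data can be shown on its own):

```python
import numpy as np

# A recarray matching the benchmark's 11 columns x 500,000 rows.
N = 500000
ra = np.zeros(N, dtype=[("c%d" % i, np.float64) for i in range(11)])

# With PyTables 2.x, passing expectedrows lets it pick a chunksize
# suited to half a million rows instead of its small default:
#
#   import tables
#   fileh = tables.openFile("data.h5", mode="w")     # hypothetical file
#   fileh.createTable("/", "mytable", ra, expectedrows=len(ra))
#   fileh.close()

print(len(ra), len(ra.dtype.names))
```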

I've redone the benchmarks (the new script is attached) with 
this 'optimization' on and here are my numbers:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.0
HDF5 version:  1.6.5
NumPy version: 1.0.3
Zlib version:  1.2.3
LZO version:   2.01 (Jun 27 2005)
Python version:2.5 (r25:51908, Nov  3 2006, 12:01:01)
[GCC 4.0.2 20050901 (prerelease) (SUSE Linux)]
Platform:  linux2-x86_64
Byte-ordering: little
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Test saving recarray using cPickle: 0.197113 sec/pass
Test saving recarray with pytables: 0.234442 sec/pass
Test saving recarray with pytables (with zlib): 1.973649 sec/pass
Test saving recarray with pytables (with lzo): 0.925558 sec/pass

Test loading recarray using cPickle: 0.151379 sec/pass
Test loading recarray with pytables: 0.165399 sec/pass
Test loading recarray with pytables (with zlib): 0.553251 sec/pass
Test loading recarray with pytables (with lzo): 0.264417 sec/pass

As you can see, the differences between raw cPickle and PyTables are much 
smaller than when not informing about the total number of rows.  In fact, an 
automatic optimization could easily be done in PyTables: when the user passes 
a recarray, its total length would be compared with the default number of 
expected rows (currently 1), and if the former is larger, the length of the 
recarray would be chosen instead.

I have also added the times when using compression, just in case you are 
interested in using it.  Here are the final file sizes:

$ ls -sh data
total 132M
24M data-lzo.h5  43M data-None.h5  43M data.pickle  25M data-zlib.h5

Of course, this is using completely random data; with real data the 
compression levels are expected to be higher than this.

[1] http://www.pytables.org/docs/manual/ch05.html#expectedRowsOptim

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


load_tables_test.py
Description: application/python


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Ivan Vilata i Balaguer
Gael Varoquaux (on 2007-07-20 at 11:24:34 +0200) wrote::

> I knew I really should put these things online; I have just been wanting
> to iron them out a bit, but it has been almost two years since I touched
> them, so ...
> 
> http://scipy.org/Cookbook/hdf5_in_Matlab

Wow, that looks really sweet and simple, useful code.  Great!

::

Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
   Cárabos Coop. V.  V  V   Enjoy Data
  ""




Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Gael Varoquaux
On Fri, Jul 20, 2007 at 01:59:13AM -0700, Andrew Straw wrote:
> I want that Matlab script! 

I knew I really should put these things online; I have just been wanting
to iron them out a bit, but it has been almost two years since I touched
them, so ...

http://scipy.org/Cookbook/hdf5_in_Matlab

Feel free to improve them, and to write similar scripts in Python.

Gaël


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-20 Thread Andrew Straw
Gael Varoquaux wrote:
> On Thu, Jul 19, 2007 at 09:42:42PM -0500, Vincent Nijs wrote:
>>I'd luv to hear from people using sqlite, pytables, and cPickle about
>>their experiences.
> 
> I was about to point you to this discussion:
> http://projects.scipy.org/pipermail/scipy-user/2007-April/011724.html
> 
> but I see that you participated in it.
> 
> I store data from each of my experimental runs with pytables. What I like
> about it is the hierarchical organization of the data, which allows me to
> save a complete description of the experiment, with strings and
> extensible data structures. Another thing I like is that I can load this
> in Matlab (I can provide enhanced scripts for hdf5, if somebody wants
> them), and I think it is possible to read hdf5 in Origin. I don't use
> this software myself, but some colleagues do.

I want that Matlab script! I have colleagues with whom the least common 
denominator is currently .mat files. I'd be much happier if it was hdf5 
files. Can you post it on the scipy wiki cookbook? (Or the pytables wiki?)

Cheers!
Andrew


Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-19 Thread Gael Varoquaux
On Thu, Jul 19, 2007 at 09:42:42PM -0500, Vincent Nijs wrote:
>I'd luv to hear from people using sqlite, pytables, and cPickle about
>their experiences.

I was about to point you to this discussion:
http://projects.scipy.org/pipermail/scipy-user/2007-April/011724.html

but I see that you participated in it.

I store data from each of my experimental runs with pytables. What I like
about it is the hierarchical organization of the data, which allows me to
save a complete description of the experiment, with strings and
extensible data structures. Another thing I like is that I can load this
in Matlab (I can provide enhanced scripts for hdf5, if somebody wants
them), and I think it is possible to read hdf5 in Origin. I don't use
this software myself, but some colleagues do.

So I think the choice between pytables and cPickle boils down to whether
you want to share the data with software other than Python or not.

Gaël


[Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's

2007-07-19 Thread Vincent Nijs
I am interested in using sqlite (or pytables) to store data for scientific
research. I wrote the attached test program to save and load a simulated
11x500,000 recarray. Average save and load times are given below (timeit
with 20 repetitions). The save time for sqlite is not really fair because I
have to delete the data table each time before I create the new one. It is
still pretty slow in comparison. Loading the recarray from sqlite is
significantly slower than pytables or cPickle. I am hoping there may be more
efficient ways to save and load recarrays from/to sqlite than what I am now
doing. Note that I infer the variable names and types from the data rather
than specifying them manually.

I'd luv to hear from people using sqlite, pytables, and cPickle about their
experiences.
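That inference step can be sketched like this (a tiny made-up recarray; each dtype kind is mapped to a SQLite column type):

```python
import sqlite3
import numpy as np

# Tiny stand-in for the real 11x500,000 recarray.
ra = np.rec.fromrecords([(1, 2.5, "a"), (2, 3.5, "b")],
                        names="id,val,tag")

# Infer SQLite column types from the recarray's dtype kinds.
affinity = {"i": "INTEGER", "f": "REAL", "S": "TEXT", "U": "TEXT"}
cols = ", ".join("%s %s" % (name, affinity[ra.dtype[name].kind])
                 for name in ra.dtype.names)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (%s)" % cols)
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ra.tolist())
n, = conn.execute("SELECT count(*) FROM data").fetchone()
conn.close()

print(cols)   # the inferred schema
```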

saving recarray with cPickle:   1.448568 sec/pass
saving recarray with pytable:  3.437228 sec/pass
saving recarray with sqlite: 193.286204 sec/pass

loading recarray using cPickle:0.471365 sec/pass
loading recarray with pytable: 0.692838 sec/pass
loading recarray with sqlite:15.977018 sec/pass

Best,

Vincent


load_tables_test.py
Description: Binary data