Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Vincent Nijs (on 2007-07-22 at 10:21:18 -0500) said:

> [...]
> I would assume the NULLs could be treated as missing values (?) Don't know
> about the different types in one column however.

Maybe a masked array would do the trick, with NULL values masked out.

Ivan Vilata i Balaguer -- Cárabos Coop. V. -- http://www.carabos.com/ -- Enjoy Data

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
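Ivan's masked-array suggestion can be sketched with the stdlib sqlite3 module and numpy; the table and column names here are made up for illustration:

```python
import sqlite3
import numpy as np

# Hypothetical table with a NULL in one column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (None,), (3.0,)])

# NULLs come back as None; replace them with NaN, then mask those entries
rows = [r[0] for r in conn.execute("SELECT x FROM t")]
arr = np.ma.masked_invalid([np.nan if v is None else v for v in rows])

# arr now has 2 unmasked values; reductions like sum() skip the masked NULL
```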
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
FYI: I asked a question about the load and save speed of recarrays using pickle vs. pysqlite on the pysqlite list and got the response linked below. It doesn't look like sqlite can do much better than what I found.

http://lists.initd.org/pipermail/pysqlite/2007-July/001085.html

I also passed on Francesc's idea to use numpy containers in relational database wrappers such as pysqlite. This is apparently not possible, since in a relational database "you don't know the type of the values in advance. Some values might be NULL" and "you might even have different types for the same column".

http://lists.initd.org/pipermail/pysqlite/2007-July/001087.html

I would assume the NULLs could be treated as missing values (?) Don't know about the different types in one column however.

Vincent

On 7/20/07 10:53 AM, "Francesc Altet" <[EMAIL PROTECTED]> wrote:

> Vincent,
>
> On Friday 20 July 2007 15:35, Vincent Nijs wrote:
>> Still curious however ... does no one on this list use (and like) sqlite?
>
> [...]
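For what it's worth, the fastest plain-sqlite write path I'm aware of is a single executemany() call in one transaction, rather than row-by-row execute() calls; a minimal sketch (in-memory database, made-up column names — not Vincent's attached script):

```python
import sqlite3
import numpy as np

# A small stand-in for the benchmark recarray
rec = np.zeros(1000, dtype=[('a', np.float64), ('b', np.float64)])
rec['a'] = np.arange(1000)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a REAL, b REAL)")
# One executemany() inside a single transaction instead of 1000 execute() calls;
# tolist() converts the recarray rows to plain Python tuples sqlite can bind
conn.executemany("INSERT INTO data VALUES (?, ?)", rec.tolist())
conn.commit()
```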
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
On Friday 20 July 2007 20:16, Christopher Barker wrote:
> Another small note:
>
> I'm pretty sure sqlite stores everything as strings. This just plain has
> to be slower than storing the raw binary representation (and may mean
> slight differences in fp values on the round-trip). HDF is designed
> for this sort of thing, sqlite is not.

Yeah, that was the case with sqlite 2. However, starting with sqlite 3, the developers added the ability to store integer and real numbers in a more compact, native format [1]. Sqlite 3 is the version included in Python 2.5 (the Python version that Vincent was benchmarking), so this shouldn't make a big difference compared with other relational databases.

[1] http://www.sqlite.org/datatype3.html

Cheers,

Francesc Altet -- Cárabos Coop. V. -- http://www.carabos.com/ -- Enjoy Data
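The compact storage in sqlite 3 is easy to verify from Python: values inserted into INTEGER and REAL columns are stored natively, not as strings (the table and column names below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER, r REAL, s TEXT)")
conn.execute("INSERT INTO t VALUES (1, 2.5, 'abc')")

# typeof() reports the storage class sqlite actually used for each value
types = conn.execute("SELECT typeof(i), typeof(r), typeof(s) FROM t").fetchone()
# types == ('integer', 'real', 'text')
```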
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Another small note: I'm pretty sure sqlite stores everything as strings. This just plain has to be slower than storing the raw binary representation (and may mean slight differences in fp values on the round-trip). HDF is designed for this sort of thing; sqlite is not.

-Chris

Christopher Barker, Ph.D.
Oceanographer, Emergency Response Division, NOAA/NOS/OR&R
7600 Sand Point Way NE, Seattle, WA 98115
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Vincent,

On Friday 20 July 2007 15:35, Vincent Nijs wrote:
> Still curious however ... does no one on this list use (and like) sqlite?

First of all, while I'm not a heavy user of relational databases, I've used them as references for benchmarking purposes. Based on my own benchmarking experience, I'd say that, for writing, relational databases take a lot of safety measures to ensure that all the data written to disk is safe and that the data relationships don't get broken, and that takes time (a lot of time, in fact). I'm not sure whether some of these safety measures can be relaxed, but even if some relational databases allow this, my feeling (beware, I can be wrong) is that you won't be able to reach cPickle/PyTables speed (cPickle/PyTables don't observe such safety measures because they are not designed for these tasks).

In this sense, the best writing speed that I was able to achieve with Postgres (I don't know whether sqlite supports this) is by pretending that your data comes from a file stream and using the "cursor.copy_from()" method. Using this approach I was able to accelerate the injection speed by 10x (if I remember well), but even with this, PyTables can be another 10x faster. You can see an example of its usage in the Postgres backend [1] used for the benchmarks comparing PyTables and Postgres speeds.

Regarding reading speed, my digging [2] seems to indicate that the bottleneck here is not related to safety, but to the need for the relational databases' Pythonic APIs to wrap *every* element retrieved from the database in a Python container (int, float, string...). By contrast, PyTables creates an empty recarray as the container to hold all the retrieved data, and that's very fast compared with the former approach. To quantify this effect as a function of the size of the retrieved dataset, see figure 14 of [3] (as you can see, the larger the dataset retrieved, the larger the difference in terms of speed). Incidentally, and as is said there, I'm hoping that NumPy containers will eventually be discovered by relational database wrapper makers, so these wrapping times would be removed completely, but I'm currently not aware of any package taking this approach.

[1] http://www.pytables.org/trac/browser/trunk/bench/postgres_backend.py
[2] http://thread.gmane.org/gmane.comp.python.numeric.general/9704
[3] http://www.carabos.com/docs/OPSI-indexes.pdf

Cheers,

Francesc Altet -- Cárabos Coop. V. -- http://www.carabos.com/ -- Enjoy Data
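Along the lines of Francesc's point about containers, a poor man's version from plain Python is to pack the whole result of cursor.fetchall() into one structured array, instead of keeping per-row Python objects around; the table, column names, and dtypes below are made up for illustration, not part of any DB-API:

```python
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(5)])

# fetchall() gives a list of tuples, which numpy packs into one
# recarray-style structured array in a single call
dt = np.dtype([('a', np.int64), ('b', np.float64)])
rec = np.array(conn.execute("SELECT a, b FROM t").fetchall(), dtype=dt)
```

Each element is still wrapped in a Python object on the way out of the cursor, so this doesn't remove the cost Francesc describes; it only avoids keeping the wrapped rows alive afterwards.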
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
On Fri, Jul 20, 2007 at 08:35:51AM -0500, Vincent Nijs wrote:
> Sounds very interesting! Would you mind sharing an example (with code if
> possible) of how you organize your experimental data in pytables. I have
> been thinking about how I might organize my data in pytables and would luv
> to hear how an experienced user does that.

I can show you the processing code. The experiment I have close to me is run by Matlab; the one that is fully controlled by Python is a continent away. Actually, I am really lazy, so I am just going to copy the IO module wholesale.

Something that may be interesting: the data is saved by the experiment-control framework on a computer (called Krubcontrol); this data can then be retrieved using the "fetch_files" Python command, which puts it on the server and logs it into a database-like hash table. When we want to retrieve the data we have a special object, krubdata, which uses some fancy indexing to retrieve by date or by specifying keywords.

I am sorry I am not providing the code that writes the hdf5 files; it is an incredibly useless mess, trust me. I wouldn't be able to factor the output code out of the 5K Matlab lines. Hopefully you'll be able to get an idea of the structure of the hdf5 files by looking at the code that does the loading. I haven't worked with this data for a while, so I can't tell you much more off-hand.

Some of the Python code might be useful to others, especially the hashing and retrieving part. The reason why I didn't use a relational DB is that I simply don't trust them enough for my precious data.

Gaël

"""
Krub.load

Routines to load the data saved by the experiment and build useful
structures out of it.
Author: Gael Varoquaux <[EMAIL PROTECTED]>
Copyright: Laboratoire Charles Fabry de l'Institut d'Optique
License: BSD-like
"""

# Avoid division problems
from __future__ import division

# To load hdf5
import tables
# Do not display any warnings (FIXME: this is too strict)
tables.warnings.filterwarnings('ignore')

# regular expressions
import re
import os, sys, shutil
import datetime
# Module for object persistence
import shelve
# provide globbing
from glob import glob

from numpy import array
# FIXME: This will pose problems when pytables transitions to numpy.
from numarray.strings import CharArray

# FIXME: This is too much hardcoded
data_root = "/home/manip/data"
db_file_name = "/home/manip/analysis/krubDB.db"


def load_h5(file_name):
    """ Loads an hdf5 file and returns a dict with the hdf5 data in it.
    """
    file = tables.openFile(file_name)
    out_dict = {}
    for key, value in file.leaves.iteritems():
        if isinstance(value, tables.UnImplemented):
            continue
        try:
            value = value.read()
            try:
                if isinstance(value, CharArray):
                    value = value.tolist()
            except Exception, inst:
                print "Couldn't convert %s to a list" % key
                print inst
            if len(value) == 1:
                value = value[0]
            out_dict[key[1:]] = value
        except Exception, inst:
            print "couldn't load %s" % key
            print inst
    file.close()
    return out_dict


def load_Krub(file_name):
    """ Loads a file created by cameraview and returns a dict with the
        data restructured in a more pleasant way.
    """
    data = load_h5(file_name)
    # Store the params in a dict
    try:
        params = {}
        for name, value in zip(data['SCparamsnames'], data['SCparams']):
            params[name] = value
        data.update(params)
        data['params'] = params
        data.pop('SCparams')
        data.pop('SCparamsnames')
    except Exception, inst:
        print "couldn't convert params to a dict: "
        print inst
    return data


def load_seq(file_list):
    """ Loads a sequence of hdf5 files created by cameraview and returns
        a list of dicts with the data.
    """
    return [load_Krub(file_name) for file_name in file_list]


def build_param_table(file_list):
    """ Scans the given list of files and returns a dictionary of
        dictionaries describing the files, and the experimental
        parameters.
    """
    out_dict = {}
    for file_name in file_list:
        data = load_Krub(file_name)
        if 'params' in data:
            params = data['params']
        else:
            params = {}
        params['filename'] = file_name
        if 'sequencename' in data:
            params['sequencename'] = data['sequencename']
        if 'fitfunction' in data:
            params['fitfunction'] = data['fitfunction']
        if 'loopposition' in data:
            params['loopposition'] = data['loopposition']
        if 'roi' in data:
            params['roi'] = data['roi']
        # Check that the filename has the timestamp
        if re.match(r".*\d\d_\d\d_\d\d", file_name[:-3]):
            params['time'] = int(file_name[-11:-9] + file_name[-8:-6] +
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Thanks Francesc! That does work much better:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.0
HDF5 version:      1.6.5
NumPy version:     1.0.4.dev3852
Zlib version:      1.2.3
BZIP2 version:     1.0.2 (30-Dec-2001)
Python version:    2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
                   [GCC 4.0.1 (Apple Computer, Inc. build 5367)]
Platform:          darwin-Power Macintosh
Byte-ordering:     big
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test saving recarray using cPickle:              1.620880 sec/pass
Test saving recarray with pytables:              2.074591 sec/pass
Test saving recarray with pytables (with zlib): 14.320498 sec/pass

Test loading recarray using cPickle:             1.023015 sec/pass
Test loading recarray with pytables:             0.882411 sec/pass
Test loading recarray with pytables (with zlib): 3.692698 sec/pass

On 7/20/07 6:17 AM, "Francesc Altet" <[EMAIL PROTECTED]> wrote:

> On Friday 20 July 2007 04:42, Vincent Nijs wrote:
>> I am interested in using sqlite (or pytables) to store data for
>> scientific research. [...]
>
> For a fairer comparison, and for large amounts of data, you should inform
> PyTables of the expected number of rows (see [1]) that you will end up
> feeding into the tables, so that it can choose the best chunksize for I/O
> purposes.
>
> [...]
>
> [1] http://www.pytables.org/docs/manual/ch05.html#expectedRowsOptim

--
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
On 7/20/07, Vincent Nijs <[EMAIL PROTECTED]> wrote:
> Still curious however ... does no one on this list use (and like) sqlite?
> Could anyone suggest any other list where I might find users of python and
> sqlite (and numpy)?

You could try the db-sig. You can get to the archives, and I imagine subscribe to it, from:

http://www.python.org/community/sigs/current/

I don't know if that'll be helpful for you, but I imagine they know something about python + sqlite.
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Gael,

Sounds very interesting! Would you mind sharing an example (with code if possible) of how you organize your experimental data in pytables? I have been thinking about how I might organize my data in pytables and would luv to hear how an experienced user does that.

Given the speed differences it looks like pytables is going to be a better solution for my needs. Still curious however ... does no one on this list use (and like) sqlite? Could anyone suggest any other list where I might find users of python and sqlite (and numpy)?

Thanks,

Vincent

On 7/20/07 1:16 AM, "Gael Varoquaux" <[EMAIL PROTECTED]> wrote:

> I store data from each of my experimental runs with pytables. What I like
> about it is the hierarchical organization of the data, which allows me to
> save a complete description of the experiment, with strings and
> extensible data structures. [...]
>
> So I think the choice between pytables and cPickle boils down to whether
> you want to share the data with software other than Python or not.
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
On Friday 20 July 2007 04:42, Vincent Nijs wrote:
> I am interested in using sqlite (or pytables) to store data for scientific
> research. I wrote the attached test program to save and load a simulated
> 11x500,000 recarray. Average save and load times are given below (timeit
> with 20 repetitions). The save time for sqlite is not really fair because I
> have to delete the data table each time before I create the new one. It is
> still pretty slow in comparison. Loading the recarray from sqlite is
> significantly slower than with pytables or cPickle. I am hoping there may
> be more efficient ways to save and load recarray's from/to sqlite than what
> I am now doing. Note that I infer the variable names and types from the
> data rather than specifying them manually.
>
> I'd luv to hear from people using sqlite, pytables, and cPickle about their
> experiences.
>
> saving recarray with cPickle:   1.448568 sec/pass
> saving recarray with pytables:  3.437228 sec/pass
> saving recarray with sqlite:  193.286204 sec/pass
>
> loading recarray using cPickle: 0.471365 sec/pass
> loading recarray with pytables: 0.692838 sec/pass
> loading recarray with sqlite:  15.977018 sec/pass

For a fairer comparison, and for large amounts of data, you should inform PyTables of the expected number of rows (see [1]) that you will end up feeding into the tables, so that it can choose the best chunksize for I/O purposes.

I've redone the benchmarks (the new script is attached) with this 'optimization' on, and here are my numbers:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.0
HDF5 version:      1.6.5
NumPy version:     1.0.3
Zlib version:      1.2.3
LZO version:       2.01 (Jun 27 2005)
Python version:    2.5 (r25:51908, Nov 3 2006, 12:01:01)
                   [GCC 4.0.2 20050901 (prerelease) (SUSE Linux)]
Platform:          linux2-x86_64
Byte-ordering:     little
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Test saving recarray using cPickle:              0.197113 sec/pass
Test saving recarray with pytables:              0.234442 sec/pass
Test saving recarray with pytables (with zlib):  1.973649 sec/pass
Test saving recarray with pytables (with lzo):   0.925558 sec/pass

Test loading recarray using cPickle:             0.151379 sec/pass
Test loading recarray with pytables:             0.165399 sec/pass
Test loading recarray with pytables (with zlib): 0.553251 sec/pass
Test loading recarray with pytables (with lzo):  0.264417 sec/pass

As you can see, the differences between raw cPickle and PyTables are much smaller than when not informing it about the total number of rows. In fact, an automatic optimization could easily be done in PyTables: when the user passes a recarray, its total length would be compared with the default number of expected rows (currently 1), and if the former is larger, the length of the recarray would be chosen instead.

I have also added the times when using compression, in case you are interested in it. Here are the final file sizes:

$ ls -sh data
total 132M
24M data-lzo.h5  43M data-None.h5  43M data.pickle  25M data-zlib.h5

Of course, this is using completely random data; with real data the compression levels are expected to be higher than this.

[1] http://www.pytables.org/docs/manual/ch05.html#expectedRowsOptim

Cheers,

Francesc Altet -- Cárabos Coop. V. -- http://www.carabos.com/ -- Enjoy Data

load_tables_test.py (attached)
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Gael Varoquaux (on 2007-07-20 at 11:24:34 +0200) said:

> I knew I really should put these things online; I have just been wanting
> to iron them out a bit, but it has been almost two years since I touched
> them, so ...
>
> http://scipy.org/Cookbook/hdf5_in_Matlab

Wow, that looks like really sweet, simple, and useful code. Great!

Ivan Vilata i Balaguer -- Cárabos Coop. V. -- http://www.carabos.com/ -- Enjoy Data
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
On Fri, Jul 20, 2007 at 01:59:13AM -0700, Andrew Straw wrote:
> I want that Matlab script!

I knew I really should put these things online; I have just been wanting to iron them out a bit, but it has been almost two years since I touched them, so ...

http://scipy.org/Cookbook/hdf5_in_Matlab

Feel free to improve them, and to write similar scripts in Python.

Gaël
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
Gael Varoquaux wrote:
> [...]
> Another thing I like is that I can load this
> in Matlab (I can provide enhanced scripts for hdf5, if somebody wants
> them), and I think it is possible to read hdf5 in Origin. I don't use
> this software, but some colleagues do.

I want that Matlab script! I have colleagues with whom the least common denominator is currently .mat files. I'd be much happier if it were hdf5 files. Can you post it on the scipy wiki cookbook? (Or the pytables wiki?)

Cheers!
Andrew
Re: [Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
On Thu, Jul 19, 2007 at 09:42:42PM -0500, Vincent Nijs wrote:
> I'd luv to hear from people using sqlite, pytables, and cPickle about
> their experiences.

I was about to point you to this discussion:

http://projects.scipy.org/pipermail/scipy-user/2007-April/011724.html

but I see that you participated in it.

I store data from each of my experimental runs with pytables. What I like about it is the hierarchical organization of the data, which allows me to save a complete description of the experiment, with strings and extensible data structures. Another thing I like is that I can load it in Matlab (I can provide enhanced scripts for hdf5, if somebody wants them), and I think it is possible to read hdf5 in Origin. I don't use this software, but some colleagues do.

So I think the choice between pytables and cPickle boils down to whether you want to share the data with software other than Python or not.

Gaël
[Numpy-discussion] Pickle, pytables, and sqlite - loading and saving recarray's
I am interested in using sqlite (or pytables) to store data for scientific research. I wrote the attached test program to save and load a simulated 11x500,000 recarray. Average save and load times are given below (timeit with 20 repetitions). The save time for sqlite is not really fair because I have to delete the data table each time before I create the new one. It is still pretty slow in comparison. Loading the recarray from sqlite is significantly slower than with pytables or cPickle. I am hoping there may be more efficient ways to save and load recarray's from/to sqlite than what I am now doing. Note that I infer the variable names and types from the data rather than specifying them manually.

I'd luv to hear from people using sqlite, pytables, and cPickle about their experiences.

saving recarray with cPickle:   1.448568 sec/pass
saving recarray with pytables:  3.437228 sec/pass
saving recarray with sqlite:  193.286204 sec/pass

loading recarray using cPickle: 0.471365 sec/pass
loading recarray with pytables: 0.692838 sec/pass
loading recarray with sqlite:  15.977018 sec/pass

Best,
Vincent

load_tables_test.py (attached)
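Inferring the column declarations from a recarray's dtype, as Vincent describes, can look something like this; the kind-to-type mapping and helper name are my own guess at a reasonable scheme, not the attached script:

```python
import sqlite3
import numpy as np

# Hypothetical mapping from numpy dtype kind codes to sqlite column types
KINDS = {'i': 'INTEGER', 'u': 'INTEGER', 'f': 'REAL', 'S': 'TEXT', 'U': 'TEXT'}

def sqlite_schema(rec):
    # Build "name TYPE" pairs from the structured dtype, in field order
    return ", ".join("%s %s" % (name, KINDS.get(rec.dtype[name].kind, 'BLOB'))
                     for name in rec.dtype.names)

rec = np.zeros(3, dtype=[('x', np.float64), ('n', np.int32)])
ddl = "CREATE TABLE data (%s)" % sqlite_schema(rec)
# ddl == 'CREATE TABLE data (x REAL, n INTEGER)'

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
# tolist() yields plain Python tuples, which sqlite can bind directly
conn.executemany("INSERT INTO data VALUES (?, ?)", rec.tolist())
```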