Hi Andrea,

Your problem is twofold.

1. Your timing wasn't reporting the time per data set, but rather the
cumulative time since the first write started.  You need to reset the
start time inside the loop to get the time per data set.

2. Your larger problem was that you were writing too many times.  Generally
it is faster to write a few large blocks of data than to perform many
small write operations.  Since you were opening and writing the data set
inside a doubly nested loop, it is not surprising that you were getting
terrible performance.  You were basically maximizing HDF5 overhead ;).
Using slicing, I removed the outermost loop and saw timings like the
following:

H5 file creation time: 7.406

Saving results for table: 0.0105440616608
Saving results for table: 0.0158948898315
Saving results for table: 0.0164661407471
Saving results for table: 0.00654292106628
Saving results for table: 0.00676298141479
Saving results for table: 0.00664114952087
Saving results for table: 0.0066990852356
Saving results for table: 0.00687289237976
Saving results for table: 0.00664210319519
Saving results for table: 0.0157809257507
Saving results for table: 0.0141618251801
Saving results for table: 0.00796294212341
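
To illustrate point 1, here is a minimal sketch of per-data-set timing. It
is not taken from the attached script: save_data_set is a hypothetical
stand-in for the real write, and the label text is only illustrative. The
point is that the start time is reset inside the loop, so each reported
number covers one data set rather than everything written so far.

```python
import time


def save_data_set(index):
    """Hypothetical stand-in for the real per-data-set write."""
    return sum(i * i for i in range(10_000))


def time_each(num_sets):
    """Return one elapsed time (in seconds) per data set."""
    elapsed = []
    for index in range(num_sets):
        # Reset the clock inside the loop so the measurement covers
        # only this iteration, not all iterations so far.
        start = time.perf_counter()
        save_data_set(index)
        elapsed.append(time.perf_counter() - start)
    return elapsed


timings = time_each(5)
for index, seconds in enumerate(timings, start=1):
    print(f"Saving results for data set {index}: {seconds:.6f}")
```

If the start time is taken once before the loop instead, every printed
value includes all earlier iterations, which gives steadily growing numbers.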

Please see the attached version, at around line 82.  Additionally, if you
need to focus on performance, I recommend reading the optimization
chapter of the user's guide
(http://pytables.github.com/usersguide/optimization.html).  PyTables can
be blazingly fast when used correctly.  I would also highly recommend
looking into compression.
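
For readers without the attachment, the "fewer, bigger writes" idea from
point 2 can be sketched roughly as below. This is not the attached script.
It assumes a recent PyTables (open_file / create_carray naming) and NumPy,
and it stores the results in a plain chunked array rather than in a table
column; the array name "results" and the sizes are made up for
illustration. Both functions store the same data, but the second hands
the whole block to HDF5 in one sliced assignment instead of looping in
Python.

```python
import os
import tempfile

import numpy as np
import tables

# Illustrative sizes only: objects x time steps x time series.
NUM_OBJECTS, NUM_STEPS, NUM_SERIES = 50, 600, 7


def write_row_by_row(path, data):
    """Many small writes: one HDF5 write call per object."""
    with tables.open_file(path, mode="w") as h5:
        arr = h5.create_carray(h5.root, "results",
                               atom=tables.Float32Atom(), shape=data.shape)
        for i in range(data.shape[0]):
            arr[i, :, :] = data[i]


def write_in_one_slice(path, data):
    """One big write: a single sliced assignment replaces the loop."""
    with tables.open_file(path, mode="w") as h5:
        arr = h5.create_carray(h5.root, "results",
                               atom=tables.Float32Atom(), shape=data.shape)
        arr[:, :, :] = data


def read_back(path):
    """Read the whole stored array back into memory."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.results[:]


data = np.random.rand(NUM_OBJECTS, NUM_STEPS, NUM_SERIES).astype(np.float32)
tmpdir = tempfile.mkdtemp()
loop_path = os.path.join(tmpdir, "loop.h5")
slice_path = os.path.join(tmpdir, "slice.h5")

write_row_by_row(loop_path, data)
write_in_one_slice(slice_path, data)
```

Wrapping each call in the per-iteration timing pattern is an easy way to
compare the two on your own machine; the difference depends on the data
layout and chunking, so treat any numbers as specific to your setup.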

I hope this helps!
Be Well
Anthony

On Tue, Oct 30, 2012 at 4:55 PM, Andrea Gavana <andrea.gav...@gmail.com> wrote:

> Hi All,
>
>     I am pretty new to pytables and I am facing a problem of actually
> storing and retrieving data to/from a large dataset. My situation is
> the following:
>
> 1. I am running stochastic simulations of a number of objects
> (typically between 100 and 1,000 simulations);
> 2. For every simulation, I have around 1,200 "objects", and for each
> of them I have 7 timeseries of 600 time-steps each.
>
> I thought of using pytables to try and get some sense out of my
> simulations, but I am failing to implement something intelligent (or
> fast, which is important as well...).
>
> The attached script (modified from the pytables tutorial) does the
> following:
>
> 1. Create a table containing these "objects";
> 2. Add 1,200 rows, one per "object": for each "object", I assign a 3D
> array defined as:
>
> results = Float32Col(shape=(NUM_SIM, len(ALL_DATES), 7))
>
> Where NUM_SIM is the number of simulations and ALL_DATES are the timesteps.
>
> 3. For every simulation, I update the "object" results (using random
> numbers in the script).
>
> The timings on my computer are as follows (in seconds):
>
> H5 file creation time: 22.510
>
> Saving results for simulation 1   : 3.33599996567
> Saving results for simulation 2   : 6.2429997921
> Saving results for simulation 3   : 9.15199995041
> Saving results for simulation 4   : 12.0759999752
> Saving results for simulation 5   : 15.2199997902
> Saving results for simulation 6   : 17.9159998894
> Saving results for simulation 7   : 21.0659999847
> Saving results for simulation 8   : 23.6459999084
> Saving results for simulation 9   : 26.5359997749
> Saving results for simulation 10  : 29.5579998493
>
> As you can see, at every simulation the processing time increases by 3
> seconds, so by the time I get to 100 or 1,000 I will have more than
> enough time for 15 coffees in the morning :-D
> Also, the file creation time is somewhat on the slow side...
>
> I am sure I am missing a lot of things here, so I would appreciate any
> suggestion to implement my code in a better/more intelligent way (and
> also suggestions on other approaches in order to do what I am trying
> to do).
>
> Thank you in advance for your suggestions.
>
> Andrea.
>
> "Imagination Is The Only Weapon In The War Against Reality."
> http://www.infinity77.net
>
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_sfd2d_oct
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>

Attachment: pytables_test.py
Description: Binary data
