Salut Alain,

[I'm CC'ing to the pytables-users list, just in case this is useful for 
others]

----------  Forwarded Message  ----------
> Subject: RE: Difficulty to change a value in existing Earray
> Date: Monday 12 February 2007 12:42
> From: "Alain Fagot" <[EMAIL PROTECTED]>
> To: "'Francesc Altet'" <[EMAIL PROTECTED]>
> Hi Francesc,
>
> I investigated this problem of performance.
> In fact was due to the fact that I had to convert the content of the earray
> to a list for internal purpose.
>
> In case of a shape (1,0), I converted
>       members=list(hdfarray[0])
> hdfarray[0] being numpy.ndarray
>
> In case of a shape (0,), I converted
>       members=list(hdfarray)
> hdfarray being tables.Earray.Earray
>
> For an earray with effective side of 1000000 values:
>  - creation size of earray is similar: 1 second
>  - conversion to list takes 1 second for shape (1,0) and 100 seconds for
> shape (0,)
>
> I attached the two little benchmark programs

You have two problems here that prevent you from getting good
performance:

1.- The list(hdfarray) operation, when hdfarray is an EArray, issues an
EArray.__getitem__() call for *each* element of the EArray (i.e. one
million calls), which makes this operation very slow.

You can speed this up by using list(hdfarray[:]) instead. Here, the [:]
indexing operation retrieves *all* the rows in hdfarray in a single call,
so it is much more efficient. With this, your benchmark takes:

Ellapse Time to retrieve list 7.24 sec

instead of the more than 300 sec that your first approach took.
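
To illustrate the difference in call counts without needing PyTables, here
is a minimal sketch in Python 3. CountingArray is a hypothetical stand-in,
not a PyTables class: it only assumes an object that supports len() and
__getitem__(), and it counts how often it is asked for data.

```python
class CountingArray:
    """Hypothetical stand-in for a disk-backed array; counts read calls."""

    def __init__(self, data):
        self._data = list(data)
        self.calls = 0

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        self.calls += 1          # each call models one trip to the storage layer
        return self._data[key]   # IndexError past the end stops the iteration


arr = CountingArray(range(1000))

per_element = list(arr)          # iterates via __getitem__(0), __getitem__(1), ...
calls_per_element = arr.calls

arr.calls = 0
bulk = list(arr[:])              # one slice read, then an in-memory conversion
calls_bulk = arr.calls

print("per-element calls:", calls_per_element, "bulk calls:", calls_bulk)
```

Both variants produce the same list; only the number of read calls differs,
which is where the time goes when every call has to reach into the file.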

2.- You can always achieve better performance if you tell PyTables the
expected size of your datasets. So, for example, if you create the EArray
with:

hdfarray = fileh.createEArray(fileh.root, 'array_float', a, "Floats",
                              expectedrows=MAX_DIM_ARRAY)
[note the expectedrows parameter]

then the time drops to 0.67 sec, which is about 10x better than the figure
that we get without informing PyTables about the size.
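
For readers on a current PyTables release, the same idea can be sketched
with the newer method names (open_file and create_earray rather than the
openFile and createEArray used in this message). The write_and_read helper
and its arguments are hypothetical names chosen for illustration, and no
timings are claimed for it; it only shows where the expectedrows hint goes
and that the data is read back with a single [:] call.

```python
import os
import tempfile


def write_and_read(n_rows, expectedrows=None):
    """Create a 1-D EArray of n_rows float64 ones and read it back in one call."""
    # Imported here so the helper can be defined even where PyTables is absent.
    import numpy as np
    import tables

    # Pass expectedrows only when a hint is given; otherwise PyTables
    # falls back to its own default.
    extra = {} if expectedrows is None else {"expectedrows": expectedrows}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test_earray.h5")
        with tables.open_file(path, mode="w") as h5:
            earr = h5.create_earray(h5.root, "array_float",
                                    atom=tables.Float64Atom(), shape=(0,),
                                    title="Floats", **extra)
            earr.append(np.ones(n_rows))
            chunkshape = earr.chunkshape   # chunk layout chosen at creation time
            members = earr[:].tolist()     # one bulk read, then convert in memory
    return chunkshape, members
```

Calling write_and_read(1000000, expectedrows=1000000) and comparing the
returned chunkshape with the one from write_and_read(1000000) is one way to
see what the hint changes on a given installation.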

So, to summarize, the basic recipes for achieving high performance when
reading datasets are:

- Reduce the number of calls that retrieve data as much as you can
- Always inform PyTables objects about the expected size they can reach

HTH,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

# --- Attached benchmark: EArray with shape=(0,) ---
import tables
from numarray import *
import numpy
from time import *

MAX_DIM_ARRAY = 1000000
start_time = clock()
print "EARRAY shape=(0,)"

fileh = tables.openFile("test_earray.h5", mode = "w")
a = tables.Float64Atom(shape=(0,), flavor='numpy')
hdfarray = fileh.createEArray(fileh.root, 'array_float', a, "Floats")

vals = list(numpy.ones(MAX_DIM_ARRAY))
hdfarray.append(array(vals,type=Float64, shape=(len(vals))))
print "Ellapse Time to create earray",clock() - start_time 
start_time = clock()

#copy content to list
print "object converted to list is:", hdfarray.__class__
members=list(hdfarray)
print "Ellapse Time to retrieve list",clock() - start_time 

# Close the file
fileh.close()

# --- Attached benchmark: EArray with shape=(1,0) ---
import tables
from numarray import *
import numpy
from time import *

MAX_DIM_ARRAY = 1000000
start_time = clock()
print "EARRAY shape=(1,0)"

fileh = tables.openFile("test_earray.h5", mode = "w")
a = tables.Float64Atom(shape=(1,0), flavor='numpy')
hdfarray = fileh.createEArray(fileh.root, 'array_float', a, "Floats")

vals = list(numpy.ones(MAX_DIM_ARRAY))
hdfarray.append(array(vals,type=Float64, shape=(1,len(vals))))
print "Ellapse Time to create earray",clock() - start_time 
start_time = clock()

#copy content to list
print "object converted to list is:", hdfarray[0].__class__
members=list(hdfarray[0])
print "Ellapse Time to retrieve list",clock() - start_time 
# Close the file
fileh.close()