On Wednesday 01 July 2009 15:04:08, Francesc Alted wrote:
> However, you can still speed-up out-of-core computations by using the
> recently introduced tables.Expr class (PyTables 2.2b1, see [2]), which uses
> a combination of the Numexpr [3] and PyTables advanced computing
> capabilities:
>
> f = tb.openFile(filename+".h5", "r+")
> data = f.root.data
> expr = tb.Expr("where(data<imin, imin, data)")
> expr.setOutput(data)
> expr.eval()
> expr = tb.Expr("where(data>imax, imax, data)")
> expr.setOutput(data)
> expr.eval()
> f.close()
>
> and the timings for this approach are:
>
> Using tables.Expr
> Time creating data file: 2.393
> Time processing data file: 18.25
>
> which is around 75% faster than a pure memmap/PyTables approach.
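As a quick plain-NumPy aside (a small in-memory sketch of my own, not part of the out-of-core benchmark above), the two where() passes quoted above amount to clipping the array between imin and imax:

```python
import numpy as np

# Two-pass clipping, mirroring the two tables.Expr evaluations above.
data = np.array([-5.0, -1.0, 0.0, 3.0], dtype='f4')
imin, imax = -2, 2

data = np.where(data < imin, imin, data)  # first pass: clip from below
data = np.where(data > imax, imax, data)  # second pass: clip from above
# data is now [-2., -1., 0., 2.], the same as np.clip would give
```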
Oops, I suddenly realized that the above can be further accelerated by
combining both expressions into a single nested one. Something like:
f = tb.openFile(filename+".h5", "r+")
data = f.root.data
# Complex expression that spans several lines follows
expr = tb.Expr("""
    where(data < imin, imin,
          where(data > imax, imax, data))
""")
expr.setOutput(data)
expr.eval()
f.close()
With this change, the computation time is now:
Using tables.Expr
Time creating data file: 2.18
Time processing data file: 10.992
which represents another 65% improvement over the version using two
expressions (and makes it 3x faster than the numpy.memmap version).
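For anyone without PyTables at hand, the combined clipping done by the nested expression can be sanity-checked in plain NumPy (again an in-memory sketch of mine, not the out-of-core benchmark):

```python
import numpy as np

# Single-pass nested where(): clip below imin and above imax at once.
data = np.array([-5.0, -1.0, 0.0, 3.0], dtype='f4')
imin, imax = -2, 2

clipped = np.where(data < imin, imin,
                   np.where(data > imax, imax, data))
# clipped is [-2., -1., 0., 2.], identical to the two-pass result
```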
> Further, if your data is compressible, you can probably achieve additional
> speed-ups by using a fast compressor (like LZO, which is supported by
> PyTables right out-of-the-box).
As I was curious, I've tried activating the LZO compressor. Here are the
results:
Using tables.Expr
Time creating data file: 3.123
Time processing data file: 12.533
Hmm, contrary to my expectations, this hasn't accelerated the computation.
My guess is that, the data being very simple and synthetic, the compression
ratio is very high (around 200x), which forces the compressor/decompressor to
do a lot of work here. However, with real-life data the speed could
effectively improve. OTOH, using a faster compressor could be very
advantageous here too :)
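The huge compression ratio is easy to reproduce with the stdlib zlib module (a rough illustration of my own, not the PyTables benchmark): a synthetic row holding a single repeated value compresses extremely well, so a chunked file format ends up running the (de)compressor over every chunk it touches:

```python
import zlib
import numpy as np

# One constant-valued row, like the synthetic rows in the benchmark script.
row = np.full(981 * 981, 7.0, dtype='f4')
raw = row.tobytes()

# Fast compression level, analogous to complevel=1 in the PyTables filters.
packed = zlib.compress(raw, 1)
ratio = len(raw) / float(len(packed))  # very high for such repetitive data
```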
Cheers,
--
Francesc Alted
import sys
from time import time
import numpy as np
import tables as tb
shape=(512, 981, 981)
filename = "/scratch2/faltet/data-vol"
# Choose the filter that you prefer
#filters = None # No compression
#filters = tb.Filters(complib="gzip", complevel=1, shuffle=True)
#filters = tb.Filters(complib="lzo", complevel=1, shuffle=True)
filters = tb.Filters(complib="lzo", complevel=1, shuffle=False)
def create_data(kind):
    if kind == "np":
        data = np.memmap(filename+".bin", mode='w+', dtype='f4', shape=shape)
        for nrow in range(len(data)):
            data[nrow] = nrow
    else:
        f = tb.openFile(filename+".h5", "w")
        data = f.createCArray(f.root, 'data', tb.Float32Atom(), shape=shape,
                              filters=filters)
        for nrow in range(len(data)):
            data[nrow] = nrow
        f.close()
def process_data(kind):
    imin, imax = -2, 2
    if kind == "np":
        data = np.memmap(filename+".bin", mode='r+', dtype='f4', shape=shape)
        for sl in data:
            sl[sl<imin] = imin
            sl[sl>imax] = imax
        #print "data (numpy)-->", data
    elif kind == "tb":
        f = tb.openFile(filename+".h5", "r+")
        data = f.root.data
        for nrow, sl in enumerate(data):
            sl[sl<imin] = imin
            sl[sl>imax] = imax
            data[nrow] = sl
        #print "data (tables) -->", data[:]
        f.close()
    else:
        f = tb.openFile(filename+".h5", "r+")
        data = f.root.data
        # Complex expression that spans several lines follows
        expr = tb.Expr("""
            where(data < imin, imin,
                  where(data > imax, imax, data))
        """)
        expr.setOutput(data)
        expr.eval()
        #print "data (tables.Expr)-->", data[:]
        f.close()
if __name__ == '__main__':
    if len(sys.argv) > 1:
        kind = sys.argv[1]
    else:
        kind = "np"
    if kind == "np":
        print "Using numpy.memmap"
    elif kind == "tb":
        print "Using tables"
    else:
        print "Using tables.Expr"
    t0 = time()
    create_data(kind)
    print "Time creating data file:", round(time()-t0, 3)
    t0 = time()
    process_data(kind)
    print "Time processing data file:", round(time()-t0, 3)
_______________________________________________
Numpy-discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion