Hello,
I wanted to use PyTables in conjunction with multiprocessing for some
embarrassingly parallel tasks.
However, this does not seem to be possible. In the following
(deliberately trivial) example, X is a CArray of shape (100, 10) stored
in the file test.hdf5:

import random
import multiprocessing

import tables

# Reload the data
h5file = tables.openFile('test.hdf5', mode='r')
X = h5file.root.X
n_features = X.shape[1]  # number of columns (10 here)

# Use multiprocessing to perform a simple computation (column average)
def f(X):
    name = multiprocessing.current_process().name
    # randint is inclusive at both ends, hence the - 1
    column = random.randint(0, n_features - 1)
    print '%s uses column %i' % (name, column)
    return X[:, column].mean()

p = multiprocessing.Pool(2)
col_mean = p.map(f, [X, X, X])  # this is the call that fails

Executing it raises the following error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed

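
For what it's worth, the failure does not seem to be specific to PyTables.
The sketch below uses only the standard library. FakeNode and try_pickle are
made-up names, and I am only assuming that PyTables nodes keep weak
references internally, as the traceback suggests. It shows that the pickling
step multiprocessing relies on works for plain data such as a file name and
a column index, but fails for an object that holds a weakref:

```python
from __future__ import print_function

import pickle
import weakref


class Owner(object):
    """Hypothetical stand-in for whatever a node refers back to."""


class FakeNode(object):
    """Hypothetical stand-in for an object that holds a weak reference."""

    def __init__(self, owner):
        # Weak back-reference; weakref objects cannot be pickled.
        self.owner_ref = weakref.ref(owner)


def try_pickle(obj):
    """Return None if obj pickles, else the name of the exception raised."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as exc:
        return type(exc).__name__
    return None


owner = Owner()
fake_node = FakeNode(owner)

# Plain data (file name, column index) pickles without trouble.
print('plain arguments:', try_pickle(('test.hdf5', 3)))

# An object carrying a weakref does not; the exception type varies
# between Python versions, so it is only printed here.
print('object with weakref:', try_pickle(fake_node))
```

If that assumption holds, passing X itself to p.map cannot work, whereas
passing something picklable (for instance the file name and a column index)
and opening the file inside each worker might.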
I have googled for weakref and pickle but can't find a solution.
Any help?
By the way, I have noticed that slicing a CArray returns a numpy array
(I created the HDF5 file with numpy). Therefore, everything in the slice
is copied to memory. Is there a way to avoid that?
Mathieu
_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users