Francesc,
thank you very much for your speedy and well-explained response!
I modified the mock-up script I sent originally according to your guidelines
(lock, open, save and close for each worker) and it seems to be working fine.
I hope to translate the solution to my real problem successfully as well.
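In case it helps anyone else reading the thread, here is a minimal sketch of
the per-worker "lock, open, save, close" sequence. It is not my actual script:
the names exclusive_lock and append_result are made up for illustration, and a
plain binary append stands in for the PyTables open/append/close calls so that
the sketch runs without PyTables. It assumes a POSIX system (fcntl) and that
the file already exists before any worker touches it.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on `path` while the block runs."""
    # lockf needs a descriptor opened for writing to take an exclusive lock.
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield
    finally:
        # POSIX drops this process's lock as soon as it closes any
        # descriptor for the file, so the lock is gone by this point at
        # the latest.
        os.close(fd)


def append_result(path, row):
    """Lock, open, save, close: the whole sequence runs once per call."""
    with exclusive_lock(path):
        # Stand-in for opening the HDF5 file in append mode, appending
        # `row` to the VLArray, and closing the file again.
        with open(path, "ab") as f:
            f.write(row + b"\n")
```

Each worker process would call append_result once per result, and the main
process would create the file before starting the pool. The file is opened
only after the lock is taken, and it is closed before the lock descriptor is.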
Best,
Marko
On Nov 4, 2010, at 10:03 AM, Francesc Alted wrote:
> On Wednesday 03 November 2010 23:59:38 Marko Budisic wrote:
>> Dear all,
>>
>> I am having some trouble with using pytables correctly, and I was
>> hoping for some guidance. I would like to have one central pytables
>> file, containing a VLArray that would be used by several "worker"
>> processes. Each process should perform some computation and append
>> the result as a new row to the VLArray. Because the results can be
>> large, it would be difficult to pass them to the main thread for it
>> to store in the pytables file.
> [clip]
>
> What you are trying to achieve is tricky, but fortunately, possible.
> First, in order to avoid problems with internal caches, you need to
> lock, open, save and close for *each* worker. You are not doing this
> currently.
>
> Then, you need to respect the "lock, open, save and close" order if you
> want to ensure that everything goes well. This example should
> illustrate the proper sequence:
>
> #!/usr/bin/env python
>
> from multiprocessing import Pool
> import fcntl
> import numpy
> import tables
> import os
>
> def work(i):
>     x = numpy.random.random((6,5000))
>     group = '/group%d/group%d' % (i, i)
>     dataset = 'dataset%d' % i
>     fhandle = os.open('/tmp/output.h5', os.O_RDWR)
>     fcntl.lockf(fhandle, fcntl.LOCK_EX)
>     f = tables.openFile('/tmp/output.h5','a')
>     # moving lockf here instead will cause crashes!
>     arr = f.createArray(group, dataset, x, createparents=True)
>     f.close()
>     os.close(fhandle)
>
> def main():
>     tables.openFile('/tmp/output.h5','w').close()
>     pool = Pool(processes=8)
>     pool.map(work, range(5000), chunksize=1)
>
> if __name__ == '__main__':
>     main()
>
> [please note the use of lockf over an opened filehandle]
>
> Third, you will need at least PyTables 2.2 for the above to work.
>
> You can get more info on this in:
>
> http://pytables.org/trac/ticket/185
>
> Hope this helps,
>
> --
> Francesc Alted
>
> _______________________________________________
> Pytables-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/pytables-users