On 12/04/14 00:39, Nathaniel Smith wrote:

> The spawn mode is fine and all, but (a) the presence of something in
> 3.4 helps only a minority of users, (b) "spawn" is not a full
> replacement for fork;
It basically does the same as on Windows. If you want portability to
Windows, you must abide by these restrictions anyway.

> with large read-mostly data sets it can be a
> *huge* win to load them into the parent process and then let them be
> COW-inherited by forked children.

The thing is that Python reference counting breaks COW fork. This has
been discussed several times on the Python-dev list. What happens is
that as soon as the child process updates a refcount, the OS copies the
page. And because the refcount lives in the PyObject header itself, even
read-only access to an object writes to the page it sits on, so this
copying of COW-marked pages quickly gets excessive. Effectively, the
performance of os.fork in Python will be close to that of a non-COW
fork. A suggested solution is to move the refcount out of the PyObject
struct, perhaps into a dedicated heap. But doing so would be
cache-unfriendly.

> ATM the only other way to work with
> a data set that's larger than memory-divided-by-numcpus is to
> explicitly set up shared memory, and this is *really* hard for
> anything more complicated than a single flat array.

Not difficult. You just go to my GitHub site and grab the code ;) (I
have some problems running it on my MBP though, not sure why, but it
used to work on Linux and Windows, and possibly still does.)

https://github.com/sturlamolden/sharedmem-numpy

Sturla

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
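The refcount effect Sturla describes can be seen in a few lines of
CPython. This is a minimal sketch, not code from the thread; it assumes
CPython's refcount semantics, and the variable names are illustrative:

```python
# Sketch (assumption: CPython).  The refcount is stored in the PyObject
# header, so incrementing it writes to the same memory page the object
# lives on -- in a forked child, that write forces a copy of the whole
# COW page even though the program only "read" the object.
import sys

obj = object()                 # a fresh, non-interned object
before = sys.getrefcount(obj)  # includes getrefcount's own temporary ref
alias = obj                    # taking a new reference bumps ob_refcnt
after = sys.getrefcount(obj)

assert after == before + 1     # the object's own header was written to
```

The same header write happens on every incref/decref, which is why a
forked child that merely iterates over inherited data ends up copying
most of the pages it touches.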
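For the "single flat array" case Nathaniel mentions, the idea behind
explicit shared memory can be sketched with the standard library alone
(`multiprocessing.shared_memory`, Python 3.8+). This is not Sturla's
sharedmem-numpy library, just an illustration; the segment size is made
up:

```python
# Stdlib-only sketch: allocate a named shared-memory segment and view it
# as a flat array of doubles.  Another process can attach to the same
# segment with SharedMemory(name=shm.name) and see the data without any
# copying.
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=8 * 1024)
try:
    view = memoryview(shm.buf).cast("d")  # treat the block as float64s
    view[0] = 3.14                        # written straight into the
                                          # shared segment
    assert view[0] == 3.14
finally:
    view.release()                        # release exports before close()
    shm.close()
    shm.unlink()                          # creator frees the segment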