2012/9/22 Christian Jauvin <[email protected]>:
> Hi,
>
> I have been doing multiple experiments using a RandomForestClassifier
> (trained with the parallel code option) recently, without encountering
> any particular problem. However as soon as I began using a much bigger
> dataset (with the exact same code), I got this threading error:
>
> Exception in thread Thread-2:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.7/threading.py", line 504, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>     put(task)
> SystemError: NULL result without error in PyObject_Call
>
> I can provide additional details of course, but first maybe there is
> something in particular I should be aware of, about size or memory
> limit of the underlying objects in question?
>
It may be a memory error, as the current implementation is very bad at managing memory. You can try replacing the joblib folder in the sklearn source tree with the "pickling-pool" branch of my repo:

https://github.com/joblib/joblib/pull/44

That should help a lot. You can further memmap your original dataset, as explained in the following doc, to reduce memory usage even more:

https://github.com/ogrisel/joblib/blob/pickling-pool/doc/parallel_numpy.rst

You might also want to set the TMP environment variable to a folder on a big partition.

I am very interested in any feedback while using this branch.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
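The memmap suggestion above can be sketched with plain `numpy.memmap`. This is an illustration under stated assumptions, not the API of the pickling-pool branch: the helper name `memmap_dataset` and its `folder` parameter are hypothetical. The helper writes the dataset to a file once and reopens it read-only, so the array is backed by the file rather than by private process memory. When `folder` is None, Python's `tempfile` picks the temporary directory from TMPDIR, TEMP or TMP (checked in that order). Pointing one of these at a big partition is the scenario described in the reply. Whether worker processes then avoid copying the data depends on how the pool serializes the array, which is what the linked branch addresses.

```python
import os
import shutil
import tempfile

import numpy as np


def memmap_dataset(X, folder=None):
    """Hypothetical helper: dump X to a .mmap file and reopen it read-only.

    `folder` is assumed to be a directory on a partition with enough free
    space; if None, tempfile's default temp dir (TMPDIR/TEMP/TMP) is used.
    Returns the read-only memmap and the directory holding the file.
    """
    folder = tempfile.mkdtemp(dir=folder)
    filename = os.path.join(folder, "X.mmap")
    # Write the data to disk once through a writable memory map.
    fp = np.memmap(filename, dtype=X.dtype, mode="w+", shape=X.shape)
    fp[:] = X
    fp.flush()
    del fp  # close the writable map
    # Reopen read-only: pages are loaded from the file on demand.
    X_mm = np.memmap(filename, dtype=X.dtype, mode="r", shape=X.shape)
    return X_mm, folder


if __name__ == "__main__":
    X = np.random.rand(1000, 20)
    X_mm, folder = memmap_dataset(X)
    print(X_mm.shape, np.array_equal(X, X_mm))
    del X_mm
    shutil.rmtree(folder)  # remove the temporary dump when done
```

The memmapped `X_mm` can then be passed wherever the in-memory array was used, for example to a classifier's `fit` method.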
