On Fri, Jan 06, 2012 at 06:36:33PM +0200, Vlad Niculae wrote:
> [Parallel(n_jobs=4)]: Done   1 out of  10  |elapsed:    0.2s remaining:    1.4s

> I think at least "Done job x of y" should be printed, I don't see why it 
> should be more difficult in the no-multiprocessing case.

:). It's funny, I've been working today on a pull request in joblib that
deals with this part of the code.

The reason is simple: what is given to the parallel processing code is an
iterator, not a list. In other words, each item is generated only as it is
needed. Thus when you are doing something like this:

Parallel(n_jobs=1)(delayed(process)(X[fold], y[fold]) for fold in folds)

no list of temporaries holding the data of every fold is ever created.
This matters a lot for memory.
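
A minimal sketch of the memory argument, in plain Python rather than
joblib: make_fold() and process() are hypothetical stand-ins for slicing
X[fold], y[fold] and for the user's function, and max_pending() counts how
many fold copies were built but not yet processed. Consuming the generator
lazily keeps one copy pending at a time; turning it into a list first keeps
all of them.

```python
def make_fold(i, log):
    """Hypothetical stand-in for slicing X[fold], y[fold] (a temporary copy)."""
    log.append(("make", i))
    return i, [float(i)] * 1000


def process(fold, log):
    """Hypothetical stand-in for the user's function."""
    i, data = fold
    log.append(("process", i))
    return sum(data)


def max_pending(log):
    """Largest number of fold copies built but not yet processed."""
    pending = peak = 0
    for event, _ in log:
        pending += 1 if event == "make" else -1
        peak = max(peak, pending)
    return peak


def run(n_folds, eager):
    log = []
    # A generator: each fold copy is built only when the loop asks for it.
    tasks = (make_fold(i, log) for i in range(n_folds))
    if eager:
        tasks = list(tasks)  # materialize every fold copy up front
    results = [process(fold, log) for fold in tasks]
    return results, max_pending(log)


lazy_results, lazy_peak = run(10, eager=False)
eager_results, eager_peak = run(10, eager=True)
print(lazy_peak, eager_peak)  # → 1 10
```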

The Parallel object could of course turn the iterator it is given into a
list in order to measure its length, but that would defeat the purpose.
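
To make the trade-off concrete, here is a hypothetical reporter
(progress_messages() is not part of joblib) that prints "Done i out of n"
only when the input can report its size without being consumed, using the
standard library's operator.length_hint(), and falls back to a plain count
for generators.

```python
import operator


def progress_messages(tasks):
    """Hypothetical reporter: 'Done i out of n' only when n is knowable."""
    # length_hint() returns len() for sized inputs such as lists, and the
    # default (0) for plain generators, without consuming anything.
    total = operator.length_hint(tasks, 0)
    messages = []
    for done, task in enumerate(tasks, start=1):
        task()  # run the job
        if total:
            messages.append(f"Done {done:3d} out of {total:3d}")
        else:
            messages.append(f"Done {done:3d} jobs")
    return messages


jobs = [lambda: None] * 3
print(progress_messages(jobs)[-1])             # → Done   3 out of   3
print(progress_messages(j for j in jobs)[-1])  # → Done   3 jobs
```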

In the multiprocessing context, things are a bit different, as the
Parallel object dispatches folds to different processes and thus consumes
the iterator. By default it greedily dispatches everything, so it consumes
the whole iterator and knows its length. The reason it is done greedily is
that there is a delay in the dispatch, so dispatching everything up front
fills the queue quickly.

There is an option (heavily used in the scikit) called 'pre_dispatch'
that makes the dispatching happen on the fly, as the queue empties. The
reason is that the greedy strategy will blow up the memory if there are
many folds. In this case, the display is no longer as pleasant: you can
see such a display when using GridSearch with many folds.
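
A matching sketch of on-the-fly dispatch, again using concurrent.futures
rather than joblib: run_bounded() and its integer pre_dispatch argument are
hypothetical and only mirror the idea of joblib's option of the same name.
At most pre_dispatch tasks are in flight, more are pulled from the iterator
as slots free up, and so the progress message cannot include a total.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def square(x):
    """Hypothetical task."""
    return x * x


def run_bounded(task_args, n_workers=2, pre_dispatch=4):
    """Keep at most `pre_dispatch` tasks in flight; refill as they finish."""
    task_args = iter(task_args)
    messages, results = [], []
    max_in_flight = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Dispatch only the first batch; the rest of the iterator is untouched.
        pending = deque(pool.submit(square, x)
                        for x in islice(task_args, pre_dispatch))
        while pending:
            max_in_flight = max(max_in_flight, len(pending))
            results.append(pending.popleft().result())  # wait for oldest task
            # The total is unknown until the iterator runs dry.
            messages.append(f"Done {len(results)} jobs")
            for x in islice(task_args, 1):  # refill one slot if input remains
                pending.append(pool.submit(square, x))
    return results, messages, max_in_flight


results, messages, peak = run_bounded(x for x in range(10))
print(messages[-1], peak)  # → Done 10 jobs 4
```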

I hope this answers your question :)

Gael

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
