Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

Andreas Mueller Tue, 02 Oct 2018 08:48:02 -0700

Thank you for your feedback Alex!

On 10/02/2018 09:28 AM, Alex Garel wrote:


  * chunk processing (kind of handling streaming data) :  when dealing
    with lot of data, the ability to fit_partial, then use transform
    on chunks of data is of good help. But it's not well exposed in
    current doc and API,

This has been discussed in the past, but it looks like no-one was
excited enough about it to add it to the roadmap. This would require
quite some additions to the API. Olivier, who has been quite interested
in this before now seems to be more interested in integration with dask,
which might achieve the same thing.
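For reference, the chunked workflow described in the quoted point can already be written by hand for estimators that implement incremental fitting (scikit-learn spells the method `partial_fit`). The following is only a minimal sketch: the synthetic data, the `iter_chunks` helper and the chunk size are assumptions made for illustration, not part of any proposed API.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a dataset too large to hold at once (assumption).
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))


def iter_chunks(X, chunk_size=100):
    """Hypothetical helper: yield consecutive row blocks of X."""
    for start in range(0, X.shape[0], chunk_size):
        yield X[start:start + chunk_size]


scaler = StandardScaler()

# First pass: update the running mean and variance one chunk at a time.
for X_chunk in iter_chunks(X):
    scaler.partial_fit(X_chunk)

# Second pass: transform chunk by chunk with the fitted statistics.
X_scaled = np.vstack([scaler.transform(X_chunk) for X_chunk in iter_chunks(X)])
```

Note that the data is read twice here: once to fit, once to transform.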


  * and a lot of models do not support it, while they could.

Can you give examples of that?


  * Also pipeline does not support fit_partial and there is not
    fit_transform_partial.

What would you expect those to do? Each step in the pipeline might
require passing over the whole dataset multiple times before being able
to transform anything. That basically makes the current interface
impossible to work with the pipeline. Even if only a single pass over
the dataset were required, that wouldn't work with the current
interface. If we were handing around generators that allow looping over
the whole data, that would work. But it would be unclear how to support
a streaming setting.
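To make the multiple-pass problem concrete, here is a rough sketch of what fitting a chain of incremental steps could look like if a re-iterable chunk source were handed around. `make_chunks` and `fit_in_passes` are hypothetical names invented for this illustration; nothing like them exists in scikit-learn. Only `partial_fit`, `transform` and the two estimators are real.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler


def make_chunks(X, y, chunk_size=50):
    """Hypothetical: return a callable that yields fresh (X, y) chunks on every call."""
    def chunks():
        for start in range(0, X.shape[0], chunk_size):
            yield X[start:start + chunk_size], y[start:start + chunk_size]
    return chunks


def fit_in_passes(transformers, estimator, chunks, classes):
    """Hypothetical: fit each step with one full pass over the data."""
    fitted = []
    for transformer in transformers:
        # One complete pass per transformer: a step can only be fitted
        # once all earlier steps are able to transform its input.
        for X_chunk, _ in chunks():
            for earlier in fitted:
                X_chunk = earlier.transform(X_chunk)
            transformer.partial_fit(X_chunk)
        fitted.append(transformer)
    # Final pass: feed fully transformed chunks to the last estimator.
    for X_chunk, y_chunk in chunks():
        for earlier in fitted:
            X_chunk = earlier.transform(X_chunk)
        estimator.partial_fit(X_chunk, y_chunk, classes=classes)
    return fitted, estimator


rng = np.random.RandomState(0)
X = rng.normal(loc=3.0, scale=2.0, size=(500, 3))
y = (X[:, 0] > 3.0).astype(int)

fitted, clf = fit_in_passes(
    [StandardScaler()],
    SGDClassifier(random_state=0),
    make_chunks(X, y),
    classes=np.unique(y),
)
```

With k transformers this reads the data k + 1 times, which is only possible because `chunks()` can be restarted. A true stream that can be consumed once offers no such restart, which is the open question above.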

  * while handling "Passing around information that is not (X, y)", is
    there any plan to have transform being able to transform X and y ?
    This would ease lots of problems like subsampling, resampling or
    masking data when too incomplete.

An API for subsampling is on the roadmap :)
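Until such an API exists, the kind of operation described in the quoted point has to be done outside the estimator interface, because `transform` returns only X. A minimal NumPy sketch of masking overly incomplete rows while keeping X and y aligned could look like this; the function name and the `max_missing_fraction` parameter are assumptions for illustration and say nothing about what the roadmap item will look like.

```python
import numpy as np


def drop_incomplete_rows(X, y, max_missing_fraction=0.5):
    """Hypothetical helper: drop rows of X (and matching y) with too many NaNs."""
    missing_fraction = np.isnan(X).mean(axis=1)
    keep = missing_fraction <= max_missing_fraction
    # The same boolean mask is applied to both arrays so they stay aligned.
    return X[keep], y[keep]


X = np.array([[1.0, 2.0, 3.0],
              [np.nan, np.nan, 1.0],
              [4.0, np.nan, 6.0]])
y = np.array([0, 1, 1])

X_kept, y_kept = drop_incomplete_rows(X, y)
```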

