On 26/09/2018 at 21:59, Joel Nothman wrote:
> And for those interested in what's in the pipeline, we are trying to
> draft a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
Hello,

First of all, thanks for the incredible work on scikit-learn.

I found the roadmap quite cool and in line with some of my own concerns.
In particular:

  * "Make it easier for external users to write Scikit-learn-compatible
    components" - really a great goal to have a stable ecosystem
  * "Passing around information that is not (X, y)" - faced it.
  * "Better interface for interactive development" (wow - very feature -
    such cool - how many great !)
  * Improved tracking of fitting (cool for early stopping while doing
    hyperparameter search, or simply testing some model in a notebook)

However, here are some aspects that I would, modestly, like to see
(maybe for some of them there is already work in progress or an external
library; let me know):

  * chunk processing (a way of handling streaming data): when dealing
    with a lot of data, the ability to call partial_fit and then use
    transform on chunks of data is a great help. But it is not well
    exposed in the current docs and API, and a lot of models do not
    support it although they could. Also, Pipeline does not support
    partial_fit, and there is no fit_transform_partial.
  * while handling "Passing around information that is not (X, y)", is
    there any plan to let transform act on both X and y? This would
    ease lots of problems like subsampling, resampling or masking data
    that is too incomplete. In my case, for example, while transforming
    words to vectors, I may end up with sentences made only of
    out-of-vocabulary words, hence some samples I would like to set
    aside, but I can't because I have no access to y (and introducing
    it makes me lose the ability to use my precious pipeline). I think
    Python offers ways to handle the API change (for example, we could
    have a new transform_xy method, and a compatibility transform
    built on it until deprecation).
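
To make the two points above more concrete, here is a minimal sketch in
plain Python (standard library only, no scikit-learn import). It is only
an illustration of the idea, not an existing API: partial_fit is the
name scikit-learn estimators use for incremental fitting, transform_xy
is the hypothetical method suggested above, and VocabularyFilter,
TokenCounter and ChunkedPipeline are names made up for this example.

```python
class VocabularyFilter:
    """Hypothetical step: learns a vocabulary chunk by chunk, then drops
    sentences made only of out-of-vocabulary words, with their labels."""

    def __init__(self):
        self.vocabulary_ = set()

    def partial_fit(self, X, y=None):
        # Incremental fit: only the current chunk is needed in memory.
        for sentence in X:
            self.vocabulary_.update(sentence.split())
        return self

    def transform_xy(self, X, y):
        # Returns X *and* y, so dropped samples stay aligned with labels.
        kept_X, kept_y = [], []
        for sentence, label in zip(X, y):
            known = [w for w in sentence.split() if w in self.vocabulary_]
            if known:
                kept_X.append(known)
                kept_y.append(label)
        return kept_X, kept_y


class TokenCounter:
    """Hypothetical stateless step: maps each token list to its length."""

    def partial_fit(self, X, y=None):
        return self  # nothing to learn

    def transform_xy(self, X, y):
        return [len(tokens) for tokens in X], list(y)


class ChunkedPipeline:
    """Hypothetical pipeline that chains partial_fit and transform_xy."""

    def __init__(self, steps):
        self.steps = list(steps)

    def partial_fit(self, X, y):
        # Each step is updated on the chunk as transformed by the
        # previous steps, which is what a chunk-aware pipeline needs.
        for step in self.steps:
            step.partial_fit(X, y)
            X, y = step.transform_xy(X, y)
        return self

    def transform_xy(self, X, y):
        for step in self.steps:
            X, y = step.transform_xy(X, y)
        return X, y


# Training data arrives in chunks (for example, read from disk in batches).
train_chunks = [
    (["good movie", "bad plot"], [1, 0]),
    (["good acting"], [1]),
]
pipe = ChunkedPipeline([VocabularyFilter(), TokenCounter()])
for X_chunk, y_chunk in train_chunks:
    pipe.partial_fit(X_chunk, y_chunk)

# "zzz qqq" contains only unknown words, so it is dropped with its label.
X_new = ["good plot twist", "zzz qqq", "bad"]
y_new = [1, 0, 0]
X_out, y_out = pipe.transform_xy(X_new, y_new)
```

In this sketch the second sample disappears from both X_out and y_out,
which is exactly what a transform limited to X cannot express.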

I also understand that changing the API is always a big deal. But I
think scikit-learn, because of its API, has played an important role in
standardizing the Python ML ecosystem, and this is a key contribution.
Not addressing new needs that have matured, and some of the API's
initial flaws, may do the whole community a disservice: new, independent
and inconsistent APIs will flourish, as no other project has the
legitimacy of scikit-learn. So courage :-)

Also, having good integrations with popular frameworks like keras or
gensim would be great (but that is the goal of third-party packages, of
course).

Of course, in writing all this, I don't want to sound pedantic. I know
I'm not very experienced with scikit-learn (nor have I contributed to
it), so take this for what it is.

Have a good day!

Alex

-- 
Alexandre Garel
tel : +33 7 68 52 69 07 / +213 656 11 85 10
skype: alexgarel / ring: ba0435e11af36e32e9b4eb13c19c52fd75c7b4b0

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
