On 26/09/2018 at 21:59, Joel Nothman wrote:
> And for those interested in what's in the pipeline, we are trying to
> draft a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018

Hello,
First of all, thanks for the incredible work on scikit-learn.

I found the roadmap quite cool and in line with some of my own concerns. In particular:

* "Make it easier for external users to write Scikit-learn-compatible components": really a great goal for having a stable ecosystem.
* "Passing around information that is not (X, y)": I have faced this.
* "Better interface for interactive development" (wow - very feature - such cool - how many great !)
* Improved tracking of fitting (cool for early stopping while doing hyperparameter search, or simply for testing some model in a notebook).

However, here are some aspects that I, modestly, would like to see (maybe for some of them there is already work in progress or an external library; let me know):

* Chunk processing (a way of handling streaming data): when dealing with a lot of data, the ability to call partial_fit and then use transform on chunks of data is a great help. But it is not well exposed in the current documentation and API, and a lot of models do not support it although they could. Also, Pipeline does not support partial_fit, and there is no partial counterpart of fit_transform.
* While handling "Passing around information that is not (X, y)", is there any plan to let transform modify both X and y? This would ease lots of problems like subsampling, resampling, or masking out samples that are too incomplete. In my case, for example, while transforming words to vectors, I may end up with sentences made entirely of out-of-vocabulary words, hence samples I would like to set aside but can't, because transform gives me no access to y (and bringing y in makes me lose the ability to use my precious pipeline). I think Python offers ways to handle the API change (for example, we could have a new transform_xy method, and a compatibility transform that uses it until deprecation).

I understand that changing the API is always a big deal. But I think scikit-learn, because of its API, has played a good role in standardizing the Python ML ecosystem, and this is a key contribution.
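To make the transform_xy idea above more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: TransformXYMixin, DropOutOfVocabulary and transform_xy are hypothetical names, none of them exists in scikit-learn, and the data is a toy example.

```python
# Plain-Python sketch; TransformXYMixin, DropOutOfVocabulary and transform_xy
# are hypothetical names and are NOT part of scikit-learn.


class TransformXYMixin:
    """Compatibility layer: the classic transform(X) delegates to transform_xy."""

    def transform(self, X):
        # Without y, transform_xy still returns a pair; keep only the X part.
        X_new, _ = self.transform_xy(X, None)
        return X_new


class DropOutOfVocabulary(TransformXYMixin):
    """Drop sentences in which no token belongs to the known vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)

    def fit(self, X, y=None):
        # Nothing to learn in this sketch: the vocabulary is given up front.
        return self

    def transform_xy(self, X, y=None):
        # Indices of sentences with at least one in-vocabulary token.
        keep = [
            i for i, tokens in enumerate(X)
            if any(token in self.vocabulary for token in tokens)
        ]
        X_new = [X[i] for i in keep]
        # Apply the same row selection to y so X and y stay aligned.
        y_new = None if y is None else [y[i] for i in keep]
        return X_new, y_new


X = [["good", "movie"], ["zzz", "qqq"], ["bad"]]
y = [1, 0, 0]

step = DropOutOfVocabulary(vocabulary={"good", "bad", "movie"})
X_kept, y_kept = step.fit(X, y).transform_xy(X, y)
print(X_kept)  # [['good', 'movie'], ['bad']]
print(y_kept)  # [1, 0]
```

The sketch covers only the transformer side. A pipeline would still have to call transform_xy during fitting and hand the filtered y to the next step, and that is the part that needs an API decision.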
Not dealing with new needs that have matured, and with some of the current API's initial flaws, may do the whole community a disservice: new, independent, and inconsistent APIs will flourish, as no other project has the legitimacy of scikit-learn. So courage :-)

Also, having good integrations with popular frameworks like Keras or gensim would be great (but that is the goal of third-party packages, of course).

Of course, in writing all this I don't want to sound pedantic. I know I'm not that experienced with scikit-learn (nor have I contributed to it), so take it for what it is.

Have a good day!

Alex

--
Alexandre Garel
tel: +33 7 68 52 69 07 / +213 656 11 85 10
skype: alexgarel / ring: ba0435e11af36e32e9b4eb13c19c52fd75c7b4b0
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn