Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-04 Thread Andreas Mueller



On 10/03/2018 03:32 PM, Nick Pentreath wrote:
For ONNX you may be interested in 
https://github.com/onnx/onnxmltools - which supports conversion of a 
few sklearn models to ONNX already.


However as far as I am aware, none of the ONNX backends actually 
support the ONNX-ML extended spec (in open-source at least). So you 
would not be able to actually do prediction I think...

Exactly, that's what I'm waiting for. MS is working on it afaik.




Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-03 Thread Nick Pentreath
For ONNX you may be interested in https://github.com/onnx/onnxmltools -
which supports conversion of a few sklearn models to ONNX already.

However as far as I am aware, none of the ONNX backends actually support
the ONNX-ML extended spec (in open-source at least). So you would not be
able to actually do prediction I think...
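
A minimal sketch of such a conversion, assuming onnxmltools' convert_sklearn
entry point and its FloatTensorType input declaration (exact import paths may
vary between versions):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    from onnxmltools import convert_sklearn
    from onnxmltools.convert.common.data_types import FloatTensorType

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression().fit(X, y)

    # Declare the input signature: one row of 4 float features.
    onnx_model = convert_sklearn(
        model, initial_types=[("input", FloatTensorType([1, 4]))]
    )

    # The result is a protobuf message, so it serializes directly to bytes.
    with open("logreg.onnx", "wb") as f:
        f.write(onnx_model.SerializeToString())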

As for PFA, to my current knowledge there is no library that does it yet.
Our own Aardpfark project (https://github.com/CODAIT/aardpfark) focuses on
SparkML export to PFA for now but would like to add sklearn support in the
future.



Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-03 Thread Sebastian Raschka
The ONNX-approach sounds most promising, esp. because it will also allow 
library interoperability but I wonder if this is for parametric models only and 
not for the nonparametric ones like KNN, tree-based classifiers, etc.

All-in-all I can definitely see the appeal for having a way to export sklearn 
estimators in a text-based format (e.g., via JSON), since it would make sharing 
code easier. This doesn't even have to be compatible with multiple sklearn 
versions. A typical use case would be to include these JSON exports as e.g., 
supplemental files of a research paper for other people to run the models etc. 
(here, one can just specify which sklearn version it would require; of course, 
one could also share pickle files, but I am personally always hesitant about 
running/trusting other people's pickle files).

Unfortunately though, as Gael pointed out, this "feature" would be a huge 
burden for the devs, and it would probably also negatively impact the 
development of scikit-learn itself because it imposes another design constraint.

However, I do think this sounds like an excellent case for a contrib project. 
Like scikit-export, scikit-serialize or something like that.

Best,
Sebastian





Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-03 Thread Javier López
On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux 
wrote:

> The reason that pickles are brittle and that sharing pickles is a bad
> practice is that pickle uses an implicitly defined data model, which is
> defined via the internals of objects.
>

Plus the fact that loading a pickle can execute arbitrary code, and there
is no way to know
if any malicious code is in there in advance because the contents of the
pickle cannot
be easily inspected without loading/executing it.
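
(A minimal illustration of the problem, for anyone who has not seen it in
action; the payload here is a harmless echo, but it could be anything:)

    import os
    import pickle

    class Exploit:
        # pickle calls __reduce__ to decide how to rebuild the object, so
        # unpickling runs whatever callable this returns.
        def __reduce__(self):
            return (os.system, ("echo arbitrary code, executed at load time",))

    blob = pickle.dumps(Exploit())
    pickle.loads(blob)  # the command runs here, before any "model" exists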


> So, the problems of pickle are not specific to pickle, but rather
> intrinsic to any generic persistence code [*]. Writing persistence code
> that
> does not fall in these problems is very costly in terms of developer time
> and makes it harder to add new methods or improve existing ones. I am not
> excited about it.
>

My "text-based serialization" suggestion was nowhere near as ambitious as
that,
as I have already explained, and wasn't aiming at solving the versioning
issues, but
rather at having something which is "about as good" as pickle but in a
human-readable
format. I am not asking for a Turing-complete language to reproduce the
prediction
function, but rather something simple in the spirit of the output produced
by the gist code I linked above, just for the model families where it is
reasonable:

https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31

The code I posted mostly works (specific cases of nested models need to be
addressed
separately, as well as pipelines), and we have been using (a version of) it
in production
for quite some time. But there are hackish aspects to it that we are not
happy with,
such as the manual separation of init and fitted parameters by checking if
the name ends with "_", having to infer class name and location using
"model.__class__.__name__" and "model.__module__", and the wacky use of
"__import__".

My suggestion was more along the lines of adding some metadata to sklearn
estimators so
that code in a similar style would be nicer to write; little
things like having `init_parameters` and `fit_parameters` properties that would
return the lists of named parameters,
or a `model_info` method that would return data like sklearn version, class
name and location, or a package level dictionary pointing at the estimator
classes by a string name, like

from sklearn.linear_model import LogisticRegression
estimator_classes = {"LogisticRegression": LogisticRegression, ...}

so that one can load the appropriate class from the string description
without calling __import__ or eval; that sort of stuff.
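
(A sketch of what that registry idea could look like; estimator_classes and
load_estimator are hypothetical names from this proposal, not an existing
sklearn API:)

    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical package-level registry mapping string names to classes.
    estimator_classes = {
        "LogisticRegression": LogisticRegression,
        "RandomForestClassifier": RandomForestClassifier,
    }

    def load_estimator(name, params):
        # A plain dictionary lookup: no __import__, no eval.
        return estimator_classes[name](**params)

    model = load_estimator("LogisticRegression", {"C": 0.1})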

I am aware this would not address the common complaint of "perfect
prediction reproducibility"
across versions, but I think we can all agree that this utopia of perfect
reproducibility is not
feasible.

And in the long, long run, I agree that PFA/onnx or whichever similar
format that emerges, is
the way to go.

J


[scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-03 Thread Alex Garel
Le 02/10/2018 à 16:46, Andreas Mueller a écrit :
> Thank you for your feedback Alex!
Thanks for answering!

>
> On 10/02/2018 09:28 AM, Alex Garel wrote:
>>
>>   * chunk processing (kind of handling streaming data) :  when
>> dealing with lots of data, the ability to partial_fit, then use
>> transform on chunks of data is of good help. But it's not well
>> exposed in current doc and API,
>>
> This has been discussed in the past, but it looks like no-one was
> excited enough about it to add it to the roadmap.
> This would require quite some additions to the API. Olivier, who has
> been quite interested in this before now seems
> to be more interested in integration with dask, which might achieve
> the same thing.

I've tried to use Dask on my side, but for now, though I got quite far,
I didn't succeed completely because (in my specific case) of memory
issues (Dask's default schedulers do not specialize processes on tasks,
and I had some memory-consuming tasks, but I didn't get far enough to
write my own scheduler). However I might deal with that later (not by
writing a scheduler but by sharing memory with mmap, in this case).
But yes Dask is about the "chunk instead of really streaming" approach
(which was my point).

>>   * and a lot of models do not support it, while they could.
>>
> Can you give examples of that? 
Hmm, maybe I spoke too fast! Grepping the code gives me some examples at
least, and it's true that a DecisionTree does not support it naturally!

>>   * Also pipeline does not support partial_fit and there is no
>> fit_transform_partial.
>>
> What would you expect those to do? Each step in the pipeline might
> require passing over the whole dataset multiple times
> before being able to transform anything. That basically makes the
> current interface impossible to work with the pipeline.
> Even if only a single pass of the dataset was required, that wouldn't
> work with the current interface.
> If we would be handing around generators that allow to loop over the
> whole data, that would work. But it would be unclear
> how to support a streaming setting.
You're right, I didn't think hard enough about it!

BTW I made some tests using generators, making fit / transform build
pipelines that I consumed later on (tried with plain iterators and
streamz).
It did work somehow, with many hacks, but in my specific case
performance was not good enough (the real problem was not framework
performance, but my architecture: I realized that constantly
re-generating data instead of doing it once was not fast enough).

So finally my points were not so good, but at least I did learn
something ;-)

Thanks for your time.


-- 
Alexandre Garel
tel : +33 7 68 52 69 07 / +213 656 11 85 10
skype: alexgarel / ring: ba0435e11af36e32e9b4eb13c19c52fd75c7b4b0





Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-02 Thread Gael Varoquaux
On Tue, Oct 02, 2018 at 12:20:40PM -0400, Andreas Mueller wrote:
> I think having MS, FB, Amazon, IBM, Nvidia, Intel, ...
> maintain our generic persistence code is a decent deal for us if it works out
> ;)

> https://onnx.ai/

I'll take that deal! :)

+1 for onnx, absolutely!

G


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-02 Thread Andreas Mueller



On 10/02/2018 12:01 PM, Gael Varoquaux wrote:


So, the problems of pickle are not specific to pickle, but rather
intrinsic to any generic persistence code [*]. Writing persistence code that
does not fall in these problems is very costly in terms of developer time
and makes it harder to add new methods or improve existing one. I am not
excited about it.


I think having MS, FB, Amazon, IBM, Nvidia, Intel, ...
maintain our generic persistence code is a decent deal for us *if* it
works out ;)


https://onnx.ai/

(MS is providing sklearn to ONNX converters and is extending ONNX to
allow for more sklearn estimators to be expressed in ONNX).

Containers are a reasonable fallback, though.


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-02 Thread Gael Varoquaux
On Fri, Sep 28, 2018 at 09:45:16PM +0100, Javier López wrote:
> This is not the whole truth. Yes, you store the sklearn version on the pickle
> and raise a warning; I am mostly ok with that, but the pickles are brittle and
> oftentimes they stop loading when other versions of other stuff change. I am
> not talking about "Warning: wrong version", but rather "Unpickling error:
> expected bytes, found tuple" that prevent the file from loading entirely.
> [...]
> 1. Things in the current state break when something else changes, not only
> sklearn.
> 2. Sharing pickles is a bad practice due to a number of reasons.

The reason that pickles are brittle and that sharing pickles is a bad
practice is that pickle uses an implicitly defined data model, which is
defined via the internals of objects.

The "right" solution is to use an explicit data model. This is for
instance what is done with an object database. However, this comes at the
cost of making it very hard to change objects. First, all objects must be
stored with a schema (or language) that is rich enough to represent it,
and yet defined somewhat explicitly (to avoid running into the problems
of pickle). Second, if the internal representation of the object changes,
there needs to be explicit conversion code to go from one version to the
next. Typically, upgrades of websites that use an object database need
maintainers to write this conversion code.


So, the problems of pickle are not specific to pickle, but rather
intrinsic to any generic persistence code [*]. Writing persistence code that
does not fall in these problems is very costly in terms of developer time
and makes it harder to add new methods or improve existing ones. I am not
excited about it.

Rather, the good practice is that if you want to deploy models you deploy them
on the exact same environment that you have trained them. The web world
is very used to doing that (because they keep falling in these problems),
and has developed technology to do this, such as docker containers. I
know that it is clunky technology. I don't like it myself, but I don't
see a way out of it with our resources.

Gaël

[*] Back in the days, when I was working on Mayavi, we developed our
persistence code, because we were not happy with pickle. It was not
pleasant to maintain, and had the same "smell" as pickle. I don't think
that it was a great use of our time.



Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-02 Thread Andreas Mueller

Thank you for your feedback Alex!

On 10/02/2018 09:28 AM, Alex Garel wrote:


  * chunk processing (kind of handling streaming data): when dealing
with lots of data, the ability to partial_fit, then use transform
on chunks of data is of good help. But it's not well exposed in
the current doc and API,

This has been discussed in the past, but it looks like no-one was 
excited enough about it to add it to the roadmap.
This would require quite some additions to the API. Olivier, who has 
been quite interested in this before, now seems
to be more interested in integration with dask, which might achieve the 
same thing.
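
(For estimators that do support incremental learning, the method is actually
spelled partial_fit; a minimal sketch with SGDClassifier and synthetic chunks:)

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier()

    classes = np.array([0, 1])  # all classes must be declared on the first call
    for _ in range(10):         # pretend each iteration is a chunk read from disk
        X_chunk = rng.randn(100, 20)
        y_chunk = rng.randint(0, 2, size=100)
        clf.partial_fit(X_chunk, y_chunk, classes=classes)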


  * and a lot of models do not support it, while they could.


Can you give examples of that?


  * Also pipeline does not support partial_fit and there is no
fit_transform_partial.

What would you expect those to do? Each step in the pipeline might 
require passing over the whole dataset multiple times
before being able to transform anything. That basically makes the 
current interface impossible to work with the pipeline.
Even if only a single pass of the dataset was required, that wouldn't 
work with the current interface.
If we would be handing around generators that allow to loop over the 
whole data, that would work. But it would be unclear

how to support a streaming setting.


  * while handling "Passing around information that is not (X, y)", is
there any plan to have transform being able to transform X and y ?
This would ease lots of problems like subsampling, resampling or
masking data when too incomplete.


An API for subsampling is on the roadmap :)







[scikit-learn] [ANN] Scikit-learn 0.20.0

2018-10-02 Thread Alex Garel
Le 26/09/2018 à 21:59, Joel Nothman a écrit :
> And for those interested in what's in the pipeline, we are trying to
> draft a
> roadmap... 
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
Hello,

First of all thanks for the incredible work on scikit-learn.

I found the RoadMap quite cool and in line with some of my own concerns.
In particular :

  * "Make it easier for external users to write Scikit-learn-compatible
components" - really a great goal to have a stable ecosystem
  * "Passing around information that is not (X, y)" - faced it.
  * "Better interface for interactive development" (wow - very feature -
such cool - how many great !)
  * Improved tracking of fitting (cool for early stopping while doing
hyper parameter search, or simply testing some model in a notebook)

However, here are some aspect that I, modestly, would like to see (also
maybe for some of them there is work in progress or external lib, let me
know):

  * chunk processing (kind of handling streaming data): when dealing
with lots of data, the ability to partial_fit, then use transform on
chunks of data is of good help. But it's not well exposed in the
current doc and API, and a lot of models do not support it, while they
could. Also pipeline does not support partial_fit and there is no
fit_transform_partial.
  * while handling "Passing around information that is not (X, y)", is
there any plan to have transform be able to transform X and y?
This would ease lots of problems like subsampling, resampling or
masking data when it is too incomplete. In my case for example, while
transforming words to vectors, I may end up with sentences full of
out-of-vocabulary words, hence some samples I would like to set aside,
but can't because I do not have access to y (and introducing it
makes me lose the ability to use my precious pipeline). I think
Python offers possibilities to handle the API change (for example we
could have a new transform_xy method, and a compatibility transform
using it until deprecation); see the sketch below.
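
(A sketch of what such a hypothetical transform_xy could look like for a
transformer that drops samples; nothing like this exists in scikit-learn:)

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class DropEmptyRows(BaseEstimator, TransformerMixin):
        """Hypothetical transformer that filters samples, hence must touch y."""

        def fit(self, X, y=None):
            return self

        def transform_xy(self, X, y):
            # Keep rows with at least one non-NaN feature, and subset y to
            # match, so downstream steps stay aligned with their labels.
            mask = ~np.isnan(X).all(axis=1)
            return X[mask], y[mask]

    X = np.array([[1.0, 2.0], [np.nan, np.nan], [3.0, 4.0]])
    y = np.array([0, 1, 0])
    X2, y2 = DropEmptyRows().fit(X, y).transform_xy(X, y)  # NaN row dropped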

Also I understand that changing the API is always a big deal. But I
think scikit-learn, because of its API, has played a good role in
standardizing the Python ML ecosystem, and this is a key contribution.
Not dealing with mature new needs and some of the API's initial flaws
may be a disservice to the whole community, as new independent and
inconsistent APIs will flourish, and no other project has the
legitimacy of scikit-learn. So courage :-)

Also, having good integrations with popular frameworks like keras or
gensim would be great (but that is the goal of third-party packages, of course).

Of course, writing all this, I don't want to sound pedantic. I know I'm
not that experienced with scikit-learn (nor have I contributed to it), so
take it for what it is.

Have a good day !

Alex

-- 
Alexandre Garel
tel : +33 7 68 52 69 07 / +213 656 11 85 10
skype: alexgarel / ring: ba0435e11af36e32e9b4eb13c19c52fd75c7b4b0





Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Andreas Mueller



On 09/28/2018 04:45 PM, Javier López wrote:



On Fri, Sep 28, 2018 at 8:46 PM Andreas Mueller wrote:


Basically what you're saying is that you're fine with versioning the
models and having the model break loudly if anything changes.
That's not actually what most people want. They want to be able to
make
predictions with a given model for ever into the future.


Are we talking about "(the new version of) the old model can still 
make predictions" or "the old model makes exactly the same predictions 
as before"? I'd like the first to hold, don't care that much about the 
second.

The second.


We're now storing the version of scikit-learn that was used in the
pickle and warn if you're trying to load with a different version.


This is not the whole truth. Yes, you store the sklearn version on the 
pickle and raise a warning; I am mostly ok with that, but the pickles 
are brittle and oftentimes they stop loading when other versions of 
other stuff change. I am not talking about "Warning: wrong version", 
but rather "Unpickling error: expected bytes, found tuple" that 
prevent the file from loading entirely.

Can you give examples of that? That shouldn't really happen afaik.


That's basically a stricter test than what you wanted. Yes, there are
false positives, but given that this release took a year,
this doesn't seem that big an issue?


1. Things in the current state break when something else changes, not 
only sklearn.

2. Sharing pickles is a bad practice due to a number of reasons.
3. We might want to explore model parameters without having to load 
the entire runtime



I agree, it would be great to have something other than pickle, but as I 
said, the usual request is "I want a way for a model to make the same 
predictions in the future".
If you have a way to do that with a text-based format that doesn't 
require writing lots of version converters I'd be very happy.


Generally, what you want is not to store the model but to store the 
prediction function, and have separate runtimes for training and prediction.
It might not be possible to represent a model from a previous version of 
scikit-learn in a newer version.


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Javier López
On Fri, Sep 28, 2018 at 8:46 PM Andreas Mueller  wrote:

> Basically what you're saying is that you're fine with versioning the
> models and having the model break loudly if anything changes.
> That's not actually what most people want. They want to be able to make
> predictions with a given model for ever into the future.
>

Are we talking about "(the new version of) the old model can still make
predictions" or "the old model makes exactly the same predictions as
before"? I'd like the first to hold, don't care that much about the second.


>
> Your use-case is similar, but if retraining the model is not an issue,
> why don't you want to retrain every time scikit-learn releases a new
> version?
>

Thousands of models. I don't want to retrain ALL of them unless needed


> We're now storing the version of scikit-learn that was used in the
> pickle and warn if you're trying to load with a different version.


This is not the whole truth. Yes, you store the sklearn version on the
pickle and raise a warning; I am mostly ok with that, but the pickles are
brittle and oftentimes they stop loading when other versions of other stuff
change. I am not talking about "Warning: wrong version", but rather
"Unpickling error: expected bytes, found tuple" that prevent the file from
loading entirely.


> That's basically a stricter test than what you wanted. Yes, there are
> false positives, but given that this release took a year,
> this doesn't seem that big an issue?
>

1. Things in the current state break when something else changes, not only
sklearn.
2. Sharing pickles is a bad practice due to a number of reasons.
3. We might want to explore model parameters without having to load the
entire runtime

Also, in order to retrain the model we need to keep the whole model
description with parameters. This needs to be saved somewhere, which in the
current state would force us to keep two files: one with the parameters (in
a text format to avoid the "non-loading" problems from above) and the pkl 
with the fitted model. My proposal would keep both in a single file.

As mentioned in previous emails, we already have our own solution that
kind-of-works for our needs, but we have to do a few hackish things to keep
things running. If sklearn estimators simply included a text serialization
method (similar in spirit to the one used for __display__ or __repr__) it
would make things easier.

But I understand that not everyone's needs are the same, so if you guys
don't consider this type of thing a priority, we can live with that :) I
mostly mentioned it since "Backwards-compatible de/serialization of some
estimators" is listed in the roadmap as a desirable goal for version 1.0
and feedback on such roadmap was requested.

J


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Andreas Mueller



On 09/28/2018 03:20 PM, Javier López wrote:
I understand the difficulty of the situation, but an approximate 
solution to that is saving the predictions from a large enough 
validation set. If the predictions from the newly created model are 
"close enough" to the old ones, we deem the unserialized model to be 
the same and move forward, if there are serious discrepancies, then we 
dive deep to see what's going on, and if needed refit the offending 
submodels with the newer version.


Basically what you're saying is that you're fine with versioning the 
models and having the model break loudly if anything changes.
That's not actually what most people want. They want to be able to make 
predictions with a given model for ever into the future.


Your use-case is similar, but if retraining the model is not an issue, 
why don't you want to retrain every time scikit-learn releases a new 
version?
We're now storing the version of scikit-learn that was used in the 
pickle and warn if you're trying to load with a different version.
That's basically a stricter test than what you wanted. Yes, there are 
false positives, but given that this release took a year,

this doesn't seem that big an issue?
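
(Concretely, a sketch of how that looks from the outside, assuming the
_sklearn_version key that BaseEstimator embeds in the pickled state:)

    import pickle
    import warnings

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression().fit(X, y)

    # The training-time version travels inside the pickle itself.
    print(model.__getstate__().get("_sklearn_version"))

    blob = pickle.dumps(model)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pickle.loads(blob)  # warns only if the installed version differs
    print([str(w.message) for w in caught])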


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Javier López
On Fri, Sep 28, 2018 at 6:41 PM Andreas Mueller  wrote:

> Javier:
> The problem is not so much storing the "model" but storing how to make
> predictions. Different versions could act differently
> on the same data structure - and the data structure could change. Both
> happen in scikit-learn.
> So if you want to make sure the right thing happens across versions, you
> either need to provide serialization and deserialization for
> every version and conversion between those or you need to provide a way
> to store the prediction function,
> which basically means you need a turing-complete language (that's what
> ONNX does).
>

I understand the difficulty of the situation, but an approximate solution
to that is saving the predictions from a large enough validation set. If
the predictions from the newly created model are "close enough" to the old
ones, we deem the unserialized model to be the same and move forward, if
there are serious discrepancies, then we dive deep to see what's going on,
and if needed refit the offending submodels with the newer version.
Since we only want to compare the predictions here, we don't need a ground
truth and thus the validation set doesn't even need to be a real dataset,
it can consist of synthetic datapoints created via SMOTE, Caruana's MUNGE
algorithm, or any other method, and can be made arbitrarily large in
advance.
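
(In code, the check is essentially this; a minimal sketch where the reference
predictions are stored next to the serialized model:)

    import numpy as np

    def save_reference(model, X_val, path):
        # Store validation predictions alongside the serialized model;
        # X_val can be synthetic, since no ground truth is needed.
        np.save(path, model.predict_proba(X_val))

    def check_rebuilt(model, X_val, path, rtol=1e-4):
        reference = np.load(path)
        return np.allclose(model.predict_proba(X_val), reference, rtol=rtol)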

This method has worked reasonably well for us in practice; we deal with
ensembles containing
hundreds or thousands of models, and this technique saves us from having to
refit many of them that don't change very often, and if something changes a
lot, we want to know in either case to ascertain what was amiss (either
with the old version or with the new one).

The situation I am proposing is not worse than what we have right now,
which is to save a pickle and then hope that it can be read later on;
sometimes it can, sometimes it cannot depending on what changed. Stuff
unrelated to the models themselves, such as changes in the joblib dump
method broke several of our pickle files in the past. What I would like to
have is a text-based representation of the fitted model that can always be
read, stored in a database, or sent over the wire through simple methods.

J


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Manuel CASTEJÓN LIMAS via scikit-learn
How about a Docker-based approach? Just thinking out loud.
Best
Manuel



Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Andreas Mueller




Maybe we should add to the FAQ why serialization is hard?


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Andreas Mueller




On 09/28/2018 12:10 PM, Sebastian Raschka wrote:

I think model serialization should be a priority.

There is also the ONNX specification that is gaining industrial adoption and 
that already includes open source exporters for several families of 
scikit-learn models:

https://github.com/onnx/onnxmltools


Didn't know about that. This is really nice! What do you think about referring 
to it under http://scikit-learn.org/stable/modules/model_persistence.html to 
make people aware that this option exists?
Would be happy to add a PR.


I don't think an open source runtime has been announced yet (or they 
didn't email me like they promised lol).

I'm quite excited about this as well.

Javier:
The problem is not so much storing the "model" but storing how to make 
predictions. Different versions could act differently
on the same data structure - and the data structure could change. Both 
happen in scikit-learn.
So if you want to make sure the right thing happens across versions, you 
either need to provide serialization and deserialization for
every version and conversion between those or you need to provide a way 
to store the prediction function,
which basically means you need a turing-complete language (that's what 
ONNX does).


We basically said doing the first is not feasible within scikit-learn 
given our current amount of resources, and no-one

has even tried doing it outside of scikit-learn (which would be possible).
Implementing a complete prediction serialization language (the second 
option) is definitely outside the scope of sklearn.





Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Sebastian Raschka
> > I think model serialization should be a priority.
> There is also the ONNX specification that is gaining industrial adoption and 
> that already includes open source exporters for several families of 
> scikit-learn models:
> 
> https://github.com/onnx/onnxmltools


Didn't know about that. This is really nice! What do you think about referring 
to it under http://scikit-learn.org/stable/modules/model_persistence.html to 
make people aware that this option exists?
Would be happy to add a PR.

Best,
Sebastian






Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Manuel CASTEJÓN LIMAS via scikit-learn
Huge huge Thank you developers!
Keep up the good work!

On Wed, 26 Sept 2018 20:57, Andreas Mueller wrote:

> Hey everbody!
> I'm happy to (finally) announce scikit-learn 0.20.0.
> This release is dedicated to the memory of Raghav Rajagopalan.
>
> You can upgrade now with pip or conda!
>
> There are many important additions and updates, and you can find the full
> release notes here:
> http://scikit-learn.org/stable/whats_new.html#version-0-20
>
> My personal highlights are the ColumnTransformer and the changes to
> OneHotEncoder,
> but there's so much more!
>
> An important note is that this is the last version to support Python2.7,
> and the
> next release will require Python 3.5.
>
> A big thank you to everybody who contributed and special thanks to Joel!
>
> All the best,
> Andy
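
(For anyone who has not tried the highlights mentioned above, a minimal
example of the new ColumnTransformer together with the string-capable
OneHotEncoder, both introduced in 0.20:)

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    X = pd.DataFrame({"age": [25, 32, 47], "city": ["NYC", "SF", "NYC"]})

    # New in 0.20: apply different preprocessing to different columns by name.
    ct = ColumnTransformer([
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(), ["city"]),
    ])
    X_trans = ct.fit_transform(X)  # scaled age plus one-hot city columns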


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Olivier Grisel
> I think model serialization should be a priority.

There is also the ONNX specification that is gaining industrial adoption
and that already includes open source exporters for several families of
scikit-learn models:

https://github.com/onnx/onnxmltools

-- 
Olivier


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-28 Thread Javier López
On Fri, Sep 28, 2018 at 1:03 AM Sebastian Raschka 
wrote:

> Chris Emmery, Chris Wagner and I toyed around with JSON a while back (
> https://cmry.github.io/notes/serialize), and it could be feasible


I came across your notes a while back, they were really useful!
I hacked a variation of it that didn't need to know the model class in
advance:
https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
but it is VERY hackish, and it doesn't work with complex models with nested
components. (At work we use a further variation of this that also works on
pipelines and some specific nested stuff, like `mlxtend`'s
`SequentialFeatureSelector`)


> but yeah, it will involve some work, especially with testing things
> thoroughly for all kinds of estimators. Maybe this could somehow be
> automated though in a grid-search kind of way with a build matrix for
> estimators and parameters once a general framework has been developed.
>

I considered making this serialization into an external project, but I
think this would be much easier if estimators provided a dunder method
`__serialize__` (or whatever) that would handle the idiosyncrasies of each
particular family, I don't believe there will be a "one-size-fits-all"
solution for this problem. This approach would also make it possible to
work on it incrementally, raising a default `NotImplementedError` for
estimators that haven't been addressed yet.
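
(Sketching the idea with the hypothetical dunder; nothing like this exists in
sklearn today:)

    import json

    class SerializableMixin:
        """Hypothetical opt-in text serialization for estimators."""

        def __serialize__(self):
            # Each estimator family would override this with its own logic.
            raise NotImplementedError(
                "%s does not support text serialization yet" % type(self).__name__
            )

    class MyModel(SerializableMixin):
        def __init__(self, alpha=1.0):
            self.alpha = alpha

        def __serialize__(self):
            return json.dumps({"class": "MyModel",
                               "params": {"alpha": self.alpha}})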

In the long run, I also believe that the "proper" way to do this is to
allow dumping entire processes into PFA: http://dmg.org/pfa/docs/motivation/


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Sebastian Raschka
Congrats everyone, this is awesome!!! I just started teaching an ML course this 
semester and introduced scikit-learn this week -- it was great timing to 
demonstrate how well maintained the library is and praise all the efforts that 
go into it :). 

> I think model serialization should be a priority.


While this could potentially be a bit inefficient for large non-parametric models, 
I think the serialization into a text-readable format has some advantages for 
real-world use cases. E.g., sharing models (pickle is a bit problematic because 
of security issues) in applications but also as supplementary material in 
archives for accompanying research articles, etc (esp in cases where datasets 
cannot be shared in their original form due to some copyright or other 
concerns).

Chris Emmery, Chris Wagner and I toyed around with JSON a while back 
(https://cmry.github.io/notes/serialize), and it could be feasible -- but yeah, 
it will involve some work, especially with testing things thoroughly for all 
kinds of estimators. Maybe this could somehow be automated though in a 
grid-search kind of way with a build matrix for estimators and parameters once 
a general framework has been developed. 




Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Javier López
First of all, congratulations on the release, great work, everyone!

I think model serialization should be a priority. Particularly,
I think that (whenever practical) there should be a way of
serializing estimators (either unfitted or fitted) in a text-readable format,
preferably JSON or PMML/PFA (or several others).

Obviously for some models it is not practical (eg random forests with
thousands of trees), but for simpler situations I believe it would
provide a great tool for model sharing without the dangers of pickling
and the versioning hell.

I am (painfully) aware that when rebuilding a model on a different setup,
it might yield different results; in my company we address that by saving
together with the serialized model a reasonably small validation dataset
together with its predictions, upon unserializing we check that the rebuilt
model reproduces the predictions within some acceptable range.

About the new release, I am particularly happy about the joblib update,
as it has been a major source of pain for me over the last year. On that
note, I think it would be a good idea to stop vendoring joblib and list it
as
a dependency instead; wheels, pip and conda are mature enough to
handle the situation nowadays.

Last, but not least, it would be great to relax the checks concerning NaNs
at prediction time, and allow, for instance, an estimator to yield NaNs if
any features are NaN; we face that situation when working with ensembles,
where a few of the submodels might not get enough features available, but
the rest do.

Off the top of my head, that's all, keep up the fantastic work!
J



Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Andreas Mueller
I think we should work on the formatting, make sure it's complete, link it
to issues/PRs, and then make this into a public document on the website and
request feedback.

Right now it's in a format that core developers can understand, but some of
the things are not clear to a general audience. Linking the issues/PRs will
help a bit, but we might also want to add a sentence to each point in the
roadmap.

I had some issues with the formatting; I'll try to fix that later.
Any volunteers for adding the frozen estimator (or has someone added that
already?).


Cheers,
Andy

On 09/27/2018 04:29 AM, Olivier Grisel wrote:
On Wed, 26 Sep 2018 at 23:02, Joel Nothman wrote:


And for those interested in what's in the pipeline, we are trying
to draft a roadmap...
https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018

But there are no doubt many features that are absent there too.


Indeed, it would be great to get some feedback on this roadmap from 
heavy scikit-learn users: which points do you think are the most 
important? What is missing from this roadmap?


Feel free to reply to this thread.

--
Olivier




Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Olivier Grisel
On Wed, 26 Sep 2018 at 23:02, Joel Nothman wrote:

> And for those interested in what's in the pipeline, we are trying to draft
> a roadmap...
> https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018
>
> But there are no doubt many features that are absent there too.
>

Indeed, it would be great to get some feedback on this roadmap from heavy
scikit-learn users: which points do you think are the most important? What
is missing from this roadmap?

Feel free to reply to this thread.

-- 
Olivier


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-27 Thread Olivier Grisel
Joy!


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Aiden Nguyen
Congrats to the whole team!
Aiden Nguyen

--
Nguyen Thien Bao, PhD
Director and Founder, HBB Tech, Vietnam
Co-founder, HBB Solutions, Vietnam
Head, R&D Division, Cardano Labo, Vietnam
NeuroInformatics Laboratory (NILab), Fondazione Bruno Kessler (FBK),
Trento, Italy
Centro Interdipartimentale Mente e Cervello (CIMeC), Universita degli Studi
di Trento, Italy
Surgical Planning Laboratory (SPL), Department of Radiology, BWH, Harvard
University, MA, USA
Lecturer, Faculty of Information Technology, University of Technology and
Education, Ho Chi Minh, Vietnam
Email: bao at bwh.harvard.edu or tbnguyen at fbk.eu or baont at
hbbsolution.com or ntbaovn at gmail.com
Fax: +39.0461.283.091
Cellphone: +1.857.265.6408 (USA)
           +39.345.293.1006 (Italy)
           +84.9.2761.3761 (VietNam)


On Thu, Sep 27, 2018 at 12:49 PM Denis-Alexander Engemann <
denis.engem...@gmail.com> wrote:

> This is wonderful news! Congrats, everyone. I can't wait to check out the
> game-changing ColumnTransformer!
>
> Denis
> On Wed 26 Sep 2018 at 23:45, Gael Varoquaux 
> wrote:
>
>> Hurray, thanks to everybody; in particular to those who did the hard
>> work of ironing out the last issues and releasing.
>>
>> Gaël
>>
>> On Wed, Sep 26, 2018 at 02:55:57PM -0400, Andreas Mueller wrote:
>> > Hey everybody!
>> > I'm happy to (finally) announce scikit-learn 0.20.0.
>> > This release is dedicated to the memory of Raghav Rajagopalan.
>>
>> > You can upgrade now with pip or conda!
>>
>> > There are many important additions and updates, and you can find the full
>> > release notes here:
>> > http://scikit-learn.org/stable/whats_new.html#version-0-20
>>
>> > My personal highlights are the ColumnTransformer and the changes to
>> > OneHotEncoder,
>> > but there's so much more!
>>
>> > An important note is that this is the last version to support Python 2.7,
>> > and the next release will require Python 3.5.
>>
>> > A big thank you to everybody who contributed and special thanks to Joel!
>>
>> > All the best,
>> > Andy
>>
>> --
>> Gael Varoquaux
>> Senior Researcher, INRIA Parietal
>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>> Phone:  ++ 33-1-69-08-79-68
>> http://gael-varoquaux.info
>> http://twitter.com/GaelVaroquaux


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Denis-Alexander Engemann
This is wonderful news! Congrats, everyone. I can't wait to check out the
game-changing ColumnTransformer!

Denis
On Wed 26 Sep 2018 at 23:45, Gael Varoquaux 
wrote:

> Hurray, thanks to everybody; in particular to those who did the hard
> work of ironing out the last issues and releasing.
>
> Gaël
>
> On Wed, Sep 26, 2018 at 02:55:57PM -0400, Andreas Mueller wrote:
> > Hey everybody!
> > I'm happy to (finally) announce scikit-learn 0.20.0.
> > This release is dedicated to the memory of Raghav Rajagopalan.
>
> > You can upgrade now with pip or conda!
>
> > There are many important additions and updates, and you can find the full
> > release notes here:
> > http://scikit-learn.org/stable/whats_new.html#version-0-20
>
> > My personal highlights are the ColumnTransformer and the changes to
> > OneHotEncoder,
> > but there's so much more!
>
> > An important note is that this is the last version to support Python 2.7,
> > and the next release will require Python 3.5.
>
> > A big thank you to everybody who contributed and special thanks to Joel!
>
> > All the best,
> > Andy
>
> --
> Gael Varoquaux
> Senior Researcher, INRIA Parietal
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone:  ++ 33-1-69-08-79-68
> http://gael-varoquaux.info
> http://twitter.com/GaelVaroquaux


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Gael Varoquaux
Hurray, thanks to everybody; in particular to those who did the hard
work of ironing out the last issues and releasing.

Gaël

On Wed, Sep 26, 2018 at 02:55:57PM -0400, Andreas Mueller wrote:
> Hey everybody!
> I'm happy to (finally) announce scikit-learn 0.20.0.
> This release is dedicated to the memory of Raghav Rajagopalan.

> You can upgrade now with pip or conda!

> There are many important additions and updates, and you can find the full
> release notes here:
> http://scikit-learn.org/stable/whats_new.html#version-0-20

> My personal highlights are the ColumnTransformer and the changes to
> OneHotEncoder,
> but there's so much more!

> An important note is that this is the last version to support Python 2.7,
> and the next release will require Python 3.5.

> A big thank you to everybody who contributed and special thanks to Joel!

> All the best,
> Andy

-- 
Gael Varoquaux
Senior Researcher, INRIA Parietal
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone:  ++ 33-1-69-08-79-68
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Joel Nothman
And for those interested in what's in the pipeline, we are trying to draft
a roadmap...
https://github.com/scikit-learn/scikit-learn/wiki/Draft-Roadmap-2018

But there are no doubt many features that are absent there too.


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Andreas Mueller




On 09/26/2018 04:49 PM, Joel Nothman wrote:
Wow. It's finally out!! Thank you to the cast of thousands, but also to
some individuals for real dedication and insight!


Yet there's so much more still in the pipeline. If we're clever about 
things, we'll make the next release cycle shorter and the release more 
manageable.



There's always so much more :)
And yes, we should strive to cut down our release cycle (significantly). 
Let's see if we manage.



Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Joel Nothman
Wow. It's finally out!! Thank you to the cast of thousands, but also to
some individuals for real dedication and insight!

Yet there's so much more still in the pipeline. If we're clever about
things, we'll make the next release cycle shorter and the release more
manageable.


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Raga Markely
Congratulations!

Thank you very much for everyone's hard work!
Raga

On Wed, Sep 26, 2018, 2:57 PM Andreas Mueller  wrote:

> Hey everybody!
> I'm happy to (finally) announce scikit-learn 0.20.0.
> This release is dedicated to the memory of Raghav Rajagopalan.
>
> You can upgrade now with pip or conda!
>
> There are many important additions and updates, and you can find the full
> release notes here:
> http://scikit-learn.org/stable/whats_new.html#version-0-20
>
> My personal highlights are the ColumnTransformer and the changes to
> OneHotEncoder,
> but there's so much more!
>
> An important note is that this is the last version to support Python 2.7,
> and the next release will require Python 3.5.
>
> A big thank you to everybody who contributed and special thanks to Joel!
>
> All the best,
> Andy


Re: [scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread bthirion

Congratulations!
Bertrand

On 26/09/2018 20:55, Andreas Mueller wrote:

Hey everybody!
I'm happy to (finally) announce scikit-learn 0.20.0.
This release is dedicated to the memory of Raghav Rajagopalan.

You can upgrade now with pip or conda!

There are many important additions and updates, and you can find the full
release notes here:
http://scikit-learn.org/stable/whats_new.html#version-0-20

My personal highlights are the ColumnTransformer and the changes to
OneHotEncoder, but there's so much more!

An important note is that this is the last version to support Python 2.7,
and the next release will require Python 3.5.

A big thank you to everybody who contributed and special thanks to Joel!

All the best,
Andy


[scikit-learn] [ANN] Scikit-learn 0.20.0

2018-09-26 Thread Andreas Mueller

Hey everybody!
I'm happy to (finally) announce scikit-learn 0.20.0.
This release is dedicated to the memory of Raghav Rajagopalan.

You can upgrade now with pip or conda!

There are many important additions and updates, and you can find the full
release notes here:
http://scikit-learn.org/stable/whats_new.html#version-0-20

My personal highlights are the ColumnTransformer and the changes to
OneHotEncoder, but there's so much more!
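
If you haven't tried them yet, here's a minimal taste of the two (toy
data made up for illustration; the API is as released in 0.20):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({"city": ["NYC", "Paris", "NYC"], "age": [25, 40, 31]})
ct = ColumnTransformer([
    # OneHotEncoder now handles string categories directly.
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("scale", StandardScaler(), ["age"]),
])
X_transformed = ct.fit_transform(X)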

An important note is that this is the last version to support Python 2.7,
and the next release will require Python 3.5.

A big thank you to everybody who contributed and special thanks to Joel!

All the best,
Andy