I'm with Andreas on this. As a user, I would prefer to see this as part of sklearn with the usual sklearn API. If we want static matplotlib-style images, reusing (with credit) some of the yellowbrick implementations is a good idea.
Would we consider plotly-based visualizations? I've been doing my own plotting in plotly for the last month, and can't imagine going back to static matplotlib plots...

Andrew

<~~~~~~~~~~~~~~~~~~~~~~~~~~~>
J. Andrew Howe, PhD
LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
Open Researcher and Contributor ID (ORCID) <http://orcid.org/0000-0002-3553-1990>
Github Profile <http://github.com/ahowe42>
Personal Website <http://www.andrewhowe.com>
I live to learn, so I can learn to live. - me
<~~~~~~~~~~~~~~~~~~~~~~~~~~~>

On Thu, Apr 4, 2019 at 3:26 PM Andreas Mueller <t3k...@gmail.com> wrote:

> I would argue that sklearn users would benefit from having solutions in scikit-learn. The yellowbrick API is quite different from the approaches we discussed. If we can reuse their implementations I think we should do so and credit where we can.
> Having plotting in sklearn is also likely to attract more contributors, and we would have more eyes for doing reviews.
>
> Sent from phone. Please excuse spelling and brevity.
>
> On Thu, Apr 4, 2019, 05:43 Alexandre Gramfort <alexandre.gramf...@inria.fr> wrote:
>
>> I also think that YellowBrick folks did a great job and that we should not reinvent the wheel, or at least have a clear idea of how we differ in scope with respect to YellowBrick
>>
>> my 2c
>>
>> Alex
>>
>>
>> On Thu, Apr 4, 2019 at 1:02 AM Eric Ma <ericmajingl...@gmail.com> wrote:
>>
>>> This is not a strongly-held suggestion - but what about adopting YellowBrick as the plotting API for sklearn? Not sure how exactly the interaction would work - could be PRs to their library, or ask them to integrate into sklearn, or do a lock-step dance with versions but maintain separate teams? (I know it raises more questions than answers, but wanted to put it out there.)
>>>
>>> On Wed, Apr 3, 2019 at 4:07 PM Joel Nothman <joel.noth...@gmail.com> wrote:
>>>
>>>> With option 1, sklearn.plot is likely to import large chunks of the library (particularly, but not exclusively, if the plotting function "does the work" as Andy suggests). This is under the assumption that one plot function will want to import trees, another GPs, etc. Unless we move to lazy imports, that would be against the current convention that importing sklearn is fairly minimal.
>>>>
>>>> I do like Andy's idea of framing this discussion more clearly around likely candidates.
>>>>
>>>> On Thu, 4 Apr 2019 at 00:10, Andreas Mueller <t3k...@gmail.com> wrote:
>>>> >
>>>> > I think what was not clear from the question is that there are actually quite different kinds of plotting functions, and many of these are tied to existing code.
>>>> >
>>>> > Right now we have some that are specific to trees (plot_tree) and to gradient boosting (plot_partial_dependence).
>>>> >
>>>> > I think we want more general functions, and plot_partial_dependence has been extended to general estimators.
>>>> >
>>>> > However, the plotting functions might be generic wrt the estimator, but relate to a specific function, say plotting results of GridSearchCV.
>>>> > Then one might argue that having the plotting function close to GridSearchCV might make sense.
>>>> > Similarly for plotting partial dependence plots and feature importances, it might be a bit strange to have the plotting functions not next to the functions that compute these.
>>>> > Another question is whether the plotting functions also "do the work" in some cases:
>>>> > Do we want plot_partial_dependence also to compute the partial dependence? (I would argue yes but either way the result is a bit strange).
>>>> > In that case you have somewhat of the same functionality in two different modules, unless you also put the "compute partial dependence" function in the plotting module as well, which is a bit strange.
>>>> >
>>>> > Maybe we could inform this discussion by listing candidate plotting functions, and also considering whether they "do the work" and where the "work" function is.
>>>> >
>>>> > Other examples are plotting the confusion matrix, which probably should also compute the confusion matrix (it's fast and so that would be convenient), and so it would "duplicate" functionality from the metrics module.
>>>> >
>>>> > Plotting learning curves and validation curves should probably not do the work as it's pretty involved, and so someone would need to import the learning and validation curves from model selection, and then the plotting functions from a plotting module.
>>>> >
>>>> > Calibration curves and P/R curves and ROC curves are also pretty fast to compute (and passing around the arguments is somewhat error prone) so I would say the plotting functions for these should do the work as well.
>>>> >
>>>> > Anyway, you can see that many plotting functions are actually associated with functions in existing modules and the interactions are a bit unclear.
>>>> >
>>>> > The only plotting functions I haven't mentioned so far that I thought about in the past are "2d scatter" and "plot decision function". These would be kind of generic, but mostly used in the examples.
>>>> > Though having a discrete 2d scatter function would be pretty nice (plt.scatter doesn't allow legends and makes it hard to use qualitative color maps).
>>>> >
>>>> >
>>>> > I think I would vote for option (1), "sklearn.plot.plot_zzz" but the case is not really that clear.
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Andy
>>>> >
>>>> > On 4/3/19 7:35 AM, Roman Yurchak via scikit-learn wrote:
>>>> > > +1 for option 1 and +0.5 for 3. Do we anticipate that many plotting functions will be added? If it's just a dozen or fewer, putting them all into a single namespace sklearn.plot might be easier.
>>>> > >
>>>> > > This also would avoid discussion about where to put some generic plotting functions (e.g. https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-478341479).
>>>> > >
>>>> > > Roman
>>>> > >
>>>> > > On 03/04/2019 12:06, Trevor Stephens wrote:
>>>> > >> I think #1 if any of these... Plotting functions should hopefully be as general as possible, so tagging with a specific type of estimator will, in some scikit-learn utopia, be unnecessary.
>>>> > >>
>>>> > >> If a general plotter is built, where does it live in other estimator-specific namespace options? Feels awkward to put it under every estimator's namespace.
>>>> > >>
>>>> > >> Then again, there might be a #4 where there is no plot module and plotting classes live under groups of utilities like introspection, cross-validation or something?...
>>>> > >>
>>>> > >> On Wed, Apr 3, 2019 at 8:54 PM Andrew Howe <ahow...@gmail.com> wrote:
>>>> > >>
>>>> > >>     My preference would be for (1). I don't think the sub-namespace in (2) is necessary, and don't like (3), as I would prefer the plotting functions to be all in the same namespace sklearn.plot.
>>>> > >>
>>>> > >>     Andrew
>>>> > >>
>>>> > >>
>>>> > >>     On Tue, Apr 2, 2019 at 3:40 PM Hanmin Qin <qinhanmin2...@sina.com> wrote:
>>>> > >>
>>>> > >>         See https://github.com/scikit-learn/scikit-learn/issues/13448
>>>> > >>
>>>> > >>         We've introduced several plotting functions (e.g., plot_tree and plot_partial_dependence) and will introduce more (e.g., plot_decision_boundary) in the future. Consequently, we need to decide where to put these functions. Currently, there are 3 proposals:
>>>> > >>
>>>> > >>         (1) sklearn.plot.plot_YYY (e.g., sklearn.plot.plot_tree)
>>>> > >>
>>>> > >>         (2) sklearn.plot.XXX.plot_YYY (e.g., sklearn.plot.tree.plot_tree)
>>>> > >>
>>>> > >>         (3) sklearn.XXX.plot.plot_YYY (e.g., sklearn.tree.plot.plot_tree, note that we won't support from sklearn.XXX import plot_YYY)
>>>> > >>
>>>> > >>         Joel Nothman, Gael Varoquaux and I decided to post it on the mailing list to invite opinions.
>>>> > >>
>>>> > >>         Thanks
>>>> > >>
>>>> > >>         Hanmin Qin
>>>> > >>         _______________________________________________
>>>> > >>         scikit-learn mailing list
>>>> > >>         scikit-learn@python.org
>>>> > >>         https://mail.python.org/mailman/listinfo/scikit-learn
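
A minimal sketch of two ideas raised in the thread, in case it helps frame the discussion: a plotting function that "does the work" (it computes the confusion matrix itself, as Andy suggests) combined with a deferred matplotlib import (one possible way to address Joel's concern about import cost). The names confusion_counts and plot_confusion_matrix are hypothetical and are not scikit-learn API; the counting helper is a plain-Python stand-in for whatever the metrics module would provide.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels):
    """Return a confusion matrix as a list of rows.

    Rows are true labels, columns are predicted labels.
    Hypothetical helper for illustration only, not scikit-learn API.
    """
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


def plot_confusion_matrix(y_true, y_pred, labels, ax=None):
    """Hypothetical plotting function that "does the work" itself."""
    # Deferred import: matplotlib is loaded only when a plot is requested,
    # so importing the module that defines this function stays cheap.
    import matplotlib.pyplot as plt

    matrix = confusion_counts(y_true, y_pred, labels)
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(matrix, cmap="Blues")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    # Annotate each cell with its count.
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            ax.text(j, i, str(count), ha="center", va="center")
    return ax
```

For example, confusion_counts(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"]) returns [[1, 1], [0, 2]]. Only plot_confusion_matrix needs matplotlib installed, and only at call time.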