This is not a strongly held suggestion, but what about adopting Yellowbrick as the plotting API for sklearn? I'm not sure exactly how the interaction would work: it could be PRs to their library, asking them to integrate into sklearn, or keeping separate teams while releasing versions in lock-step. (I know it raises more questions than answers, but I wanted to put it out there.)
On Wed, Apr 3, 2019 at 4:07 PM Joel Nothman <joel.noth...@gmail.com> wrote:
> With option 1, sklearn.plot is likely to import large chunks of the
> library (particularly, but not exclusively, if the plotting function
> "does the work" as Andy suggests). This is under the assumption that
> one plot function will want to import trees, another GPs, etc. Unless
> we move to lazy imports, that would be against the current convention
> that importing sklearn is fairly minimal.
>
> I do like Andy's idea of framing this discussion more clearly around
> likely candidates.
>
> On Thu, 4 Apr 2019 at 00:10, Andreas Mueller <t3k...@gmail.com> wrote:
> >
> > I think what was not clear from the question is that there are actually
> > quite different kinds of plotting functions, and many of these are tied
> > to existing code.
> >
> > Right now we have some that are specific to trees (plot_tree) and to
> > gradient boosting (plot_partial_dependence).
> >
> > I think we want more general functions, and plot_partial_dependence has
> > been extended to general estimators.
> >
> > However, the plotting functions might be generic wrt the estimator, but
> > relate to a specific function, say plotting results of GridSearchCV.
> > Then one might argue that having the plotting function close to
> > GridSearchCV might make sense.
> > Similarly for plotting partial dependence plots and feature importances,
> > it might be a bit strange to have the plotting functions not next to the
> > functions that compute these.
> > Another question would be whether the plotting functions also "do the
> > work" in some cases:
> > Do we want plot_partial_dependence also to compute the partial
> > dependence? (I would argue yes but either way the result is a bit
> > strange).
> > In that case you have somewhat of the same functionality in two
> > different modules, unless you also put the "compute partial dependence"
> > function in the plotting module as well,
> > which is a bit strange.
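(Inline, on the lazy-import and "do the work" points above.) One possible shape, offered only as a rough sketch: the plot function imports its heavy dependency inside the function body, so nothing extra is loaded at import time, and it "does the work" by calling the existing compute function rather than reimplementing it. The names `confusion_counts` and `plot_confusion_counts` are hypothetical and are not sklearn functions; the pure-Python counter is a stand-in for the real metric, and matplotlib stands in for whatever a given plot function would need to pull in.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels):
    """The "work": count (true, predicted) pairs into a square matrix.

    Hypothetical stand-in for a compute function living in another module.
    """
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def plot_confusion_counts(y_true, y_pred, labels, ax=None):
    """Hypothetical plot helper: delegates the computation, then draws it."""
    # Deferred import: matplotlib is loaded only when a plot is requested,
    # not when the module defining this function is imported.
    import matplotlib.pyplot as plt

    matrix = confusion_counts(y_true, y_pred, labels)
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(matrix, cmap="Blues")
    ticks = list(range(len(labels)))
    ax.set_xticks(ticks)
    ax.set_xticklabels(labels)
    ax.set_yticks(ticks)
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    # Write each count into its cell.
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            ax.text(j, i, str(count), ha="center", va="center")
    return ax


y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]
print(confusion_counts(y_true, y_pred, ["cat", "dog"]))  # [[2, 0], [1, 2]]
```

With this split the compute function stays where it already lives and the plotting module holds only the drawing code, at the cost of a slightly slower first call to the plot function.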
> >
> > Maybe we could inform this discussion by listing candidate plotting
> > functions, and also considering whether they "do the work" and where the
> > "work" function is.
> >
> > Other examples are plotting the confusion matrix, which probably should
> > also compute the confusion matrix (it's fast and so that would be
> > convenient), and so it would "duplicate" functionality from the metrics
> > module.
> >
> > Plotting learning curves and validation curves should probably not do
> > the work as it's pretty involved, and so someone would need to import
> > the learning and validation curves from model selection, and then the
> > plotting functions from a plotting module.
> >
> > Calibration curves and P/R curves and ROC curves are also pretty fast
> > to compute (and passing around the arguments is somewhat error prone) so
> > I would say the plotting functions for these should do the work as well.
> >
> > Anyway, you can see that many plotting functions are actually associated
> > with functions in existing modules and the interactions are a bit
> > unclear.
> >
> > The only plotting functions I haven't mentioned so far that I thought
> > about in the past are "2d scatter" and "plot decision function". These
> > would be kind of generic, but mostly used in the examples.
> > Though having a discrete 2d scatter function would be pretty nice
> > (plt.scatter doesn't allow legends and makes it hard to use qualitative
> > color maps).
> >
> >
> > I think I would vote for option (1), "sklearn.plot.plot_zzz" but the
> > case is not really that clear.
> >
> > Cheers,
> >
> > Andy
> >
> > On 4/3/19 7:35 AM, Roman Yurchak via scikit-learn wrote:
> > > +1 for option 1 and +0.5 for 3. Do we anticipate that many plotting
> > > functions will be added? If it's just a dozen or less, putting them all
> > > into a single namespace sklearn.plot might be easier.
> > >
> > > This also would avoid discussion about where to put some generic
> > > plotting functions (e.g.
> > > https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-478341479).
> > >
> > > Roman
> > >
> > > On 03/04/2019 12:06, Trevor Stephens wrote:
> > >> I think #1 if any of these... Plotting functions should hopefully be as
> > >> general as possible, so tagging with a specific type of estimator will,
> > >> in some scikit-learn utopia, be unnecessary.
> > >>
> > >> If a general plotter is built, where does it live in other
> > >> estimator-specific namespace options? Feels awkward to put it under
> > >> every estimator's namespace.
> > >>
> > >> Then again, there might be a #4 where there is no plot module and
> > >> plotting classes live under groups of utilities like introspection,
> > >> cross-validation or something?...
> > >>
> > >> On Wed, Apr 3, 2019 at 8:54 PM Andrew Howe <ahow...@gmail.com> wrote:
> > >>
> > >>     My preference would be for (1). I don't think the sub-namespace in
> > >>     (2) is necessary, and don't like (3), as I would prefer the plotting
> > >>     functions to be all in the same namespace sklearn.plot.
> > >>
> > >>     Andrew
> > >>
> > >>     <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> > >>     J. Andrew Howe, PhD
> > >>     LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
> > >>     ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
> > >>     Open Researcher and Contributor ID (ORCID)
> > >>     <http://orcid.org/0000-0002-3553-1990>
> > >>     Github Profile <http://github.com/ahowe42>
> > >>     Personal Website <http://www.andrewhowe.com>
> > >>     I live to learn, so I can learn to live.
> > >>     - me
> > >>     <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> > >>
> > >>
> > >>     On Tue, Apr 2, 2019 at 3:40 PM Hanmin Qin <qinhanmin2...@sina.com> wrote:
> > >>
> > >>         See https://github.com/scikit-learn/scikit-learn/issues/13448
> > >>
> > >>         We've introduced several plotting functions (e.g., plot_tree and
> > >>         plot_partial_dependence) and will introduce more (e.g.,
> > >>         plot_decision_boundary) in the future. Consequently, we need to
> > >>         decide where to put these functions. Currently, there're 3
> > >>         proposals:
> > >>
> > >>         (1) sklearn.plot.plot_YYY (e.g., sklearn.plot.plot_tree)
> > >>
> > >>         (2) sklearn.plot.XXX.plot_YYY (e.g., sklearn.plot.tree.plot_tree)
> > >>
> > >>         (3) sklearn.XXX.plot.plot_YYY (e.g.,
> > >>         sklearn.tree.plot.plot_tree, note that we won't support from
> > >>         sklearn.XXX import plot_YYY)
> > >>
> > >>         Joel Nothman, Gael Varoquaux and I decided to post it on the
> > >>         mailing list to invite opinions.
> > >>
> > >>         Thanks
> > >>
> > >>         Hanmin Qin
> > >>         _______________________________________________
> > >>         scikit-learn mailing list
> > >>         scikit-learn@python.org
> > >>         https://mail.python.org/mailman/listinfo/scikit-learn