I think what was not clear from the question is that there are actually quite different kinds of plotting functions, and many of these are tied to existing code.

Right now we have some that are specific to trees (plot_tree) and to gradient boosting (plot_partial_dependence).

I think we want more general functions, and plot_partial_dependence has been extended to general estimators.

However, the plotting functions might be generic with respect to the estimator but relate to a specific function, say plotting the results of GridSearchCV. Then one might argue that having the plotting function close to GridSearchCV makes sense. Similarly, for partial dependence plots and feature importances, it might be a bit strange to have the plotting functions not next to the functions that compute these. Another question is whether the plotting functions also "do the work" in some cases: do we want plot_partial_dependence to also compute the partial dependence? (I would argue yes, but either way the result is a bit strange.) In that case you have somewhat the same functionality in two different modules, unless you also put the "compute partial dependence" function in the plotting module, which is a bit strange.

Maybe we could inform this discussion by listing candidate plotting functions, and also considering whether they "do the work" and where the "work" function is.

Other examples are plotting the confusion matrix, which probably should also compute the confusion matrix (it's fast and so that would be convenient), and so it would "duplicate" functionality from the metrics module.
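To make the "do the work" case concrete, here is a rough sketch of a confusion-matrix plotter that computes the matrix itself. The name plot_confusion_matrix_sketch and its signature are made up for illustration and are not a proposed API; only confusion_matrix from sklearn.metrics and the matplotlib calls are existing functions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


def plot_confusion_matrix_sketch(y_true, y_pred, ax=None):
    """Hypothetical helper: compute the confusion matrix, then draw it."""
    # The "work" is delegated to the existing function in sklearn.metrics.
    cm = confusion_matrix(y_true, y_pred)
    if ax is None:
        _, ax = plt.subplots()
    ax.imshow(cm, cmap="Blues")
    # Annotate each cell with its count.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    return cm, ax


cm, ax = plot_confusion_matrix_sketch([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
```

Returning the computed matrix along with the axes is one way to soften the duplication: the caller gets the numbers without a second call into the metrics module.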

Plotting learning curves and validation curves should probably not do the work as it's pretty involved, and so someone would need to import the learning and validation curves from model selection, and then the plotting functions from a plotting module.
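The opposite case, where the plotting function only draws precomputed results, might look roughly like this. plot_learning_curve_sketch is a hypothetical name and the dataset and estimator are arbitrary choices for the example; learning_curve is the existing function in sklearn.model_selection.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve


def plot_learning_curve_sketch(train_sizes, train_scores, test_scores, ax=None):
    """Hypothetical helper: draw mean scores that were computed elsewhere."""
    if ax is None:
        _, ax = plt.subplots()
    # Scores have shape (n_train_sizes, n_cv_folds); average over the folds.
    ax.plot(train_sizes, np.mean(train_scores, axis=1), marker="o", label="train")
    ax.plot(train_sizes, np.mean(test_scores, axis=1), marker="o", label="validation")
    ax.set_xlabel("Number of training samples")
    ax.set_ylabel("Score")
    ax.legend()
    return ax


# Step 1: the expensive computation, imported from model_selection.
X, y = make_classification(n_samples=200, random_state=0)
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y, cv=3, train_sizes=[0.3, 0.6, 1.0]
)

# Step 2: the plotting, which would be imported from a plotting module.
ax = plot_learning_curve_sketch(train_sizes, train_scores, test_scores)
```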

Calibration curves, P/R curves, and ROC curves are also pretty fast to compute (and passing around the arguments is somewhat error prone), so I would say the plotting functions for these should do the work as well.

Anyway, you can see that many plotting functions are actually associated with functions in existing modules and the interactions are a bit unclear.

The only plotting functions I haven't mentioned so far that I thought about in the past are "2d scatter" and "plot decision function". These would be kind of generic, but mostly used in the examples. Still, having a discrete 2d scatter function would be pretty nice (plt.scatter doesn't make per-class legends easy and makes it hard to use qualitative color maps).
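As a rough idea of what a discrete 2d scatter could do, the sketch below issues one scatter call per class so that each class gets its own legend entry and its own color from a qualitative colormap. discrete_scatter_sketch is a hypothetical name, and the choice of "tab10" is just an assumption for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def discrete_scatter_sketch(x1, x2, labels, ax=None):
    """Hypothetical helper: scatter plot with one legend entry per class."""
    if ax is None:
        _, ax = plt.subplots()
    x1, x2, labels = np.asarray(x1), np.asarray(x2), np.asarray(labels)
    cmap = plt.get_cmap("tab10")  # qualitative colormap with 10 colors
    for k, value in enumerate(np.unique(labels)):
        mask = labels == value
        # One scatter call per class, so the legend picks up each label.
        ax.scatter(x1[mask], x2[mask], color=cmap(k % 10), label=str(value))
    ax.legend()
    return ax


ax = discrete_scatter_sketch(
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 0.5, 2.0, 1.5, 3.0, 2.5],
    ["a", "b", "c", "a", "b", "c"],
)
```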


I think I would vote for option (1), "sklearn.plot.plot_zzz", but the case is not really that clear.

Cheers,

Andy

On 4/3/19 7:35 AM, Roman Yurchak via scikit-learn wrote:
+1 for option 1 and +0.5 for 3. Do we anticipate that many plotting
functions will be added? If it's just a dozen or less, putting them all
into a single namespace sklearn.plot might be easier.

This also would avoid discussion about where to put some generic
plotting functions (e.g.
https://github.com/scikit-learn/scikit-learn/issues/13448#issuecomment-478341479).

Roman

On 03/04/2019 12:06, Trevor Stephens wrote:
I think #1 if any of these... Plotting functions should hopefully be as
general as possible, so tagging with a specific type of estimator will,
in some scikit-learn utopia, be unnecessary.

If a general plotter is built, where does it live in other
estimator-specific namespace options? Feels awkward to put it under
every estimator's namespace.

Then again, there might be a #4 where there is no plot module and
plotting classes live under groups of utilities like introspection,
cross-validation or something?...

On Wed, Apr 3, 2019 at 8:54 PM Andrew Howe <ahow...@gmail.com
<mailto:ahow...@gmail.com>> wrote:

     My preference would be for (1). I don't think the sub-namespace in
     (2) is necessary, and don't like (3), as I would prefer the plotting
     functions to be all in the same namespace sklearn.plot.

     Andrew

     <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
     J. Andrew Howe, PhD
     LinkedIn Profile <http://www.linkedin.com/in/ahowe42>
     ResearchGate Profile <http://www.researchgate.net/profile/John_Howe12/>
     Open Researcher and Contributor ID (ORCID)
     <http://orcid.org/0000-0002-3553-1990>
     Github Profile <http://github.com/ahowe42>
     Personal Website <http://www.andrewhowe.com>
     I live to learn, so I can learn to live. - me
     <~~~~~~~~~~~~~~~~~~~~~~~~~~~>


     On Tue, Apr 2, 2019 at 3:40 PM Hanmin Qin <qinhanmin2...@sina.com
     <mailto:qinhanmin2...@sina.com>> wrote:

         See https://github.com/scikit-learn/scikit-learn/issues/13448

         We've introduced several plotting functions (e.g., plot_tree and
         plot_partial_dependence) and will introduce more (e.g.,
         plot_decision_boundary) in the future. Consequently, we need to
         decide where to put these functions. Currently, there're 3
         proposals:

         (1) sklearn.plot.plot_YYY (e.g., sklearn.plot.plot_tree)

         (2) sklearn.plot.XXX.plot_YYY (e.g., sklearn.plot.tree.plot_tree)

         (3) sklearn.XXX.plot.plot_YYY (e.g.,
         sklearn.tree.plot.plot_tree, note that we won't support from
         sklearn.XXX import plot_YYY)

         Joel Nothman, Gael Varoquaux and I decided to post it on the
         mailing list to invite opinions.

         Thanks

         Hanmin Qin
         _______________________________________________
         scikit-learn mailing list
         scikit-learn@python.org <mailto:scikit-learn@python.org>
         https://mail.python.org/mailman/listinfo/scikit-learn


