[jira] [Commented] (SPARK-6885) Decision trees: predict class probabilities

Joseph K. Bradley (JIRA) Tue, 28 Jul 2015 12:39:29 -0700

    [ https://issues.apache.org/jira/browse/SPARK-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644908#comment-14644908 ]


Joseph K. Bradley commented on SPARK-6885:
------------------------------------------

I'd prefer a variant of #1.  It would be nice for LearningNodes to store 
something like an ImpurityCalculator (from the mllib.tree implementation), 
which can hold label counts for classification and other statistics for 
regression.  (That way, we can add probabilistic predictions for regression 
in a later PR.)  So, rather than storing something specific to 
classification, PredictionStats could store an abstract object usable for 
either classification or regression.
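
A minimal sketch of the abstraction described above, under stated 
assumptions: the names NodeStats, ClassCounts, and RegressionMoments are 
hypothetical and are not Spark's actual API, and the PredictionStats wrapper 
shown here only illustrates the shape such a class might take.

```scala
// Hypothetical sketch; none of these names are taken from Spark itself.
// One abstract statistics type that a node could store for either task.
sealed trait NodeStats {
  def count: Double   // total (weighted) number of instances at the node
  def predict: Double // point prediction derived from the statistics
}

// Classification: per-class label counts. Assumes at least one class.
final case class ClassCounts(counts: Array[Double]) extends NodeStats {
  def count: Double = counts.sum
  // Predicted label is the index of the largest count.
  def predict: Double = counts.indexOf(counts.max).toDouble
  // Class probabilities are the normalized counts.
  def probabilities: Array[Double] = {
    val total = count
    if (total == 0.0) counts.map(_ => 0.0) else counts.map(_ / total)
  }
}

// Regression: sufficient statistics for the mean and variance.
final case class RegressionMoments(n: Double, sum: Double, sumSquares: Double)
    extends NodeStats {
  def count: Double = n
  def predict: Double = if (n == 0.0) 0.0 else sum / n
  def variance: Double =
    if (n == 0.0) 0.0
    else {
      val mean = sum / n
      math.max(sumSquares / n - mean * mean, 0.0)
    }
}

// The prediction record holds the abstract object, not class-specific data.
final case class PredictionStats(stats: NodeStats)
```

With this shape, a classifier could pattern match on ClassCounts to obtain 
probabilities, while a regressor could later use RegressionMoments for a 
probabilistic prediction without changing what the node stores.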

We can keep all of these representations as private API, so I'm OK with 
creating a new version of InformationGainStats if it's helpful.  (I hope we can 
lazily migrate those classes to spark.ml anyway.)

As for where we store stats, I'd prefer to store them at all LearningNodes 
for simplicity.  We can make it more efficient later on.
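
To illustrate the "stats at every node" option, here is a rough sketch that 
assumes hypothetical types (SketchNode, SketchLeaf, SketchInternal, 
SketchTree) rather than the real LearningNode class. Every node carries label 
counts, so class probabilities can be read from a leaf or from any internal 
node.

```scala
// Hypothetical sketch; these are not Spark's node classes.
sealed trait SketchNode {
  def labelCounts: Array[Double] // stored at every node, not only at leaves
}

final case class SketchLeaf(labelCounts: Array[Double]) extends SketchNode

final case class SketchInternal(
    labelCounts: Array[Double],
    feature: Int,      // index of the feature tested at this node
    threshold: Double, // go left when the feature value is <= threshold
    left: SketchNode,
    right: SketchNode) extends SketchNode

object SketchTree {
  // Walk from the given node down to the leaf that the features fall into.
  @scala.annotation.tailrec
  def findLeaf(node: SketchNode, features: Array[Double]): SketchNode =
    node match {
      case leaf: SketchLeaf => leaf
      case SketchInternal(_, f, t, l, r) =>
        findLeaf(if (features(f) <= t) l else r, features)
    }

  // Normalize the counts stored at a node into class probabilities.
  def classProbabilities(node: SketchNode): Array[Double] = {
    val total = node.labelCounts.sum
    if (total == 0.0) node.labelCounts.map(_ => 0.0)
    else node.labelCounts.map(_ / total)
  }
}
```

Storing counts on internal nodes costs extra memory, which is the efficiency 
trade-off that could be revisited later.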

I actually did a bit of implementation on this a while ago; please check it out 
and see if anything is useful to you: 
[https://github.com/apache/spark/compare/master...jkbradley:dt-pred-prob]


> Decision trees: predict class probabilities
> -------------------------------------------
>
>                 Key: SPARK-6885
>                 URL: https://issues.apache.org/jira/browse/SPARK-6885
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Yanbo Liang
>
> Under spark.ml, have DecisionTreeClassifier (currently being added) extend 
> ProbabilisticClassifier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
