from:"Max Kaznady \(JIRA\)"

[jira] [Commented] (SPARK-3727) Trees and ensembles: More prediction functionality

2015-08-03 Thread Max Kaznady (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651520#comment-14651520
 ] 

Max Kaznady commented on SPARK-3727:


I am currently away from the office and will respond to your email on 
Wednesday, August 5th.

For urgent requests, please contact my manager, Steven Yuan.



 Trees and ensembles: More prediction functionality
 --

 Key: SPARK-3727
 URL: https://issues.apache.org/jira/browse/SPARK-3727
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Joseph K. Bradley

 DecisionTree and RandomForest currently predict the most likely label for 
 classification and the mean for regression.  Other info about predictions 
 would be useful.
 For classification: estimated probability of each possible label
 For regression: variance of estimate
 RandomForest could also create aggregate predictions in multiple ways:
 * Predict mean or median value for regression.
 * Compute variance of estimates (across all trees) for both classification 
 and regression.
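The aggregate predictions listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not MLlib code: the function name `aggregate_tree_predictions` is hypothetical, the input is assumed to be the list of per-tree regression outputs for a single example, and the population variance is used (a sample variance would be an equally reasonable choice).

```python
import statistics
from typing import Dict, Sequence


def aggregate_tree_predictions(per_tree: Sequence[float]) -> Dict[str, float]:
    """Summarize the per-tree regression outputs for one example.

    Hypothetical helper: `per_tree` holds one prediction per tree in the
    ensemble. The variance is the population variance across trees.
    """
    if not per_tree:
        raise ValueError("at least one tree prediction is required")
    return {
        "mean": statistics.mean(per_tree),
        "median": statistics.median(per_tree),
        "variance": statistics.pvariance(per_tree),
    }


# Four toy per-tree predictions for a single example.
summary = aggregate_tree_predictions([1.0, 2.0, 3.0, 6.0])
print(summary)
```

The median is less sensitive than the mean to a single outlying tree, which is one reason to offer both.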



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-6884) Random forest: predict class probabilities

2015-07-29 Thread Max Kaznady (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646013#comment-14646013
 ] 

Max Kaznady commented on SPARK-6884:


Sorry, I've been trying to set up a work environment to push the change.

The problem is my workplace's security policy: I can't push out any code that 
I developed there. So I would have to redevelop it from scratch at home and 
push the change from there.

 Random forest: predict class probabilities
 --

 Key: SPARK-6884
 URL: https://issues.apache.org/jira/browse/SPARK-6884
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Max Kaznady
  Labels: prediction, probability, randomforest, tree
   Original Estimate: 72h
  Remaining Estimate: 72h

 Currently, there is no way to extract the class probabilities from the 
 RandomForest classifier. I implemented a probability predictor that counts 
 the individual trees' votes for class 1 and divides by the total number of 
 votes.
 I opened this ticket to keep track of changes. Will update once I push my 
 code to master.
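The vote-counting approach described above can be sketched as follows. This is a minimal illustration, not the code referred to in the ticket: the `Tree` type and `predict_proba_by_votes` are hypothetical names, and each tree is assumed to be a plain callable returning a hard 0/1 label rather than an MLlib model.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a trained tree: any callable that maps a feature
# vector to a hard 0/1 label. This is not the MLlib tree API.
Tree = Callable[[Sequence[float]], int]


def predict_proba_by_votes(trees: Sequence[Tree], features: Sequence[float]) -> float:
    """Return the fraction of trees that vote for class 1."""
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    votes_for_one = sum(1 for tree in trees if tree(features) == 1)
    return votes_for_one / len(trees)


# Three toy "trees" that threshold the first feature at different values.
toy_forest = [
    lambda x: 1 if x[0] > 0.2 else 0,
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[0] > 0.8 else 0,
]

print(predict_proba_by_votes(toy_forest, [0.6]))
```

With only hard votes, the estimate can take just `len(trees) + 1` distinct values, so small forests give coarse probabilities.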




[jira] [Commented] (SPARK-6884) Random forest: predict class probabilities

2015-07-29 Thread Max Kaznady (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646014#comment-14646014
 ] 

Max Kaznady commented on SPARK-6884:


Thanks, this is a better way going forward.


[jira] [Commented] (SPARK-3727) DecisionTree, RandomForest: More prediction functionality

2015-04-13 Thread Max Kaznady (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492839#comment-14492839
 ] 

Max Kaznady commented on SPARK-3727:


I implemented the same thing but for PySpark. Since there is no existing 
function, should I just call the function predict_proba like in sklearn? 

Also, does it make sense to open a new ticket for this, since it's so specific?

Thanks,
Max


[jira] [Commented] (SPARK-6113) Stabilize DecisionTree and ensembles APIs

2015-04-13 Thread Max Kaznady (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492959#comment-14492959
]

Max Kaznady commented on SPARK-6113:

[~josephkb] Is it possible to host the API design doc somewhere other than
Google Docs? My employer's policy, like most corporate policies, forbids
access to Google Docs, so I cannot download the file.

Stabilize DecisionTree and ensembles APIs
-

Key: SPARK-6113
URL: https://issues.apache.org/jira/browse/SPARK-6113
Project: Spark
Issue Type: Sub-task
Components: MLlib, PySpark
Affects Versions: 1.4.0
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Critical

*Issue*: The APIs for DecisionTree and ensembles (RandomForests and
GradientBoostedTrees) have been experimental for a long time. The API has
become very convoluted because trees and ensembles have many, many variants,
some of which we have added incrementally without a long-term design.
*Proposal*: This JIRA is for discussing changes required to finalize the
APIs. After we discuss, I will make a PR to update the APIs and make them
non-Experimental. This will require making many breaking changes; see the
design doc for details.
[Design doc |
https://docs.google.com/document/d/1rJ_DZinyDG3PkYkAKSsQlY0QgCeefn4hUv7GsPkzBP4]:
This outlines current issues and the proposed API.


[jira] [Commented] (SPARK-6113) Stabilize DecisionTree and ensembles APIs

2015-04-13 Thread Max Kaznady (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492989#comment-14492989
]

Max Kaznady commented on SPARK-6113:

Other places need serious improvement as well; LogisticRegressionWithLBFGS is
another example.

All LogisticRegression classifiers need a logistic function. I found this
ticket, but I’m not sure why it’s closed:
https://issues.apache.org/jira/browse/SPARK-3585

I think LogisticRegression and RandomForest should use the same name for the
probability prediction function. I would just call it predict_proba, since then
at least PySpark is consistent with the sklearn library.

Internally, the logistic function should be implemented as a single function,
not hard-coded in multiple places the way it is now. That’s another ticket.
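As an illustration of the single shared logistic function suggested above, here is a minimal sketch in plain Python. The names `logistic` and `predict_proba` and the `weights`/`intercept` parameters are assumptions made for this example, not the existing MLlib or PySpark API. The two-branch form is used so that `math.exp` is never called on a large positive argument.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Logistic (sigmoid) function, written to avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # z < 0 here, so exp(z) stays in (0, 1)
    return exp_z / (1.0 + exp_z)


def predict_proba(weights: Sequence[float], features: Sequence[float],
                  intercept: float = 0.0) -> float:
    """Hypothetical helper: probability of class 1 under a linear model."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    return logistic(margin)


print(predict_proba([1.0, -1.0], [2.0, 2.0]))
```

Every classifier that needs a probability would then call the one `logistic` helper instead of repeating the formula.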

Aside: I haven’t looked into LogisticRegressionWithSGD, but it sometimes fails
badly: the algorithm either diverges or gets stuck in local minima.


[jira] [Commented] (SPARK-3727) DecisionTree, RandomForest: More prediction functionality

2015-04-13 Thread Max Kaznady (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492906#comment-14492906
]

Max Kaznady commented on SPARK-3727:

Yes, probabilities have to be added to other models too, like
LogisticRegression. Right now they are hard-coded in two places but not
exposed in PySpark.

I think it makes sense to split the work into PySpark, then classification,
then probabilities, and then to group the different types of algorithms that
output probabilities: Logistic Regression, Random Forest, etc.

Probabilities can also be added for single trees by counting the number of 1's
and 0's at each leaf.

What do you think?
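The leaf-counting idea above can be sketched as follows. This is a minimal illustration under stated assumptions: `leaf_class_probabilities` is a hypothetical helper, and the per-leaf label counts are assumed to have been collected from the training examples that reached that leaf.

```python
from typing import Dict, Mapping


def leaf_class_probabilities(label_counts: Mapping[int, int]) -> Dict[int, float]:
    """Turn the training-label counts stored at one leaf into probabilities.

    Hypothetical helper: `label_counts` maps each class label to the number
    of training examples with that label that ended up in the leaf.
    """
    total = sum(label_counts.values())
    if total == 0:
        raise ValueError("the leaf must contain at least one training example")
    return {label: count / total for label, count in label_counts.items()}


# A leaf that received three examples of class 0 and one of class 1.
print(leaf_class_probabilities({0: 3, 1: 1}))
```

A prediction for a new example would look up the leaf it falls into and return that leaf's probabilities.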


[jira] [Created] (SPARK-6884) random forest predict probabilities functionality (like in sklearn)

2015-04-13 Thread Max Kaznady (JIRA)

Max Kaznady created SPARK-6884:
--

 Summary: random forest predict probabilities functionality (like 
in sklearn)
 Key: SPARK-6884
 URL: https://issues.apache.org/jira/browse/SPARK-6884
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.4.0
 Environment: cross-platform
Reporter: Max Kaznady


Currently, there is no way to extract the class probabilities from the 
RandomForest classifier. I implemented a probability predictor that counts 
the individual trees' votes for class 1 and divides by the total number of 
votes.

I opened this ticket to keep track of changes. Will update once I push my code 
to master.




[jira] [Commented] (SPARK-6884) random forest predict probabilities functionality (like in sklearn)

2015-04-13 Thread Max Kaznady (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492868#comment-14492868
 ] 

Max Kaznady commented on SPARK-6884:


Implemented a prototype; now testing the MapReduce code.


[jira] [Commented] (SPARK-3727) DecisionTree, RandomForest: More prediction functionality

2015-04-13 Thread Max Kaznady (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492871#comment-14492871
 ] 

Max Kaznady commented on SPARK-3727:


I thought it would be more fitting to separate this: 
https://issues.apache.org/jira/browse/SPARK-6884

