[ https://issues.apache.org/jira/browse/SPARK-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492989#comment-14492989 ]
Max Kaznady commented on SPARK-6113:
------------------------------------

Other places need serious improvement as well; LogisticRegressionWithLBFGS is another example. All LogisticRegression classifiers need a logistic function. I found this ticket, but I’m not sure why it’s closed:
https://issues.apache.org/jira/browse/SPARK-3585

I think LogisticRegression and RandomForest should use the same name for the predict_proba function. I would just call it that, since then at least PySpark is consistent with the sklearn library.

Internally, the logistic function should be implemented as a single function, not hard-coded in multiple places the way it is now. That’s another ticket.

Aside: I haven’t looked at LogisticRegressionWithSGD, but it fails horribly sometimes: the algorithm either diverges or gets stuck in local minima.

> Stabilize DecisionTree and ensembles APIs
> -----------------------------------------
>
>                 Key: SPARK-6113
>                 URL: https://issues.apache.org/jira/browse/SPARK-6113
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib, PySpark
>    Affects Versions: 1.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
>
> *Issue*: The APIs for DecisionTree and ensembles (RandomForests and
> GradientBoostedTrees) have been experimental for a long time. The API has
> become very convoluted because trees and ensembles have many, many variants,
> some of which we have added incrementally without a long-term design.
>
> *Proposal*: This JIRA is for discussing changes required to finalize the
> APIs. After we discuss, I will make a PR to update the APIs and make them
> non-Experimental. This will require making many breaking changes; see the
> design doc for details.
>
> [Design doc | https://docs.google.com/document/d/1rJ_DZinyDG3PkYkAKSsQlY0QgCeefn4hUv7GsPkzBP4]:
> This outlines current issues and the proposed API.
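
To illustrate the comment's point about keeping the logistic function in one place and exposing probabilities under a single name, here is a rough sketch in plain Python. It is not MLlib or PySpark code: the names `logistic` and `predict_proba` and the list-based weights are assumptions made for the example, and the `[P(class 0), P(class 1)]` return shape only imitates the sklearn convention for the binary case.

```python
import math


def logistic(margin):
    """Numerically stable logistic (sigmoid) function, defined once.

    Branching on the sign keeps math.exp from overflowing for
    margins of large magnitude.
    """
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def predict_proba(weights, intercept, features):
    """Hypothetical binary predict_proba built on the shared logistic().

    Returns [P(class 0), P(class 1)] for one feature vector.
    """
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    p1 = logistic(margin)
    return [1.0 - p1, p1]


if __name__ == "__main__":
    # Illustrative weights and features only, not taken from any real model.
    print(predict_proba([0.5, -0.25], 0.1, [2.0, 4.0]))
```

Any classifier that needs class probabilities could call the same `logistic` helper, so a change in how the function is computed would happen in one place rather than several.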