[jira] [Commented] (SPARK-22871) Add GBT+LR Algorithm in MLlib
[ https://issues.apache.org/jira/browse/SPARK-22871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307210#comment-16307210 ]

Nick Pentreath commented on SPARK-22871:

Tree-based feature transformation is covered in SPARK-13677, and I think this ticket duplicates it. I also think it is best to keep the functionality separate rather than create a new estimator in Spark; i.e., we could add the leaf-based feature transformation to the tree models and leave it up to the user to combine that with LR etc. I think this separation of concerns and modularity is better.

Finally, as [~srowen] mentions in SPARK-22867, I think this particular model is best kept as a separate Spark package.

> Add GBT+LR Algorithm in MLlib
> -----------------------------
>
> Key: SPARK-22871
> URL: https://issues.apache.org/jira/browse/SPARK-22871
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 2.2.1
> Reporter: Fangzhou Yang
>
> GBTLRClassifier is a hybrid model of Gradient Boosting Trees and Logistic Regression.
> It is quite practical and popular in many data mining competitions. In this hybrid model, input features are transformed by means of boosted decision trees. The output of each individual tree is treated as a categorical input feature to a sparse linear classifier. Boosted decision trees prove to be very powerful feature transforms.
> Model details about GBTLR can be found in the following paper:
> Practical Lessons from Predicting Clicks on Ads at Facebook (https://dl.acm.org/citation.cfm?id=2648589)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
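The leaf-based feature transformation discussed in this thread can be illustrated with a minimal, library-free sketch. It is not Spark code and not the implementation proposed in the ticket: the nested-dict tree layout and the names `leaf_index`, `encode`, `tree_a`, `tree_b`, `TREES` and `LEAVES` are all hypothetical, chosen only to show how each tree's output leaf becomes one categorical (one-hot) feature.

```python
from typing import Any, Dict, List

# Hypothetical toy layout: a node is either {"leaf": id} or an internal split
# {"feature": i, "threshold": t, "left": node, "right": node}.
Node = Dict[str, Any]


def leaf_index(tree: Node, x: List[float]) -> int:
    """Walk one tree and return the id of the leaf that x lands in."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def encode(trees: List[Node], leaves_per_tree: List[int], x: List[float]) -> List[int]:
    """Return the active positions of the concatenated one-hot leaf encoding.

    Each tree contributes exactly one active position, offset by the number
    of leaves in the trees before it.
    """
    active, offset = [], 0
    for tree, n_leaves in zip(trees, leaves_per_tree):
        active.append(offset + leaf_index(tree, x))
        offset += n_leaves
    return active


# Two hand-written trees standing in for a trained boosted ensemble.
tree_a = {"feature": 0, "threshold": 0.5,
          "left": {"leaf": 0},
          "right": {"feature": 1, "threshold": 2.0,
                    "left": {"leaf": 1}, "right": {"leaf": 2}}}
tree_b = {"feature": 1, "threshold": 1.0,
          "left": {"leaf": 0}, "right": {"leaf": 1}}

TREES = [tree_a, tree_b]
LEAVES = [3, 2]  # number of leaves in each tree

print(encode(TREES, LEAVES, [0.9, 3.0]))  # active slots out of 5
```

With three leaves in the first tree and two in the second, every input maps to a 5-dimensional vector with exactly two non-zero entries, which is why the downstream linear model is described as sparse.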
[jira] [Commented] (SPARK-22871) Add GBT+LR Algorithm in MLlib
[ https://issues.apache.org/jira/browse/SPARK-22871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300972#comment-16300972 ]

Fangzhou Yang commented on SPARK-22871:
---------------------------------------

GBTLRClassifier on Spark is implemented by combining GradientBoostedTrees and Logistic Regression from Spark MLlib. A GradientBoostedTrees model is first trained on the input features and used to transform them into sparse vectors; a Logistic Regression model is then trained on those sparse features and used for prediction.

More details about Spark GBTLR can be found in my GitHub repository: https://github.com/titicaca/spark-gbtlr
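The second stage described above, a logistic regression trained on the sparse tree-derived features, might look roughly like the following plain-Python sketch. It does not reproduce the linked repository or any Spark API: the functions `predict` and `train`, the learning-rate and epoch settings, and the toy `ROWS` data are assumptions made purely for illustration. Each row is represented by the list of its active one-hot positions, so only those weights are read or updated.

```python
import math
from typing import List, Tuple


def predict(weights: List[float], bias: float, active: List[int]) -> float:
    """Sigmoid of the bias plus the weights at the active one-hot positions."""
    z = bias + sum(weights[i] for i in active)
    return 1.0 / (1.0 + math.exp(-z))


def train(rows: List[Tuple[List[int], int]], dim: int,
          epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Plain SGD on log loss; only weights at active positions are touched."""
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for active, label in rows:
            err = predict(weights, bias, active) - label  # gradient w.r.t. z
            bias -= lr * err
            for i in active:
                weights[i] -= lr * err
    return weights, bias


# Hypothetical leaf encodings (5 leaf slots in total) with made-up labels.
ROWS = [([2, 4], 1), ([2, 3], 1), ([0, 3], 0), ([1, 3], 0)]
WEIGHTS, BIAS = train(ROWS, dim=5)

print(predict(WEIGHTS, BIAS, [2, 4]))  # probability for a positive-looking row
print(predict(WEIGHTS, BIAS, [0, 3]))  # probability for a negative-looking row
```

Keeping the two stages as separate functions mirrors the modular arrangement suggested earlier in the thread, where the tree transformation and the linear model are combined by the user rather than fused into one estimator.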