[jira] [Commented] (SPARK-22871) Add GBT+LR Algorithm in MLlib

2017-12-31 Thread Nick Pentreath (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307210#comment-16307210
 ] 

Nick Pentreath commented on SPARK-22871:


Tree-based feature transformation is covered in SPARK-13677, and I think this 
ticket duplicates it. I also think it is best to keep the functionality 
separate rather than create a new estimator in Spark. That is, we could add the 
leaf-based feature transformation to the tree models and leave it up to the 
user to combine that with LR, etc. I think this separation of concerns is the 
more modular design.

Finally, as [~srowen] mentions in SPARK-22867, I think this particular model is 
best kept as a separate Spark package.
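
To make the leaf-based feature transformation concrete, here is a minimal 
sketch in plain Python. It is not Spark code and not the API proposed in 
SPARK-13677: the tuple-based tree representation, the toy trees, and the names 
leaf_id, count_leaves and leaf_features are all assumptions made for 
illustration. Each tree maps a row to the leaf it reaches, and that leaf is 
one-hot encoded into a block of columns reserved for that tree.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding (an assumption for this sketch, not Spark's):
# a leaf is an int (its id within the tree, numbered from 0); an internal
# node is (feature_index, threshold, left_subtree, right_subtree).
Tree = Union[int, Tuple[int, float, "Tree", "Tree"]]


def leaf_id(tree: Tree, x: List[float]) -> int:
    """Walk one tree and return the id of the leaf that x reaches."""
    node = tree
    while not isinstance(node, int):
        feature, threshold, left, right = node
        # Values at or below the threshold go left, the rest go right.
        node = left if x[feature] <= threshold else right
    return node


def count_leaves(tree: Tree) -> int:
    """Number of leaves in a tree, i.e. the width of its one-hot block."""
    if isinstance(tree, int):
        return 1
    _, _, left, right = tree
    return count_leaves(left) + count_leaves(right)


def leaf_features(trees: List[Tree], x: List[float]) -> Tuple[int, List[int]]:
    """One-hot encode the leaf reached in every tree.

    Returns (width, active): a sparse vector of length `width` that holds
    1.0 at each index in `active` and 0.0 everywhere else.
    """
    offset = 0
    active = []
    for tree in trees:
        active.append(offset + leaf_id(tree, x))
        offset += count_leaves(tree)  # the next tree gets its own columns
    return offset, active


# Two hand-written toy trees standing in for a trained ensemble.
TOY_TREES: List[Tree] = [
    (0, 0.5, 0, (1, 2.0, 1, 2)),  # 3 leaves
    (1, 1.0, 0, 1),               # 2 leaves
]
```

With these toy trees, leaf_features(TOY_TREES, [0.9, 3.0]) returns 
(5, [2, 4]): the row reaches leaf 2 of the first tree and leaf 1 of the second 
tree, whose block starts at column 3. The output is just a sparse vector, so 
any downstream model can consume it, which is the point of keeping the 
transformation separate.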

> Add GBT+LR Algorithm in MLlib
> -
>
> Key: SPARK-22871
> URL: https://issues.apache.org/jira/browse/SPARK-22871
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Fangzhou Yang
>
> GBTLRClassifier is a hybrid model of Gradient Boosting Trees and Logistic 
> Regression. 
> It is quite practical and popular in many data mining competitions. In this 
> hybrid model, input features are transformed by means of boosted decision 
> trees. The output of each individual tree is treated as a categorical input 
> feature to a sparse linear classifier. Boosted decision trees prove to be very 
> powerful feature transforms.
> Details of the GBTLR model can be found in the following paper:
> Practical Lessons from Predicting Clicks on Ads at Facebook
> https://dl.acm.org/citation.cfm?id=2648589



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22871) Add GBT+LR Algorithm in MLlib

2017-12-21 Thread Fangzhou Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300972#comment-16300972
 ] 

Fangzhou Yang commented on SPARK-22871:
---

GBTLRClassifier on Spark is implemented by combining GradientBoostedTrees and 
Logistic Regression from Spark MLlib. A GradientBoostedTrees model is first 
trained on the input features and used to transform them into sparse vectors. 
A Logistic Regression model is then trained on the generated sparse features 
and used for prediction.

More details about Spark GBTLR can be found in my GitHub repository:
https://github.com/titicaca/spark-gbtlr
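
As a rough illustration of the second stage described above, the sketch below 
trains a logistic regression on sparse one-hot features with plain stochastic 
gradient descent. It is a toy in pure Python, not the code from the repository 
and not Spark's LogisticRegression: the function names, the learning rate, the 
epoch count, and the four example rows are all assumptions. Each row is given 
as the list of active column indices that a tree-based transform would 
produce, so the dot product reduces to a sum of the weights at those indices.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: Sequence[float], bias: float,
                  active: Sequence[int]) -> float:
    """P(y = 1) for a one-hot row given by its active column indices."""
    # Every active feature has value 1.0, so w . x is a sum of weights.
    return sigmoid(bias + sum(weights[i] for i in active))


def train_sparse_lr(rows: Sequence[Sequence[int]], labels: Sequence[int],
                    width: int, epochs: int = 200,
                    lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights and bias by SGD on the log loss (no regularization)."""
    weights = [0.0] * width
    bias = 0.0
    for _ in range(epochs):
        for active, y in zip(rows, labels):
            # Gradient of the log loss with respect to the logit.
            err = predict_proba(weights, bias, active) - y
            bias -= lr * err
            for i in active:  # only active columns receive an update
                weights[i] -= lr * err
    return weights, bias


# Hypothetical leaf encodings for four rows over 5 leaf columns, with labels.
ROWS = [[0, 3], [2, 4], [1, 3], [2, 3]]
LABELS = [0, 1, 0, 1]

WEIGHTS, BIAS = train_sparse_lr(ROWS, LABELS, width=5)
```

After training, predict_proba(WEIGHTS, BIAS, row) is above 0.5 for the two 
rows labelled 1 and below 0.5 for the two labelled 0. Only the columns that 
are active in a row are touched on each update, which is what makes a sparse 
linear model cheap on leaf features.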



