[ 
https://issues.apache.org/jira/browse/SPARK-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395274#comment-14395274
 ] 

Apache Spark commented on SPARK-6705:
-------------------------------------

User 'oefirouz' has created a pull request for this issue:
https://github.com/apache/spark/pull/5301

> MLLIB ML Pipeline's Logistic Regression has no intercept term
> -------------------------------------------------------------
>
>                 Key: SPARK-6705
>                 URL: https://issues.apache.org/jira/browse/SPARK-6705
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Omede Firouz
>
> Currently, the ML Pipeline's LogisticRegression.scala does not expose a
> setting for whether or not to fit an intercept term. The pipeline therefore
> defers to LogisticRegressionWithLBFGS, which does not use an intercept term.
> This makes sense from a performance point of view, because adding an
> intercept term requires memory allocation.
> However, this is undesirable statistically: the statistical default is
> usually to include an intercept term, and one needs a very strong reason
> to omit it.
> Explicitly modeling the intercept by adding a column of all 1s does not
> work, because LogisticRegressionWithLBFGS forces column normalization, and a
> column of all 1s has zero variance, so the division by 0 wipes the column out.
> We should open up the ML Pipeline API to explicitly allow controlling
> whether or not to fit an intercept.
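
The zero-variance problem described in the issue can be illustrated with a
minimal sketch in plain Python. This is not Spark's code: the helper names
(column_std, scale_by_std) and the sample data are hypothetical, and mapping a
zero-variance column to 0.0 instead of raising a division error is an
assumption about how a scaler might guard against dividing by zero.

```python
import math


def column_std(values):
    """Population standard deviation of one feature column."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def scale_by_std(rows):
    """Divide every column by its standard deviation.

    Assumption: a zero-variance column is mapped to 0.0 rather than
    raising ZeroDivisionError; real scalers may handle this differently.
    """
    n_cols = len(rows[0])
    stds = [column_std([row[j] for row in rows]) for j in range(n_cols)]
    return [
        [row[j] / stds[j] if stds[j] > 0.0 else 0.0 for j in range(n_cols)]
        for row in rows
    ]


# Hypothetical data: the first column is the hand-added "intercept" column of 1s.
rows = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
scaled = scale_by_std(rows)
intercept_column = [row[0] for row in scaled]
```

Under these assumptions, intercept_column contains only zeros after scaling,
so the optimizer has nothing to attach an intercept weight to, while the
second column keeps its information.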



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
