[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-04-11 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234634#comment-15234634 ]

Apache Spark commented on SPARK-12566:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/12294

> GLM model family, link function support in SparkR:::glm
> ---
>
> Key: SPARK-12566
> URL: https://issues.apache.org/jira/browse/SPARK-12566
> Project: Spark
> Issue Type: New Feature
> Components: ML, SparkR
> Reporter: Joseph K. Bradley
> Assignee: yuhao yang
> Priority: Critical
>
> This JIRA is for extending the support of MLlib's Generalized Linear Models 
> (GLMs) to more model families and link functions in SparkR. After 
> SPARK-12811, we should be able to wrap GeneralizedLinearRegression in SparkR 
> with support of popular families and link functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-04-06 Thread Joseph K. Bradley (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229368#comment-15229368 ]

Joseph K. Bradley commented on SPARK-12566:
---

Here's my preferred design.  I prefer to abstract the implementation (solver) 
from the API (model) as much as possible.
* R glm calls Scala GLM, using solver = auto by default
* Scala GLM has solver = auto by default.  Auto should mean "best effort"
** With few features (< 4K or so),
*** For family = gaussian and link = identity, use normal equations.
*** For others, use IRLS.
** With many features, use LBFGS if it supports the given (family, link) pair.
Otherwise, throw an exception.
* Scala LinearRegression and LogisticRegression call GLM, i.e., they use normal
equations or IRLS when possible.

What do y'all think?
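A minimal Scala sketch of the "auto" selection rules in this proposal. The names `Solver`, `NormalEquations`, `IRLS`, `LBFGS` and `AutoSolver.chooseSolver` are hypothetical, not Spark APIs. Two assumptions are labeled in the code: "< 4K or so" is read as a cutoff of 4096 features, and L-BFGS is treated as "possible" only for gaussian/identity and binomial/logit, the pairs already covered by the L-BFGS-based LinearRegression and LogisticRegression.

```scala
// Hypothetical solver choices for an "auto" GLM solver.
sealed trait Solver
case object NormalEquations extends Solver
case object IRLS extends Solver
case object LBFGS extends Solver

object AutoSolver {
  // Assumption: "few features (< 4K or so)" taken as fewer than 4096.
  val MaxFeaturesForLocalSolve = 4096

  // Assumption: (family, link) pairs for which an L-BFGS path exists.
  private val lbfgsSupported: Set[(String, String)] = Set(
    ("gaussian", "identity"),
    ("binomial", "logit")
  )

  def chooseSolver(family: String, link: String, numFeatures: Int): Solver =
    if (numFeatures < MaxFeaturesForLocalSolve) {
      // Few features: closed form for gaussian/identity, IRLS for everything else.
      if (family == "gaussian" && link == "identity") NormalEquations else IRLS
    } else if (lbfgsSupported.contains((family, link))) {
      // Many features: fall back to L-BFGS where it is available.
      LBFGS
    } else {
      // Many features and no L-BFGS path: fail loudly instead of guessing.
      throw new IllegalArgumentException(
        s"No solver for family=$family, link=$link with $numFeatures features")
    }
}
```

With these rules, `chooseSolver("poisson", "log", 100)` returns `IRLS`, while the same call with 10000 features throws, matching the "otherwise, throw an exception" branch. Keeping the choice in one function is what lets the R and Scala APIs stay solver-agnostic.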




[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-03-09 Thread Timothy Hunter (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188057#comment-15188057 ]

Timothy Hunter commented on SPARK-12566:


[~yuhaoyan] I took a look at the current implementation of GLM in
SparkRWrappers, and it looks like we only check the solver in the case of the
gaussian family.

[~mengxr] if users use the 'auto' solver, it means we can swap the 
implementation underneath, right?

If this is the case, here is what I suggest, in pseudo-scala-code:
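{code}
// Placeholders for the estimators the R glm wrapper could delegate to.
sealed trait Backend
case object IRLS extends Backend
case object LinearRegression extends Backend
case object LogisticRegression extends Backend

def chooseBackend(family: String, solver: String): Backend = (family, solver) match {
  case ("gaussian", "auto") => IRLS // This is a behavioral change
  case ("gaussian", "normal" | "l-bfgs") => LinearRegression
  case ("binomial", "auto") => IRLS // This is a behavioral change
  // This is a new option to preserve LogisticRegression if there is a need for that
  case ("binomial", "binomial") => LogisticRegression
  case (_, _) => IRLS
}
{code}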




[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-03-06 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182401#comment-15182401 ]

Apache Spark commented on SPARK-12566:
--

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/11549




[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-03-06 Thread yuhao yang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182384#comment-15182384 ]

yuhao yang commented on SPARK-12566:


Since we already have a glm in SparkR that is based on LogisticRegressionModel
and LinearRegressionModel, there are three ways to extend it, as I understand it:
1. Change the current glm to use GeneralizedLinearRegression. Create another lm
interface for SparkR, and use LR as the model.
2. Keep the glm R interface, and replace its implementation with GLM. This means R
can no longer invoke LR.
3. Keep the glm R interface, and combine the implementation of both LR and GLM,
choosing between them based on the solver parameter.

I'd prefer option 1. I'm going to send a PR (WIP) for option 2, which can later
be adjusted to 1 or 3.





[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-03-01 Thread yuhao yang (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174022#comment-15174022 ]

yuhao yang commented on SPARK-12566:


Yes, I'll start on it. Thanks.

> GLM model family, link function support in SparkR:::glm
> ---
>
> Key: SPARK-12566
> URL: https://issues.apache.org/jira/browse/SPARK-12566
> Project: Spark
> Issue Type: New Feature
> Components: ML, SparkR
> Reporter: Joseph K. Bradley
>
> This JIRA is for extending the support of MLlib's Generalized Linear Models 
> (GLMs) to more model families and link functions in SparkR. After 
> SPARK-12811, we should be able to wrap GeneralizedLinearRegression in SparkR 
> with support of popular families and link functions.






[jira] [Commented] (SPARK-12566) GLM model family, link function support in SparkR:::glm

2016-03-01 Thread Xiangrui Meng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174007#comment-15174007 ]

Xiangrui Meng commented on SPARK-12566:
---

[~yuhaoyan] We merged SPARK-12811 and hence this JIRA is unblocked. Are you 
interested in working on it?

> GLM model family, link function support in SparkR:::glm
> ---
>
> Key: SPARK-12566
> URL: https://issues.apache.org/jira/browse/SPARK-12566
> Project: Spark
> Issue Type: Umbrella
> Components: ML
> Reporter: Joseph K. Bradley
>
> This is an umbrella for extending the support of MLlib's Generalized Linear 
> Models (GLMs) to more model families and link functions in SparkR.


