[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

2015-10-30 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983272#comment-14983272
 ] 

Xiangrui Meng edited comment on SPARK-9836 at 10/30/15 8:42 PM:


Yes, this JIRA is only for the normal equation solver and linear regression. We 
don't need to add all statistics in a single PR. Let's add statistics that can 
be easily derived from `diag(A^T W A)` and the residuals.


was (Author: mengxr):
Yes, this JIRA is only for the normal equation solver and linear regression. We 
don't need to add all statistics in a single PR. Let's add statistics that can 
be easily derived from `diag(A^T W A)`.

> Provide R-like summary statistics for ordinary least squares via normal 
> equation solver
> ---
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of 
> those via normal equation solver (SPARK-9834). If some statistics requires 
> additional passes to the dataset, we can expose an option to let users select 
> desired statistics before model fitting. 
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>  Min1QMedian3Q   Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.2514 0.3698   6.089 9.57e-09 ***
> Sepal.Width 0.8036 0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587 0.1121  13.012  < 2e-16 ***
> Speciesvirginica1.9468 0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

2015-10-30 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982905#comment-14982905
 ] 

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:18 PM:
--

[~mengxr] After survey I found that "Deviance Residuals" and "Coefficients: 
Estimate Std. Error t value Pr(>|t|)  " are statistics for OLS/WLS, I will add 
these statistics in this task.
As to the remaining part
{quote}
Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of the statistics variables depends upon IRLS(SPARK-9835). I have found 
you have open SPARK-9837 to track summary statistics for GLMs via IRLS, so 
these statistics will be work of SPARK-9837. Please correct me if have 
misunderstand. :)


was (Author: yanboliang):
[~mengxr] After survey I found that "Deviance Residuals" and "Coefficients: 
Estimate Std. Error t value Pr(>|t|)  " are statistics for OLS/WLS, I will add 
these statistics in this task.
As to the following part
{quote}
Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of the statistics variables depends upon IRLS(SPARK-9835). I have found 
you have open SPARK-9837 to track summary statistics for GLMs via IRLS, so 
these statistics will be work of SPARK-9837. Please correct me if have 
misunderstand. :)

> Provide R-like summary statistics for ordinary least squares via normal 
> equation solver
> ---
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of 
> those via normal equation solver (SPARK-9834). If some statistics requires 
> additional passes to the dataset, we can expose an option to let users select 
> desired statistics before model fitting. 
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>  Min1QMedian3Q   Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.2514 0.3698   6.089 9.57e-09 ***
> Sepal.Width 0.8036 0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587 0.1121  13.012  < 2e-16 ***
> Speciesvirginica1.9468 0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

2015-10-30 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982905#comment-14982905
 ] 

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:19 PM:
--

[~mengxr] After survey I found that "Deviance Residuals" and "Coefficients: 
Estimate Std. Error t value Pr(>|t|)  " are statistics for OLS/WLS, I will add 
these statistics in this task.
As to the remaining part
{quote}
Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of the statistics variables depends upon IRLS(SPARK-9835). I found you 
have open SPARK-9837 to track summary statistics for GLMs via IRLS, so these 
statistics will be work of SPARK-9837. Please correct me if have misunderstand. 
:)


was (Author: yanboliang):
[~mengxr] After survey I found that "Deviance Residuals" and "Coefficients: 
Estimate Std. Error t value Pr(>|t|)  " are statistics for OLS/WLS, I will add 
these statistics in this task.
As to the remaining part
{quote}
Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of the statistics variables depends upon IRLS(SPARK-9835). I have found 
you have open SPARK-9837 to track summary statistics for GLMs via IRLS, so 
these statistics will be work of SPARK-9837. Please correct me if have 
misunderstand. :)

> Provide R-like summary statistics for ordinary least squares via normal 
> equation solver
> ---
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of 
> those via normal equation solver (SPARK-9834). If some statistics requires 
> additional passes to the dataset, we can expose an option to let users select 
> desired statistics before model fitting. 
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>  Min1QMedian3Q   Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.2514 0.3698   6.089 9.57e-09 ***
> Sepal.Width 0.8036 0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587 0.1121  13.012  < 2e-16 ***
> Speciesvirginica1.9468 0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

2015-10-30 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982905#comment-14982905
 ] 

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:27 PM:
--

[~mengxr] After survey I found that "Coefficients: Estimate Std. Error t value 
Pr(>|t|)  " can get from OLS/WLS(by matrix inverse/diagonalization), "Deviance 
Residuals" is a general statistic variable, I will add these statistics in this 
task.
As to the remaining part
{quote}
Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of the statistics variables depends upon IRLS(SPARK-9835). I found you 
have open SPARK-9837 to track summary statistics for GLMs via IRLS, so these 
statistics will be work of SPARK-9837. Please correct me if have misunderstand. 
:)


was (Author: yanboliang):
[~mengxr] After survey I found that "Deviance Residuals" and "Coefficients: 
Estimate Std. Error t value Pr(>|t|)  " are statistics for OLS/WLS, I will add 
these statistics in this task.
As to the remaining part
{quote}
Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of the statistics variables depends upon IRLS(SPARK-9835). I found you 
have open SPARK-9837 to track summary statistics for GLMs via IRLS, so these 
statistics will be work of SPARK-9837. Please correct me if have misunderstand. 
:)

> Provide R-like summary statistics for ordinary least squares via normal 
> equation solver
> ---
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of 
> those via normal equation solver (SPARK-9834). If some statistics requires 
> additional passes to the dataset, we can expose an option to let users select 
> desired statistics before model fitting. 
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>  Min1QMedian3Q   Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.2514 0.3698   6.089 9.57e-09 ***
> Sepal.Width 0.8036 0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587 0.1121  13.012  < 2e-16 ***
> Speciesvirginica1.9468 0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

2015-09-23 Thread Mohamed Baddar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904471#comment-14904471
 ] 

Mohamed Baddar edited comment on SPARK-9836 at 9/23/15 8:39 PM:


Thanks a lot [~mengxr] , i will try one of the starter tasks , but seems they 
are all taken , if so , what should i do next ?


was (Author: mbaddar):
Thanks a lot , i will try one of the starter tasks , but seems they are all 
taken , if so , what should i do next ?

> Provide R-like summary statistics for ordinary least squares via normal 
> equation solver
> ---
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Xiangrui Meng
>
> In R, model fitting comes with summary statistics. We can provide most of 
> those via normal equation solver (SPARK-9834). If some statistics requires 
> additional passes to the dataset, we can expose an option to let users select 
> desired statistics before model fitting. 
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>  Min1QMedian3Q   Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>   Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.2514 0.3698   6.089 9.57e-09 ***
> Sepal.Width 0.8036 0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587 0.1121  13.012  < 2e-16 ***
> Speciesvirginica1.9468 0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org