[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983272#comment-14983272 ]

Xiangrui Meng edited comment on SPARK-9836 at 10/30/15 8:42 PM:

Yes, this JIRA is only for the normal equation solver and linear regression. We don't need to add all statistics in a single PR. Let's add statistics that can be easily derived from `diag(A^T W A)` and the residuals.

was (Author: mengxr):
Yes, this JIRA is only for the normal equation solver and linear regression. We don't need to add all statistics in a single PR. Let's add statistics that can be easily derived from `diag(A^T W A)`.

> Provide R-like summary statistics for ordinary least squares via normal
> equation solver
> ---
>
>                 Key: SPARK-9836
>                 URL: https://issues.apache.org/jira/browse/SPARK-9836
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of
> those via normal equation solver (SPARK-9834). If some statistics require
> additional passes to the dataset, we can expose an option to let users select
> desired statistics before model fitting.
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -1.30711  -0.25713  -0.05325   0.19542   1.41253
> Coefficients:
>                   Estimate Std. Error t value Pr(>|t|)
> (Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
> Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
> Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168 on 149 degrees of freedom
> Residual deviance: 28.004 on 146 degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
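Several figures in the quoted `summary(model)` output follow directly from the residuals and the coefficient table, which is the kind of statistic the comment above has in mind. The sketch below recomputes three of them from the printed numbers only. It assumes unit weights, the gaussian family, and 150 observations (inferred from the 149 null degrees of freedom); the variable names are illustrative and not part of any Spark or R API.

```python
import math

# Figures copied from the quoted summary(model) output.
n_obs = 150                 # assumption: 149 null degrees of freedom plus the intercept
resid_df = 146              # residual degrees of freedom
resid_deviance = 28.004     # for the gaussian family this is the residual sum of squares
n_coef = n_obs - resid_df   # 4 fitted coefficients

# (estimate, standard error) pairs from the Coefficients table.
coefficients = {
    "(Intercept)": (2.2514, 0.3698),
    "Sepal.Width": (0.8036, 0.1063),
    "Speciesversicolor": (1.4587, 0.1121),
    "Speciesvirginica": (1.9468, 0.1000),
}

# Dispersion estimate: residual sum of squares over residual degrees of freedom.
dispersion = resid_deviance / resid_df

# Each t value is the estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in coefficients.items()}

# Gaussian AIC with unit weights: n * (log(2*pi*RSS/n) + 1) + 2 * (p + 1),
# where the extra 1 accounts for the estimated variance parameter.
aic = n_obs * (math.log(2 * math.pi * resid_deviance / n_obs) + 1) + 2 * (n_coef + 1)

print(f"dispersion = {dispersion:.4f}")
for name, t in t_values.items():
    print(f"{name:18s} t = {t:.3f}")
print(f"AIC = {aic:.2f}")
```

Because the inputs are the rounded values printed by R, the recomputed dispersion, t values, and AIC agree with the quoted output only to within that rounding.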
[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982905#comment-14982905 ]

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:18 PM:

[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std. Error t value Pr(>|t|)" are statistics for OLS/WLS; I will add these statistics in this task. As to the remaining part
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I have found that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be the work of SPARK-9837. Please correct me if I have misunderstood. :)

was (Author: yanboliang):
[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std. Error t value Pr(>|t|)" are statistics for OLS/WLS; I will add these statistics in this task. As to the following part
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I have found that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be the work of SPARK-9837. Please correct me if I have misunderstood. :)
[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982905#comment-14982905 ]

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:19 PM:

[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std. Error t value Pr(>|t|)" are statistics for OLS/WLS; I will add these statistics in this task. As to the remaining part
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I found that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be the work of SPARK-9837. Please correct me if I have misunderstood. :)

was (Author: yanboliang):
[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std. Error t value Pr(>|t|)" are statistics for OLS/WLS; I will add these statistics in this task. As to the remaining part
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I have found that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be the work of SPARK-9837. Please correct me if I have misunderstood. :)
[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982905#comment-14982905 ]

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:27 PM:

[~mengxr] After a survey, I found that "Coefficients: Estimate Std. Error t value Pr(>|t|)" can be obtained from OLS/WLS (by matrix inversion/diagonalization), and that "Deviance Residuals" is a general statistic; I will add these statistics in this task. As to the remaining part
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I found that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be the work of SPARK-9837. Please correct me if I have misunderstood. :)

was (Author: yanboliang):
[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std. Error t value Pr(>|t|)" are statistics for OLS/WLS; I will add these statistics in this task. As to the remaining part
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I found that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be the work of SPARK-9837. Please correct me if I have misunderstood. :)
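As a rough illustration of how the "Coefficients" columns could come out of a normal equation solve by matrix inversion, here is a minimal pure-Python sketch. It is not the Spark implementation: the helper names `invert` and `wls_summary` and the toy data are hypothetical, and it assumes the textbook formula in which the standard errors use the diagonal of the inverse `(A^T W A)^-1` scaled by the residual variance estimate.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def wls_summary(A, y, w):
    """Weighted least squares via the normal equations (hypothetical helper).

    Returns the estimates, standard errors, t values, and raw residuals.
    """
    n, p = len(A), len(A[0])
    # Gram matrix A^T W A and right-hand side A^T W y.
    ata = [[sum(w[i] * A[i][j] * A[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    aty = [sum(w[i] * A[i][j] * y[i] for i in range(n)) for j in range(p)]
    inv = invert(ata)
    beta = [sum(inv[j][k] * aty[k] for k in range(p)) for j in range(p)]
    residuals = [y[i] - sum(A[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Residual variance estimate with n - p degrees of freedom.
    sigma2 = sum(w[i] * residuals[i] ** 2 for i in range(n)) / (n - p)
    std_err = [math.sqrt(sigma2 * inv[j][j]) for j in range(p)]
    t_values = [beta[j] / std_err[j] for j in range(p)]
    return beta, std_err, t_values, residuals


# Toy data (made up for illustration): an intercept column plus one feature.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 8.0]
w = [1.0, 1.0, 1.0, 1.0]

beta, std_err, t_values, residuals = wls_summary(A, y, w)
print("estimates:", beta)
print("std errors:", std_err)
print("t values:", t_values)
```

On the toy data the fit is intercept 0.8 and slope 2.3. The p-value column would additionally need the Student t distribution with n - p degrees of freedom, which the Python standard library does not provide, so it is left out here. Explicit inversion is used only because the Gram matrix is tiny; a production solver would more likely use a Cholesky factorization.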
[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904471#comment-14904471 ]

Mohamed Baddar edited comment on SPARK-9836 at 9/23/15 8:39 PM:

Thanks a lot [~mengxr], I will try one of the starter tasks, but it seems they are all taken. If so, what should I do next?

was (Author: mbaddar):
Thanks a lot, I will try one of the starter tasks, but it seems they are all taken. If so, what should I do next?