[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah Thanks for all the helpful discussions! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16149 Merged to master
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16149 LGTM
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16149 **[Test build #3495 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3495/consoleFull)** for PR 16149 at commit [`6e6c48b`](https://github.com/apache/spark/commit/6e6c48b79065666e1e896eec76e1ffa8cb751b6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16149 **[Test build #3495 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3495/consoleFull)** for PR 16149 at commit [`6e6c48b`](https://github.com/apache/spark/commit/6e6c48b79065666e1e896eec76e1ffa8cb751b6e).
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah I pushed one more commit that adds a test case with `weight = 4.7`, which rounds up to 5, to cover the case @sethah described. All tests passed. I'm pretty sure R's rounding is the same as what I'm doing here. Please merge if there is no other issue. Thanks.
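For readers following the rounding discussion, here is a minimal Python sketch (not code from the PR) comparing two common rounding conventions, round-half-up and round-half-to-even. The helper names are hypothetical. The two conventions agree on the values mentioned in this thread (4.7 and weights below 0.5) and differ only on exact ties such as 0.5 or 2.5, so which convention a given runtime uses is worth confirming in its documentation.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, sending exact ties upward (hypothetical helper)."""
    return math.floor(x + 0.5)


def round_half_even(x: float) -> int:
    """Round to the nearest integer, sending exact ties to the even neighbour.

    Python's built-in round() follows this convention for floats.
    """
    return round(x)


# Compare both conventions on the weights discussed in the thread and on ties.
for w in (4.7, 0.3, 0.5, 2.5):
    print(w, round_half_up(w), round_half_even(w))
```

Only the tie rows (0.5 and 2.5) show different results between the two columns.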
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah Would you please review this? Thanks.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah @srowen I updated the documentation. I think we have everything needed for this fix. Please merge and close this PR if there is no other issue. Thanks much for all the comments.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @sethah @srowen I have added a comment to the `weightCol` doc for the Binomial case. I also updated the tests to cover the case `weight < 0.5`, i.e., `round(weight) = 0`. All tests passed.
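The weight handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code merged in the PR: the function names are hypothetical, each weight is treated as a number of trials and rounded half-up, the response `y` is an observed proportion, and a row whose rounded weight is 0 contributes nothing to the log-likelihood. The function returns only the `-2 * log-likelihood` term; the penalty for the number of parameters that completes an AIC is assumed to be added elsewhere.

```python
import math


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Log-probability of k successes in n trials under Binomial(n, p), 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def round_half_up(x: float) -> int:
    """Round to the nearest integer, sending exact ties upward (assumed convention)."""
    return math.floor(x + 0.5)


def binomial_neg2_loglik(rows) -> float:
    """Return -2 * log-likelihood for rows of (y, mu, weight).

    y is the observed proportion of successes, mu the fitted mean,
    and weight the (possibly non-integer) number of trials.
    """
    total = 0.0
    for y, mu, weight in rows:
        trials = round_half_up(weight)
        if trials == 0:
            # A weight that rounds to zero means zero trials: skip the row.
            continue
        successes = round_half_up(y * weight)
        total += binomial_log_pmf(successes, trials, mu)
    return -2.0 * total


# One row with weight 4.7 (5 trials after rounding) and one with weight 0.3 (skipped).
print(binomial_neg2_loglik([(0.4, 0.5, 4.7), (1.0, 0.5, 0.3)]))
```

In this sketch a weight of 4.7 is scored exactly like 5 trials, and a weight of 0.3 is ignored, which mirrors the two test cases mentioned in the thread.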
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16149 @srowen We can add a note to the doc for `setWeightCol`. We could also use `logInfo` about weights needing to be integer values for `Binomial` family, but that may not be very effective.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah I have cleaned up the change as suggested. Please review and let me know if there is any question.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16149 What you have is fine (though I might name it `ylogy` or something). I don't see other places in the code that compute x ln x or something similar, so it's OK to make this a private function. You might just make a `private def` of a local helper method rather than instantiate a lambda, but the difference is trivial.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 @srowen @sethah Thanks for the comments. Yes, the major use case is to be able to handle multiple trials (integer weight, real-valued response). Indeed, a better way to do this is through `offset`, which I have proposed in this JIRA: [SPARK-18710](https://issues.apache.org/jira/browse/SPARK-18710). Please let me know if this is worth pursuing. I have submitted two more commits.
1. The first commit makes a minimal modification to the existing Binomial GLM test so that one response record is now non-integer. This test still failed because the deviance residual calculation seems to work only for `y in {0, 1}`.
2. The second commit fixes the issue in calculating the deviance. But I think the code can be improved, especially regarding the function `y_logy`. What's the best way to create a utility function like this? Please advise.

With the two commits, all tests now pass, including the ones on AIC and the deviance.
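As a rough illustration of the deviance fix discussed here, the following Python sketch shows a `ylogy`-style helper and a binomial unit deviance that stays finite for any response in `[0, 1]`, not only for 0 and 1. It is a sketch under assumptions rather than the PR's implementation: the names and signatures are hypothetical, and it uses the standard formula `2 * w * (y * log(y / mu) + (1 - y) * log((1 - y) / (1 - mu)))` with the convention that `0 * log(0) = 0`.

```python
import math


def ylogy(y: float, mu: float) -> float:
    """Return y * log(y / mu), using the limiting value 0 when y == 0."""
    return 0.0 if y == 0.0 else y * math.log(y / mu)


def binomial_unit_deviance(y: float, mu: float, weight: float = 1.0) -> float:
    """Weighted binomial unit deviance for y in [0, 1] and mu in (0, 1)."""
    return 2.0 * weight * (ylogy(y, mu) + ylogy(1.0 - y, 1.0 - mu))


# Binary and fractional responses are handled by the same code path.
for y in (0.0, 0.3, 1.0):
    print(y, binomial_unit_deviance(y, mu=0.5))
```

Writing the helper as a named function rather than an inline lambda keeps the zero-response guard in one place, which is the design point raised in the review.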
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16149 **[Test build #3470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3470/consoleFull)** for PR 16149 at commit [`7fdab86`](https://github.com/apache/spark/commit/7fdab860f740de558fa1281255b5e7dc35480d7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16149 **[Test build #3470 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3470/consoleFull)** for PR 16149 at commit [`7fdab86`](https://github.com/apache/spark/commit/7fdab860f740de558fa1281255b5e7dc35480d7d).
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16149 Jenkins, add to whitelist
[GitHub] spark issue #16149: [SPARK-18715][ML]Fix AIC calculations in Binomial GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16149 Can one of the admins verify this patch?