[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18831 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131520948 --- Diff: R/pkg/tests/fulltests/test_mllib_regression.R --- @@ -173,6 +173,14 @@ test_that("spark.glm summary", { expect_equal(stats$df.residual, rStats$df.residual) expect_equal(stats$aic, rStats$aic) + # Test spark.glm works with offset + training <- suppressWarnings(createDataFrame(iris)) + stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, + family = poisson(), offsetCol = "Petal_Length")) + rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species, +data = iris, family = poisson(), offset = iris$Petal.Length))) --- End diff -- I vote to keep the name as it is, because it's the column name of ```offset``` rather than the ```offset``` itself. We would like to keep SparkR MLlib wrappers' argument name consistent with R only when it's applicable. I'm ok to create a new JIRA to discuss it. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131516514 --- Diff: R/pkg/tests/fulltests/test_mllib_regression.R --- @@ -173,6 +173,14 @@ test_that("spark.glm summary", { expect_equal(stats$df.residual, rStats$df.residual) expect_equal(stats$aic, rStats$aic) + # Test spark.glm works with offset + training <- suppressWarnings(createDataFrame(iris)) + stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, + family = poisson(), offsetCol = "Petal_Length")) + rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species, +data = iris, family = poisson(), offset = iris$Petal.Length))) --- End diff -- probably across every in ml. let's discuss this in a new JIRA. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131386220 --- Diff: R/pkg/tests/fulltests/test_mllib_regression.R --- @@ -173,6 +173,14 @@ test_that("spark.glm summary", { expect_equal(stats$df.residual, rStats$df.residual) expect_equal(stats$aic, rStats$aic) + # Test spark.glm works with offset + training <- suppressWarnings(createDataFrame(iris)) + stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, + family = poisson(), offsetCol = "Petal_Length")) + rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species, +data = iris, family = poisson(), offset = iris$Petal.Length))) --- End diff -- Then do you want to make the change for weight as well? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131313635 --- Diff: R/pkg/tests/fulltests/test_mllib_regression.R --- @@ -173,6 +173,14 @@ test_that("spark.glm summary", { expect_equal(stats$df.residual, rStats$df.residual) expect_equal(stats$aic, rStats$aic) + # Test spark.glm works with offset + training <- suppressWarnings(createDataFrame(iris)) + stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species, + family = poisson(), offsetCol = "Petal_Length")) + rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species, +data = iris, family = poisson(), offset = iris$Petal.Length))) --- End diff -- that's interesting - perhaps we should take `col` in addition to `col name` too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131209057 --- Diff: R/pkg/R/mllib_regression.R --- @@ -125,7 +127,7 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj")) #' @seealso \link{glm}, \link{read.ml} setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL, - regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power, + offsetCol = NULL, regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power, --- End diff -- I'd avoid adding a param in the middle - it breaks code passing param by order --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131211844 --- Diff: R/pkg/R/mllib_regression.R --- @@ -159,10 +161,16 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), weightCol <- as.character(weightCol) } +if (!is.null(offsetCol) && offsetCol == "") { + offsetCol <- NULL +} else if (!is.null(offsetCol)) { + offsetCol <- as.character(offsetCol) +} --- End diff -- perhaps ``` if (!is.null(offsetCol)) { offsetCol <- as.character(offsetCol) if (nchar(offsetCol) == 0) { offsetCol <- NULL } } ``` not sure if you want to cover other cases when offsetCol cannot be coerced - eg. NA --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/18831 [SPARK-21622][ML][SparkR] Support offset in SparkR GLM ## What changes were proposed in this pull request? Support offset in SparkR GLM #16699 You can merge this pull request into a Git repository by running: $ git pull https://github.com/actuaryzhang/spark sparkROffset Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18831.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18831 commit 6ec068e5f48d393d539f4600bca3cbd1ea7d65a3 Author: actuaryzhang Date: 2017-08-03T06:37:41Z add offset to SparkR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org