[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18128 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r122134992 --- Diff: R/pkg/R/mllib_classification.R --- @@ -202,6 +202,20 @@ function(object, path, overwrite = FALSE) { #' @param aggregationDepth The depth for treeAggregate (greater than or equal to 2). If the dimensions of features #' or the number of partitions are large, this param could be adjusted to a larger size. #' This is an expert parameter. Default value should be good for most cases. +#' @param lowerBoundsOnCoefficients The lower bounds on coefficients if fitting under bound constrained optimization. +#' The bound matrix must be compatible with the shape (1, number of features) for binomial +#' regression, or (number of classes, number of features) for multinomial regression. +#' It is a R matrix. +#' @param upperBoundsOnCoefficients The upper bounds on coefficients if fitting under bound constrained optimization. +#' The bound matrix must be compatible with the shape (1, number of features) for binomial +#' regression, or (number of classes, number of features) for multinomial regression. +#' It is a R matrix. +#' @param lowerBoundsOnIntercepts The lower bounds on intercepts if fitting under bound constrained optimization. +#'The bounds vector size must be equal with 1 for binomial regression, or the number --- End diff -- `size must be equal with 1` - should this be `size must be equal to 1`?
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r122135628 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +row <- 0 +col <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + if (class(lowerBoundsOnCoefficients) != "matrix") { +stop("lowerBoundsOnCoefficients must be a matrix.") + } + row <- nrow(lowerBoundsOnCoefficients) + col <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + if (class(upperBoundsOnCoefficients) != "matrix") { +stop("upperBoundsOnCoefficients must be a matrix.") + } + + if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients) +| col != ncol(upperBoundsOnCoefficients))) { --- End diff -- While this is logically and semantically correct here, prefer `&&` and `||` over `&` and `|` for correctness, in case an operand becomes length > 1 later.
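The distinction the reviewer draws can be seen in a small standalone R sketch (not part of the patch):

```r
# `&` and `|` are vectorized: with length-2 operands they return a length-2
# logical vector, which is not a valid scalar condition for `if ()`.
# `&&` and `||` are scalar operators: they consider only the first element
# (and recent R versions raise an error when an operand has length > 1).
a <- c(TRUE, FALSE)
b <- c(TRUE, TRUE)
a & b    # TRUE FALSE  -- a length-2 result; dangerous inside if ()
a[1] && b[1]   # TRUE   -- a single scalar, safe for if ()
```

Using `&&`/`||` in `if ()` conditions guards against a refactor later making one of the operands a vector.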
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r122135013 --- Diff: R/pkg/R/mllib_classification.R --- @@ -202,6 +202,20 @@ function(object, path, overwrite = FALSE) { #' @param aggregationDepth The depth for treeAggregate (greater than or equal to 2). If the dimensions of features #' or the number of partitions are large, this param could be adjusted to a larger size. #' This is an expert parameter. Default value should be good for most cases. +#' @param lowerBoundsOnCoefficients The lower bounds on coefficients if fitting under bound constrained optimization. +#' The bound matrix must be compatible with the shape (1, number of features) for binomial +#' regression, or (number of classes, number of features) for multinomial regression. +#' It is a R matrix. +#' @param upperBoundsOnCoefficients The upper bounds on coefficients if fitting under bound constrained optimization. +#' The bound matrix must be compatible with the shape (1, number of features) for binomial +#' regression, or (number of classes, number of features) for multinomial regression. +#' It is a R matrix. +#' @param lowerBoundsOnIntercepts The lower bounds on intercepts if fitting under bound constrained optimization. +#'The bounds vector size must be equal with 1 for binomial regression, or the number +#'of classes for multinomial regression. +#' @param upperBoundsOnIntercepts The upper bounds on intercepts if fitting under bound constrained optimization. +#'The bound vector size must be equal with 1 for binomial regression, or the number --- End diff -- ditto here `size must be equal with 1`
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119995950 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +row <- 0 +col <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + if (class(lowerBoundsOnCoefficients) != "matrix") { +stop("lowerBoundsOnCoefficients must be a matrix.") + } + row <- nrow(lowerBoundsOnCoefficients) + col <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + if (class(upperBoundsOnCoefficients) != "matrix") { +stop("upperBoundsOnCoefficients must be a matrix.") + } + + if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients) +| col != ncol(upperBoundsOnCoefficients))) { +stop(paste("dimension of upperBoundsOnCoefficients ", + "is not the same as lowerBoundsOnCoefficients", sep = "")) + } + + if (is.null(lowerBoundsOnCoefficients)) { +row <- nrow(upperBoundsOnCoefficients) +col <- ncol(upperBoundsOnCoefficients) + } --- 
End diff -- ok thanks, L290-291
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119978881 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +row <- 0 +col <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + if (class(lowerBoundsOnCoefficients) != "matrix") { +stop("lowerBoundsOnCoefficients must be a matrix.") + } + row <- nrow(lowerBoundsOnCoefficients) + col <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + if (class(upperBoundsOnCoefficients) != "matrix") { +stop("upperBoundsOnCoefficients must be a matrix.") + } + + if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients) +| col != ncol(upperBoundsOnCoefficients))) { +stop(paste("dimension of upperBoundsOnCoefficients ", + "is not the same as lowerBoundsOnCoefficients", sep = "")) + } + + if (is.null(lowerBoundsOnCoefficients)) { +row <- nrow(upperBoundsOnCoefficients) +col <- ncol(upperBoundsOnCoefficients) + } 
--- End diff -- This is the case where only the upper bound is set. Either one or both of the bounds can be set. When both are set, we enforce that the upper and lower bounds have the same dimensions, as checked above.
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119978607 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +row <- 0 +col <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + if (class(lowerBoundsOnCoefficients) != "matrix") { +stop("lowerBoundsOnCoefficients must be a matrix.") + } + row <- nrow(lowerBoundsOnCoefficients) + col <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + if (class(upperBoundsOnCoefficients) != "matrix") { +stop("upperBoundsOnCoefficients must be a matrix.") + } + + if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients) +| col != ncol(upperBoundsOnCoefficients))) { +stop(paste("dimension of upperBoundsOnCoefficients ", + "is not the same as lowerBoundsOnCoefficients", sep = "")) + } + + if (is.null(lowerBoundsOnCoefficients)) { +row <- nrow(upperBoundsOnCoefficients) +col <- ncol(upperBoundsOnCoefficients) + } + + 
upperBoundsOnCoefficients <- as.array(as.vector(upperBoundsOnCoefficients)) +} + jobj <- callJStatic("org.apache.spark.ml.r.LogisticRegressionWrapper", "fit", data@sdf, formula, as.numeric(regParam), as.numeric(elasticNetParam), as.integer(maxIter), as.numeric(tol), as.character(family), as.logical(standardization), as.array(thresholds), -weightCol, as.integer(aggregationDepth)) +weightCol, as.integer(aggregationDepth), +as.integer(row), as.integer(col), --- End diff -- nit: no need for `as.integer(row)` and `as.integer(col)` since they are set internally and are not user-supplied parameters
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119978589 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +row <- 0 +col <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + if (class(lowerBoundsOnCoefficients) != "matrix") { +stop("lowerBoundsOnCoefficients must be a matrix.") + } + row <- nrow(lowerBoundsOnCoefficients) + col <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + if (class(upperBoundsOnCoefficients) != "matrix") { +stop("upperBoundsOnCoefficients must be a matrix.") + } + + if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients) +| col != ncol(upperBoundsOnCoefficients))) { +stop(paste("dimension of upperBoundsOnCoefficients ", --- End diff -- `paste` would insert a space by default; use `paste0` instead, or remove the trailing space in the string and let `paste` add the separator.
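To illustrate the reviewer's point, here is a small R sketch of the two cleaner equivalents of the `paste(..., sep = "")` call in the diff, which hand-places a trailing space:

```r
# paste0() is paste(..., sep = ""); plain paste() uses sep = " " by default,
# so the separator supplies the space and no trailing space is needed.
paste0("dimension of upperBoundsOnCoefficients ",
       "is not the same as lowerBoundsOnCoefficients")
paste("dimension of upperBoundsOnCoefficients",
      "is not the same as lowerBoundsOnCoefficients")
# Both produce:
# "dimension of upperBoundsOnCoefficients is not the same as lowerBoundsOnCoefficients"
```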
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119978644 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +row <- 0 +col <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + if (class(lowerBoundsOnCoefficients) != "matrix") { +stop("lowerBoundsOnCoefficients must be a matrix.") + } + row <- nrow(lowerBoundsOnCoefficients) + col <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + if (class(upperBoundsOnCoefficients) != "matrix") { +stop("upperBoundsOnCoefficients must be a matrix.") + } + + if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients) +| col != ncol(upperBoundsOnCoefficients))) { +stop(paste("dimension of upperBoundsOnCoefficients ", + "is not the same as lowerBoundsOnCoefficients", sep = "")) + } + + if (is.null(lowerBoundsOnCoefficients)) { +row <- nrow(upperBoundsOnCoefficients) +col <- ncol(upperBoundsOnCoefficients) + } --- 
End diff -- Given how this is used later in the Scala code, should there be a check that nrow(upper) == nrow(lower), and likewise that ncol(upper) == ncol(lower)?
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119911447 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,57 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +lrow <- 0 +lcol <- 0 +urow <- 0 +ucol <- 0 --- End diff -- Oh, I think I can do the check because I have a `NULL` check before enforcing the rule.
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119911006 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,57 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +lrow <- 0 +lcol <- 0 +urow <- 0 +ucol <- 0 --- End diff -- Question: Based on my understanding, `lowerBoundsOnCoefficients` and `upperBoundsOnCoefficients` are not required to be set at the same time, though they can be. When only one is set, we can't compare the dimensions of the two matrices because the other is `NULL`; when both are set, we can check. So we can't enforce the rule in general.
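The validation logic being discussed can be sketched as a hypothetical standalone R helper (the function name `checkBoundDims` is an illustration, not part of the patch): dimensions are compared only when both bound matrices are supplied, since either may be `NULL`.

```r
# Compare bound-matrix dimensions only when both bounds are present.
checkBoundDims <- function(lower, upper) {
  if (!is.null(lower) && !is.matrix(lower)) {
    stop("lowerBoundsOnCoefficients must be a matrix.")
  }
  if (!is.null(upper) && !is.matrix(upper)) {
    stop("upperBoundsOnCoefficients must be a matrix.")
  }
  if (!is.null(lower) && !is.null(upper) &&
      (nrow(lower) != nrow(upper) || ncol(lower) != ncol(upper))) {
    stop("dimension of upperBoundsOnCoefficients is not the same as lowerBoundsOnCoefficients")
  }
}

checkBoundDims(matrix(0, 1, 4), matrix(1, 1, 4))  # ok: both 1 x 4
checkBoundDims(NULL, matrix(1, 1, 4))             # ok: only the upper bound is set
```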
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119788377 --- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R --- @@ -225,6 +225,32 @@ test_that("spark.logit", { model2 <- spark.logit(df2, label ~ feature, weightCol = "weight") prediction2 <- collect(select(predict(model2, df2), "prediction")) expect_equal(sort(prediction2$prediction), c("0.0", "0.0", "0.0", "0.0", "0.0")) + + # Test binomial logistic regression against two classes with upperBoundsOnCoefficients + # and upperBoundsOnIntercepts + u <- matrix(c(1.0, 0.0, 1.0, 0.0), nrow = 1, ncol = 4) + model <- spark.logit(training, Species ~ ., upperBoundsOnCoefficients = u, + upperBoundsOnIntercepts = 1.0) + summary <- summary(model) + coefsR <- c(-11.13331, 1.0, 0.0, 1.0, 0.0) + coefs <- summary$coefficients[, "Estimate"] + expect_true(all(abs(coefsR - coefs) < 0.1)) + # Test upperBoundsOnCoefficients should be matrix + expect_error(spark.logit(training, Species ~ ., upperBoundsOnCoefficients = as.array(c(1, 2)), + upperBoundsOnIntercepts = 1.0)) + + # Test binomial logistic regression against two classes with lowerBoundsOnCoefficients + # and lowerBoundsOnIntercepts + l <- matrix(c(0.0, -1.0, 0.0, -1.0), nrow = 1, ncol = 4) + model <- spark.logit(training, Species ~ ., lowerBoundsOnCoefficients = l, + lowerBoundsOnIntercepts = 0.0) + summary <- summary(model) + coefsR <- c(0, 0, -1, 0, 1.902192) + coefs <- summary$coefficients[, "Estimate"] + expect_true(all(abs(coefsR - coefs) < 0.1)) + # Test lowerBoundsOnCoefficients should be matrix + expect_error(spark.logit(training, Species ~ ., lowerBoundsOnCoefficients = as.array(c(1, 2)), + lowerBoundsOnIntercepts = 0.0)) --- End diff -- Could you add a test for ```multinomial``` logistic regression? I think testing only a one-sided bound is enough.
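A multinomial test along the lines the reviewer requests might look like the following hypothetical R sketch. The bound values are placeholders and no expected coefficient estimates are asserted; with three classes and four features the bound matrix must be 3 x 4, and the intercept bounds a length-3 vector:

```r
# Hypothetical sketch: one-sided (lower) bounds for multinomial regression.
l <- matrix(rep(-1.0, 12), nrow = 3, ncol = 4)  # (number of classes, number of features)
model <- spark.logit(training, Species ~ ., family = "multinomial",
                     lowerBoundsOnCoefficients = l,
                     lowerBoundsOnIntercepts = c(0.0, 0.0, 0.0))
summary(model)
```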
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119787497 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,57 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +lrow <- 0 +lcol <- 0 +urow <- 0 +ucol <- 0 --- End diff -- Here we can reduce from four parameters to two: ```nrow``` and ```ncol```, since the matrix dimensions of ```lowerBoundsOnCoefficients``` and ```upperBoundsOnCoefficients``` should be consistent. If the input doesn't conform to this rule, we should throw an exception.
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r119788169 --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LogisticRegressionWrapper.scala --- @@ -97,7 +97,15 @@ private[r] object LogisticRegressionWrapper standardization: Boolean, thresholds: Array[Double], weightCol: String, - aggregationDepth: Int + aggregationDepth: Int, + lrow: Int, + lcol: Int, + urow: Int, + ucol: Int, --- End diff -- See my comment above: we can merge the four parameters into two, ```numRowsOfBoundsOnCoefficients``` and ```numColsOfBoundsOnCoefficients```; please follow the Scala naming convention.
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18128#discussion_r118827649 --- Diff: R/pkg/R/mllib_classification.R --- @@ -239,21 +253,51 @@ function(object, path, overwrite = FALSE) { setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100, tol = 1E-6, family = "auto", standardization = TRUE, - thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) { + thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, + lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, + lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) { formula <- paste(deparse(formula), collapse = "") +lrow <- 0 +lcol <- 0 +urow <- 0 +ucol <- 0 if (!is.null(weightCol) && weightCol == "") { weightCol <- NULL } else if (!is.null(weightCol)) { weightCol <- as.character(weightCol) } +if (!is.null(lowerBoundsOnIntercepts)) { +lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts) +} + +if (!is.null(upperBoundsOnIntercepts)) { +upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts) +} + +if (!is.null(lowerBoundsOnCoefficients)) { + lrow <- nrow(lowerBoundsOnCoefficients) + lcol <- ncol(lowerBoundsOnCoefficients) + lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients)) +} + +if (!is.null(upperBoundsOnCoefficients)) { + urow <- nrow(upperBoundsOnCoefficients) + ucol <- ncol(upperBoundsOnCoefficients) --- End diff -- Could you add some validation for upperBoundsOnCoefficients and lowerBoundsOnCoefficients? For example, if upperBoundsOnCoefficients is a vector instead of a matrix, ncol(as.array(upperBoundsOnCoefficients)) will be NA.
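The failure mode the reviewer describes can be reproduced in a short R sketch:

```r
# A plain vector coerced with as.array() gets a one-element dim attribute,
# so ncol() indexes past it and returns NA instead of failing loudly.
v <- as.array(c(1, 2))
dim(v)    # 2  (a length-1 dim vector)
nrow(v)   # 2
ncol(v)   # NA -- would silently poison a dimension check

m <- matrix(c(1, 2), nrow = 1)
ncol(m)   # 2  -- a proper matrix behaves as expected
```

This is why the later revision adds an explicit `is a matrix` check before calling `nrow()`/`ncol()`.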
[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/18128 [SPARK-20906][SparkR]:Constrained Logistic Regression for SparkR ## What changes were proposed in this pull request? PR https://github.com/apache/spark/pull/17715 added constrained logistic regression for ML. We should add it to SparkR. ## How was this patch tested? Add new unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangmiao1981/spark test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18128.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18128 commit 1fc68f69ecce46c8d4c2bbd2d9aafdd042c27108 Author: wangmiao1981 Date: 2017-05-27T06:27:04Z add constraint logit commit 7627ac9c093ba72afd586c3ea1e482238d29c3c3 Author: wangmiao1981 Date: 2017-05-27T07:29:25Z add unit test and doc