[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18128





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r122134992
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -202,6 +202,20 @@ function(object, path, overwrite = FALSE) {
 #' @param aggregationDepth The depth for treeAggregate (greater than or equal to 2). If the dimensions of features
 #'                         or the number of partitions are large, this param could be adjusted to a larger size.
 #'                         This is an expert parameter. Default value should be good for most cases.
+#' @param lowerBoundsOnCoefficients The lower bounds on coefficients if fitting under bound constrained optimization.
+#'                                  The bound matrix must be compatible with the shape (1, number of features) for binomial
+#'                                  regression, or (number of classes, number of features) for multinomial regression.
+#'                                  It is a R matrix.
+#' @param upperBoundsOnCoefficients The upper bounds on coefficients if fitting under bound constrained optimization.
+#'                                  The bound matrix must be compatible with the shape (1, number of features) for binomial
+#'                                  regression, or (number of classes, number of features) for multinomial regression.
+#'                                  It is a R matrix.
+#' @param lowerBoundsOnIntercepts The lower bounds on intercepts if fitting under bound constrained optimization.
+#'                                The bounds vector size must be equal with 1 for binomial regression, or the number
--- End diff --

`size must be equal with 1` - should this be `size must be equal to 1`?
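For illustration, a minimal sketch (not part of the PR) of bound matrices with the shapes documented above; the feature and class counts here are hypothetical:

```r
# Shapes from the docs: (1, number of features) for binomial,
# (number of classes, number of features) for multinomial.
nFeatures <- 4
nClasses <- 3

binomialLower <- matrix(rep(-1.0, nFeatures), nrow = 1, ncol = nFeatures)
multinomialLower <- matrix(rep(-1.0, nClasses * nFeatures),
                           nrow = nClasses, ncol = nFeatures)

dim(binomialLower)     # 1 4
dim(multinomialLower)  # 3 4
```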





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r122135628
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            row <- 0
+            col <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              if (class(lowerBoundsOnCoefficients) != "matrix") {
+                stop("lowerBoundsOnCoefficients must be a matrix.")
+              }
+              row <- nrow(lowerBoundsOnCoefficients)
+              col <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              if (class(upperBoundsOnCoefficients) != "matrix") {
+                stop("upperBoundsOnCoefficients must be a matrix.")
+              }
+
+              if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients)
+                  | col != ncol(upperBoundsOnCoefficients))) {
--- End diff --

While logically and semantically correct here, I might prefer `&&` and `||` instead of `&` and `|` for programmability/correctness, in case one of these operands becomes length > 1 later.
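For illustration, a minimal sketch (not from the PR) of the difference:

```r
# `&` and `|` are vectorized and return a logical vector;
# `&&` and `||` short-circuit and return a single logical value.
a <- c(TRUE, FALSE)
b <- c(TRUE, TRUE)

a & b         # c(TRUE, FALSE) -- length 2
a[1] && b[1]  # TRUE           -- always length 1

# `if (a & b)` would use a length-2 condition: a warning in older R,
# an error in R >= 4.2 -- the hazard this comment guards against.
```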





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r122135013
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -202,6 +202,20 @@ function(object, path, overwrite = FALSE) {
 #' @param aggregationDepth The depth for treeAggregate (greater than or 
equal to 2). If the dimensions of features
 #' or the number of partitions are large, this 
param could be adjusted to a larger size.
 #' This is an expert parameter. Default value 
should be good for most cases.
+#' @param lowerBoundsOnCoefficients The lower bounds on coefficients if 
fitting under bound constrained optimization.
+#'  The bound matrix must be compatible 
with the shape (1, number of features) for binomial
+#'  regression, or (number of classes, 
number of features) for multinomial regression.
+#'  It is a R matrix.
+#' @param upperBoundsOnCoefficients The upper bounds on coefficients if 
fitting under bound constrained optimization.
+#'  The bound matrix must be compatible 
with the shape (1, number of features) for binomial
+#'  regression, or (number of classes, 
number of features) for multinomial regression.
+#'  It is a R matrix.
+#' @param lowerBoundsOnIntercepts The lower bounds on intercepts if 
fitting under bound constrained optimization.
+#'The bounds vector size must be equal 
with 1 for binomial regression, or the number
+#'of classes for multinomial regression.
+#' @param upperBoundsOnIntercepts The upper bounds on intercepts if 
fitting under bound constrained optimization.
+#'The bound vector size must be equal with 
1 for binomial regression, or the number
--- End diff --

ditto here `size must be equal with 1`





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119995950
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            row <- 0
+            col <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              if (class(lowerBoundsOnCoefficients) != "matrix") {
+                stop("lowerBoundsOnCoefficients must be a matrix.")
+              }
+              row <- nrow(lowerBoundsOnCoefficients)
+              col <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              if (class(upperBoundsOnCoefficients) != "matrix") {
+                stop("upperBoundsOnCoefficients must be a matrix.")
+              }
+
+              if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients)
+                  | col != ncol(upperBoundsOnCoefficients))) {
+                stop(paste("dimension of upperBoundsOnCoefficients ",
+                           "is not the same as lowerBoundsOnCoefficients", sep = ""))
+              }
+
+              if (is.null(lowerBoundsOnCoefficients)) {
+                row <- nrow(upperBoundsOnCoefficients)
+                col <- ncol(upperBoundsOnCoefficients)
+              }
--- End diff --

ok thanks, L290-291





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119978881
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            row <- 0
+            col <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              if (class(lowerBoundsOnCoefficients) != "matrix") {
+                stop("lowerBoundsOnCoefficients must be a matrix.")
+              }
+              row <- nrow(lowerBoundsOnCoefficients)
+              col <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              if (class(upperBoundsOnCoefficients) != "matrix") {
+                stop("upperBoundsOnCoefficients must be a matrix.")
+              }
+
+              if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients)
+                  | col != ncol(upperBoundsOnCoefficients))) {
+                stop(paste("dimension of upperBoundsOnCoefficients ",
+                           "is not the same as lowerBoundsOnCoefficients", sep = ""))
+              }
+
+              if (is.null(lowerBoundsOnCoefficients)) {
+                row <- nrow(upperBoundsOnCoefficients)
+                col <- ncol(upperBoundsOnCoefficients)
+              }
--- End diff --

This is the case where only the upper bound is set. We can set both bounds, or either one of them alone.

For the case where both are set, we enforce that the upper and lower bounds have the same dimensions, as checked above.





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119978607
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            row <- 0
+            col <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              if (class(lowerBoundsOnCoefficients) != "matrix") {
+                stop("lowerBoundsOnCoefficients must be a matrix.")
+              }
+              row <- nrow(lowerBoundsOnCoefficients)
+              col <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              if (class(upperBoundsOnCoefficients) != "matrix") {
+                stop("upperBoundsOnCoefficients must be a matrix.")
+              }
+
+              if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients)
+                  | col != ncol(upperBoundsOnCoefficients))) {
+                stop(paste("dimension of upperBoundsOnCoefficients ",
+                           "is not the same as lowerBoundsOnCoefficients", sep = ""))
+              }
+
+              if (is.null(lowerBoundsOnCoefficients)) {
+                row <- nrow(upperBoundsOnCoefficients)
+                col <- ncol(upperBoundsOnCoefficients)
+              }
+
+              upperBoundsOnCoefficients <- as.array(as.vector(upperBoundsOnCoefficients))
+            }
+
             jobj <- callJStatic("org.apache.spark.ml.r.LogisticRegressionWrapper", "fit",
                                 data@sdf, formula, as.numeric(regParam),
                                 as.numeric(elasticNetParam), as.integer(maxIter),
                                 as.numeric(tol), as.character(family),
                                 as.logical(standardization), as.array(thresholds),
-                                weightCol, as.integer(aggregationDepth))
+                                weightCol, as.integer(aggregationDepth),
+                                as.integer(row), as.integer(col),
--- End diff --

nit: no need for `as.integer(row)` and `as.integer(col)` since they are set internally and are not user-supplied parameters





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119978589
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            row <- 0
+            col <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              if (class(lowerBoundsOnCoefficients) != "matrix") {
+                stop("lowerBoundsOnCoefficients must be a matrix.")
+              }
+              row <- nrow(lowerBoundsOnCoefficients)
+              col <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              if (class(upperBoundsOnCoefficients) != "matrix") {
+                stop("upperBoundsOnCoefficients must be a matrix.")
+              }
+
+              if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients)
+                  | col != ncol(upperBoundsOnCoefficients))) {
+                stop(paste("dimension of upperBoundsOnCoefficients ",
--- End diff --

`paste` would insert a space - use `paste0` instead, or just remove the trailing space from the string and let `paste` add it.
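For illustration, a quick sketch (not from the PR) of the spacing behavior:

```r
# Default paste() joins with sep = " ", so a trailing space plus
# sep = "" is redundant:
paste("dimension of upperBoundsOnCoefficients",
      "is not the same as lowerBoundsOnCoefficients")
# paste0() is shorthand for paste(..., sep = ""):
paste0("dimension of upperBoundsOnCoefficients ",
       "is not the same as lowerBoundsOnCoefficients")
# Both produce:
# "dimension of upperBoundsOnCoefficients is not the same as lowerBoundsOnCoefficients"
```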





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119978644
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,64 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            row <- 0
+            col <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              if (class(lowerBoundsOnCoefficients) != "matrix") {
+                stop("lowerBoundsOnCoefficients must be a matrix.")
+              }
+              row <- nrow(lowerBoundsOnCoefficients)
+              col <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              if (class(upperBoundsOnCoefficients) != "matrix") {
+                stop("upperBoundsOnCoefficients must be a matrix.")
+              }
+
+              if (!is.null(lowerBoundsOnCoefficients) & (row != nrow(upperBoundsOnCoefficients)
+                  | col != ncol(upperBoundsOnCoefficients))) {
+                stop(paste("dimension of upperBoundsOnCoefficients ",
+                           "is not the same as lowerBoundsOnCoefficients", sep = ""))
+              }
+
+              if (is.null(lowerBoundsOnCoefficients)) {
+                row <- nrow(upperBoundsOnCoefficients)
+                col <- ncol(upperBoundsOnCoefficients)
+              }
--- End diff --

Given how this is used later in the Scala code, should there be a check that nrow(upper) == nrow(lower), and ditto that ncol(upper) == ncol(lower)?





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119911447
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,57 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            lrow <- 0
+            lcol <- 0
+            urow <- 0
+            ucol <- 0
--- End diff --

Oh, I think I can do the check because I have a `NULL` check before 
enforcing the rule.






[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119911006
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,57 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            lrow <- 0
+            lcol <- 0
+            urow <- 0
+            ucol <- 0
--- End diff --

Question: Based on my understanding, `lowerBoundsOnCoefficients` and `upperBoundsOnCoefficients` are not required to be set at the same time; either one can be set alone, or both can be set together.
In the first case we can't compare the dimensions of the two matrices, because one of them is `NULL`.
In the second case we can check them.

So we can't enforce the rule in general, only when both bounds are set.







[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119788377
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R ---
@@ -225,6 +225,32 @@ test_that("spark.logit", {
   model2 <- spark.logit(df2, label ~ feature, weightCol = "weight")
   prediction2 <- collect(select(predict(model2, df2), "prediction"))
   expect_equal(sort(prediction2$prediction), c("0.0", "0.0", "0.0", "0.0", "0.0"))
+
+  # Test binomial logistic regression against two classes with upperBoundsOnCoefficients
+  # and upperBoundsOnIntercepts
+  u <- matrix(c(1.0, 0.0, 1.0, 0.0), nrow = 1, ncol = 4)
+  model <- spark.logit(training, Species ~ ., upperBoundsOnCoefficients = u,
+                       upperBoundsOnIntercepts = 1.0)
+  summary <- summary(model)
+  coefsR <- c(-11.13331, 1.0, 0.0, 1.0, 0.0)
+  coefs <- summary$coefficients[, "Estimate"]
+  expect_true(all(abs(coefsR - coefs) < 0.1))
+  # Test upperBoundsOnCoefficients should be matrix
+  expect_error(spark.logit(training, Species ~ ., upperBoundsOnCoefficients = as.array(c(1, 2)),
+                           upperBoundsOnIntercepts = 1.0))
+
+  # Test binomial logistic regression against two classes with lowerBoundsOnCoefficients
+  # and lowerBoundsOnIntercepts
+  l <- matrix(c(0.0, -1.0, 0.0, -1.0), nrow = 1, ncol = 4)
+  model <- spark.logit(training, Species ~ ., lowerBoundsOnCoefficients = l,
+                       lowerBoundsOnIntercepts = 0.0)
+  summary <- summary(model)
+  coefsR <- c(0, 0, -1, 0, 1.902192)
+  coefs <- summary$coefficients[, "Estimate"]
+  expect_true(all(abs(coefsR - coefs) < 0.1))
+  # Test lowerBoundsOnCoefficients should be matrix
+  expect_error(spark.logit(training, Species ~ ., lowerBoundsOnCoefficients = as.array(c(1, 2)),
+                           lowerBoundsOnIntercepts = 0.0))
--- End diff --

Could you add a test for ```multinomial``` logistic regression? I think testing only a one-sided bound is enough.
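A sketch of what such a test might look like (illustrative, not the PR's actual test; it assumes a three-class DataFrame such as the full iris data, and checks only the lower bound):

```r
# Assumes a three-class DataFrame, e.g. the full iris data:
df <- suppressWarnings(createDataFrame(iris))
# Bound matrix shape is (number of classes, number of features): 3 x 4.
l <- matrix(rep(c(0.0, -1.0), 6), nrow = 3, ncol = 4)
model <- spark.logit(df, Species ~ ., family = "multinomial",
                     lowerBoundsOnCoefficients = l,
                     lowerBoundsOnIntercepts = as.array(c(0.0, 0.0, 0.0)))
expect_is(model, "LogisticRegressionModel")
```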





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119787497
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,57 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            lrow <- 0
+            lcol <- 0
+            urow <- 0
+            ucol <- 0
--- End diff --

Here we can reduce from four parameters to two: ```nrow``` and ```ncol```, since the matrix format of ```lowerBoundsOnCoefficients``` and ```upperBoundsOnCoefficients``` should be consistent. If the input doesn't conform to this rule, we need to throw an exception.
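A rough sketch (not the PR's final code) of the suggested consolidation, assuming the two bound arguments are in scope and already validated as matrices:

```r
# Track a single pair of dimensions and fail fast on any mismatch.
row <- 0
col <- 0
if (!is.null(lowerBoundsOnCoefficients)) {
  row <- nrow(lowerBoundsOnCoefficients)
  col <- ncol(lowerBoundsOnCoefficients)
}
if (!is.null(upperBoundsOnCoefficients)) {
  if (!is.null(lowerBoundsOnCoefficients) &&
      (row != nrow(upperBoundsOnCoefficients) ||
       col != ncol(upperBoundsOnCoefficients))) {
    stop("dimensions of upperBoundsOnCoefficients and lowerBoundsOnCoefficients must match")
  }
  row <- nrow(upperBoundsOnCoefficients)
  col <- ncol(upperBoundsOnCoefficients)
}
# `row` and `col` are now the single pair of dimensions passed down to Scala.
```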





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-06-02 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r119788169
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/LogisticRegressionWrapper.scala ---
@@ -97,7 +97,15 @@ private[r] object LogisticRegressionWrapper
   standardization: Boolean,
   thresholds: Array[Double],
   weightCol: String,
-  aggregationDepth: Int
+  aggregationDepth: Int,
+  lrow: Int,
+  lcol: Int,
+  urow: Int,
+  ucol: Int,
--- End diff --

See my comment above: we can merge the four parameters into two, ```numRowsOfBoundsOnCoefficients``` and ```numColsOfBoundsOnCoefficients```. Please follow the Scala naming convention.





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-05-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18128#discussion_r118827649
  
--- Diff: R/pkg/R/mllib_classification.R ---
@@ -239,21 +253,51 @@ function(object, path, overwrite = FALSE) {
 setMethod("spark.logit", signature(data = "SparkDataFrame", formula = "formula"),
           function(data, formula, regParam = 0.0, elasticNetParam = 0.0, maxIter = 100,
                    tol = 1E-6, family = "auto", standardization = TRUE,
-                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2) {
+                   thresholds = 0.5, weightCol = NULL, aggregationDepth = 2,
+                   lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL,
+                   lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL) {
             formula <- paste(deparse(formula), collapse = "")
+            lrow <- 0
+            lcol <- 0
+            urow <- 0
+            ucol <- 0

             if (!is.null(weightCol) && weightCol == "") {
               weightCol <- NULL
             } else if (!is.null(weightCol)) {
               weightCol <- as.character(weightCol)
             }

+            if (!is.null(lowerBoundsOnIntercepts)) {
+                lowerBoundsOnIntercepts <- as.array(lowerBoundsOnIntercepts)
+            }
+
+            if (!is.null(upperBoundsOnIntercepts)) {
+                upperBoundsOnIntercepts <- as.array(upperBoundsOnIntercepts)
+            }
+
+            if (!is.null(lowerBoundsOnCoefficients)) {
+              lrow <- nrow(lowerBoundsOnCoefficients)
+              lcol <- ncol(lowerBoundsOnCoefficients)
+              lowerBoundsOnCoefficients <- as.array(as.vector(lowerBoundsOnCoefficients))
+            }
+
+            if (!is.null(upperBoundsOnCoefficients)) {
+              urow <- nrow(upperBoundsOnCoefficients)
+              ucol <- ncol(upperBoundsOnCoefficients)
--- End diff --

Could you add some checks for upperBoundsOnCoefficients and lowerBoundsOnCoefficients? For example, if upperBoundsOnCoefficients is a vector instead of a matrix, ncol(as.array(upperBoundsOnCoefficients)) will be NA.
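For illustration, a minimal sketch (not from the PR) of the failure mode:

```r
v <- c(1, 2)
ncol(v)                    # NULL: a plain vector has no dim attribute
ncol(as.array(v))          # NA: a 1-d array has dim of length 1, so dim(x)[2] is NA
ncol(matrix(v, nrow = 1))  # 2: only a real matrix yields usable dimensions
```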





[GitHub] spark pull request #18128: [SPARK-20906][SparkR]:Constrained Logistic Regres...

2017-05-27 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/18128

[SPARK-20906][SparkR]:Constrained Logistic Regression for SparkR

## What changes were proposed in this pull request?

PR https://github.com/apache/spark/pull/17715 added Constrained Logistic Regression for ML. We should add it to SparkR as well.

## How was this patch tested?

Added new unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18128


commit 1fc68f69ecce46c8d4c2bbd2d9aafdd042c27108
Author: wangmiao1981 
Date:   2017-05-27T06:27:04Z

add constraint logit

commit 7627ac9c093ba72afd586c3ea1e482238d29c3c3
Author: wangmiao1981 
Date:   2017-05-27T07:29:25Z

add unit test and doc



