[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18896

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18896#discussion_r134076580

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -1392,6 +1415,61 @@ class LogisticRegressionSuite
         assert(model2.interceptVector.toArray.sum ~== 0.0 absTol eps)
       }

    +  test("multinomial logistic regression with zero variance (SPARK-21681)") {
    +    val sqlContext = multinomialDatasetWithZeroVar.sqlContext
    +    import sqlContext.implicits._
    +    val mlr = new LogisticRegression().setFamily("multinomial").setFitIntercept(true)
    +      .setElasticNetParam(0.0).setRegParam(0.0).setStandardization(true).setWeightCol("weight")
    +
    +    val model = mlr.fit(multinomialDatasetWithZeroVar)
    +
    +    /*
    +      Use the following R code to load the data and train the model using glmnet package.
    +
    +      library("glmnet")
    +      data <- read.csv("path", header=FALSE)
    +      label = as.factor(data$V1)
    +      w = data$V2
    +      features = as.matrix(data.frame(data$V3, data$V4))
    +      coefficients = coef(glmnet(features, label, weights=w, family="multinomial",
    +        alpha = 0, lambda = 0))
    +      coefficients
    +      $`0`
    +      3 x 1 sparse Matrix of class "dgCMatrix"
    +                     s0
    +              0.2658824
    +      data.V3 0.1881871
    +      data.V4 .
    +
    +      $`1`
    +      3 x 1 sparse Matrix of class "dgCMatrix"
    +                      s0
    +               0.53604701
    +      data.V3 -0.02412645
    +      data.V4  .
    +
    +      $`2`
    +      3 x 1 sparse Matrix of class "dgCMatrix"
    +                     s0
    +              -0.8019294
    +      data.V3 -0.1640607
    +      data.V4  .
    +    */
    +
    +    val coefficientsR = new DenseMatrix(3, 2, Array(
    +      0.1881871, -0.0,
    --- End diff --

    Why `-0.0`?
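As background for the `-0.0` question, here is a minimal sketch of how negative zero behaves in Scala. It does not explain why the literal was written that way in the diff. `NegativeZeroSketch` and `approxEqual` are hypothetical names introduced only for illustration; `approxEqual` is a stand-in for an absolute-tolerance comparison and is not Spark's `~==` operator.

```scala
object NegativeZeroSketch {
  // Hypothetical helper: absolute-tolerance comparison (an assumption, not Spark's `~==`).
  def approxEqual(a: Double, b: Double, absTol: Double): Boolean =
    math.abs(a - b) <= absTol

  val negZero: Double = -0.0
  val posZero: Double = 0.0

  // Primitive comparison follows IEEE 754: the two zeros compare equal.
  val primitiveEqual: Boolean = negZero == posZero

  // java.lang.Double.compare orders -0.0 before 0.0, so it does tell them apart.
  val compareResult: Int = java.lang.Double.compare(negZero, posZero)

  // The sign becomes visible in operations such as division.
  val reciprocal: Double = 1.0 / negZero
}
```

Under these assumptions, a tolerance-based check cannot distinguish `-0.0` from `0.0`, so either literal would produce the same result in such a comparison.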
[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18896#discussion_r134076552

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala ---
    @@ -238,8 +238,17 @@ class LogisticAggregatorSuite extends SparkFunSuite with MLlibTestSparkContext {
         val aggConstantFeature = getNewAggregator(instancesConstantFeature,
           Vectors.dense(coefArray ++ interceptArray), fitIntercept = true, isMultinomial = true)
         instances.foreach(aggConstantFeature.add)
    +
         // constant features should not affect gradient
    -    assert(aggConstantFeature.gradient(0) === 0.0)
    +    def validateGradient(grad: Vector): Unit = {
    +      assert(grad(0) === 0.0)
    +      grad.toArray.foreach { gradientValue =>
    --- End diff --

    The problem with this test was that it checked that part of the gradient was zero, but didn't check that the rest of the gradient was correct. Here, you're checking that the rest of the gradient isn't NaN or infinite, but not that it's actually correct. A more appropriate test, IMO, is to also run an aggregator over the same instances with the constant feature filtered out, then check that the portion of the gradient the two aggregators share is the same, e.g. in Scala:

        val aggConstantFeature = getNewAggregator(instancesConstantFeature,
          Vectors.dense(coefArray ++ interceptArray), fitIntercept = true, isMultinomial = true)
        val filteredInstances = instancesConstantFeature.map { case Instance(l, w, f) =>
          Instance(l, w, Vectors.dense(f.toArray.tail))
        }
        val aggMultinomial = getNewAggregator(filteredInstances,
          Vectors.dense(coefArray.slice(3, 6) ++ interceptArray), fitIntercept = true,
          isMultinomial = true)
        filteredInstances.foreach(aggMultinomial.add)
        instancesConstantFeature.foreach(aggConstantFeature.add)

        // constant features should not affect gradient
        assert(aggConstantFeature.gradient.toArray.take(numClasses) === Array.fill(numClasses)(0.0))
        assert(aggMultinomial.gradient.toArray === aggConstantFeature.gradient.toArray.slice(3, 9))

    Just to note, this code is only an example and is not meant to be copied and pasted.
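The comparison suggested above can be tried outside Spark with a toy model. The following is a minimal, self-contained sketch and not the aggregator from the pull request. To stay short it uses a binary logistic loss rather than the multinomial one. The `Inst` type, the `gradient` function, the sample data, and the standard deviations are all assumptions made up for illustration. The sketch also assumes that features with zero standard deviation are simply skipped.

```scala
object ConstantFeatureGradientSketch {
  // Hypothetical minimal instance type (label, weight, features); not Spark's Instance class.
  final case class Inst(label: Double, weight: Double, features: Array[Double])

  // Gradient of a weighted binary logistic loss, laid out as coefficients then intercept.
  // Each feature is scaled by 1 / std; features whose std is zero are skipped entirely,
  // so they contribute neither to the margin nor to the gradient.
  def gradient(data: Seq[Inst], coef: Array[Double], intercept: Double,
      std: Array[Double]): Array[Double] = {
    val grad = new Array[Double](coef.length + 1)
    data.foreach { inst =>
      var margin = intercept
      var j = 0
      while (j < coef.length) {
        if (std(j) != 0.0) margin += coef(j) * inst.features(j) / std(j)
        j += 1
      }
      val multiplier = inst.weight * (1.0 / (1.0 + math.exp(-margin)) - inst.label)
      j = 0
      while (j < coef.length) {
        if (std(j) != 0.0) grad(j) += multiplier * inst.features(j) / std(j)
        j += 1
      }
      grad(coef.length) += multiplier
    }
    grad
  }

  // Made-up data: the first feature is constant across all instances.
  val data: Seq[Inst] = Seq(
    Inst(0.0, 0.1, Array(1.0, 2.0)),
    Inst(1.0, 0.5, Array(1.0, 1.0)),
    Inst(1.0, 0.3, Array(1.0, 0.5)))
  // Assumed standard deviations; the zero marks the constant feature.
  val std: Array[Double] = Array(0.0, 0.8)

  // Gradient with the constant feature present.
  val full: Array[Double] = gradient(data, Array(1.0, 2.0), 0.5, std)

  // Gradient over the same instances with the constant feature filtered out.
  val filteredData: Seq[Inst] = data.map(i => i.copy(features = i.features.tail))
  val filtered: Array[Double] = gradient(filteredData, Array(2.0), 0.5, std.tail)
}
```

In this sketch the entry for the constant feature, `full(0)`, stays at zero, and the remaining entries of `full` should match `filtered` element by element. That is the shared-portion check described in the comment, applied here to the binary case.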
[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18896#discussion_r133301688

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
    @@ -1392,6 +1415,61 @@ class LogisticRegressionSuite
         assert(model2.interceptVector.toArray.sum ~== 0.0 absTol eps)
       }

    +  test("test SPARK-21681") {
    --- End diff --

    I would include a description of the test in addition to the ticket #.
[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...
GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/18896

    [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero

    ## What changes were proposed in this pull request?

    Fix a bug where MLOR (multinomial logistic regression) does not work correctly when featureStd contains zero.

    ## How was this patch tested?

    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark fix_mlor_stdvalue_zero_bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18896.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18896

commit c415ddeb1182f8243e5330d294665079c21a8a19
Author: WeichenXu
Date:   2017-08-09T17:17:43Z

    init pr