[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/1518 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-77406914 I'm looking at really old PRs -- this is obsolete now, right?
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1518#discussion_r22171070

Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Regularizer.scala, @@ -0,0 +1,140 @@ (Apache License, Version 2.0 header omitted):

    package org.apache.spark.mllib.optimization

    import scala.collection.mutable.ListBuffer
    import scala.math._

    import breeze.linalg.{DenseVector => BDV, Vector => BV}

    import org.apache.spark.mllib.linalg.{Vectors, Vector}

    abstract class Regularizer extends Serializable {
      var isSmooth: Boolean = true

      def add(that: Regularizer): CompositeRegularizer = {
        (new CompositeRegularizer).add(this).add(that)
      }

      def compute(weights: Vector, cumGradient: Vector): Double
    }

    class SimpleRegularizer extends Regularizer {
      isSmooth = true

      override def compute(weights: Vector, cumGradient: Vector): Double = 0
    }

    class CompositeRegularizer extends Regularizer {
      isSmooth = true

      protected val regularizers = ListBuffer[Regularizer]()

      override def add(that: Regularizer): this.type = {
        if (this.isSmooth && !that.isSmooth) isSmooth = false
        regularizers.append(that)
        this
      }

      override def compute(weights: Vector, cumGradient: Vector): Double = {
        if (regularizers.isEmpty) {
          0.0
        } else {
          regularizers.foldLeft(0.0)((loss: Double, x: Regularizer) =>
            loss + x.compute(weights, cumGradient)
          )
        }
      }
    }

    class L1Regularizer(private val regParam: BV[Double]) extends Regularizer {
      isSmooth = false

      def this(regParam: Double) = this(new BDV[Double](Array[Double](regParam)))

      def this(regParam: Vector) = this(regParam.toBreeze)

      def compute(weights: Vector, cumGradient: Vector): Double = {
        val brzWeights = weights.toBreeze
        val brzCumGradient = cumGradient.toBreeze

        if (regParam.length > 1) require(brzWeights.length == regParam.length)

        if (regParam.length == 1 && regParam(0) == 0.0) {
          0.0
        } else {
          var loss: Double = 0.0
          brzWeights.activeIterator.foreach {
            case (_, 0.0) => // Skip explicit zero elements.

End diff -- Will the case statement affect performance?
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/1518#discussion_r22173571 (on the same Regularizer.scala hunk, at the line `case (_, 0.0) => // Skip explicit zero elements.`): This PR is not finished yet. I will replace this with the newly implemented API `foreachActive`.
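The intent behind `foreachActive` can be illustrated with a minimal, Spark-free sketch (the `SparseVec` type and `l1Loss` helper below are hypothetical stand-ins, not MLlib's actual classes): iterate only the stored (active) entries of a sparse vector, while still guarding against explicitly stored zeros, which is exactly what the `case (_, 0.0)` branch in the diff above does.

```scala
// Hypothetical minimal sparse vector; Spark's real Vector/foreachActive differ.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  // Visit only the stored entries, in the spirit of MLlib's foreachActive.
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) {
      f(indices(k), values(k))
      k += 1
    }
  }
}

// L1 contribution computed over active entries, skipping explicit zeros.
def l1Loss(v: SparseVec, regParam: Double): Double = {
  var loss = 0.0
  v.foreachActive { (_, value) =>
    if (value != 0.0) loss += regParam * math.abs(value)
  }
  loss
}
```

For example, `l1Loss(SparseVec(5, Array(0, 2, 4), Array(1.0, 0.0, -3.0)), 0.1)` touches three stored entries, skips the explicit zero, and returns roughly 0.4.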
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-51151194 This looks promising. FWIW, I support decoupling regularization from the raw gradient update and believe it is a good way to go - it will allow various update/learning-rate schemes (AdaGrad, normalized adaptive gradient, etc.) to be applied independently of the regularization.
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-51151346 It's too late to get this into 1.1, but I'll try to make it happen in 1.2. We'll use this in the Alpine implementation first.
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50663418 I tried to make the bias really big so that the intercept weight becomes small enough to avoid being regularized. The result is still quite different from R, and very sensitive to the strength of the bias. Users may rescale the features to improve the convergence of the optimization process, and in order to get the same coefficients without scaling, each component has to be penalized differently. Also, users may know that a feature is less important and want to penalize it more. As a result, I still want to implement the full weighted regularizer, and decouple the adaptive learning rate from the updater. Let's talk in detail when we meet tomorrow. Thanks.
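The per-component penalty described above can be sketched in a few lines (this is an illustrative assumption of what a weighted L2 term looks like, not the PR's actual code): the loss is 0.5 * Σᵢ regParam(i) * w(i)², the gradient contribution is regParam(i) * w(i), and setting regParam(i) = 0.0 for the intercept exempts it entirely.

```scala
// Sketch of a per-component (weighted) L2 penalty over plain arrays.
// loss = 0.5 * sum_i regParam(i) * w(i)^2 ; gradient_i = regParam(i) * w(i).
// A component with regParam(i) == 0.0 (e.g. the intercept) is not penalized.
def weightedL2(weights: Array[Double], regParam: Array[Double],
               cumGradient: Array[Double]): Double = {
  require(weights.length == regParam.length && weights.length == cumGradient.length)
  var loss = 0.0
  var i = 0
  while (i < weights.length) {
    loss += 0.5 * regParam(i) * weights(i) * weights(i)
    cumGradient(i) += regParam(i) * weights(i) // accumulate into the caller's gradient
    i += 1
  }
  loss
}
```

With weights (2.0, 3.0) and penalties (1.0, 0.0), only the first component contributes: loss 2.0, gradient (2.0, 0.0).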
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50691925 I think this is the approach LIBLINEAR uses. Yes, let's discuss tomorrow.
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-50441485 @dbtsai I thought of another way to do this and want to know your opinion. We can add an optional argument to `appendBias`: `appendBias(bias: Double = 1.0)`. If this is used when adding the intercept, we can add a large bias so the corresponding weight gets less regularized.
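A quick numeric sketch of why a large bias column reduces the effective regularization on the intercept (the helper name below is hypothetical): if the appended bias feature has value B instead of 1, the model needs weight c / B on it to represent an intercept of c, so the L2 penalty on that weight shrinks by a factor of B².

```scala
// For a fixed intercept value c carried by a bias column of value B,
// the fitted weight is c / B, so its L2 penalty is 0.5 * lambda * (c / B)^2.
def interceptPenalty(c: Double, bias: Double, lambda: Double): Double = {
  val w = c / bias
  0.5 * lambda * w * w
}
```

For instance, with c = 2.0 and lambda = 1.0, a bias of 1 incurs penalty 2.0, while a bias of 100 incurs only 2e-4, which is why the trick makes the intercept "less regularized" but, as noted in this thread, leaves the result sensitive to the chosen bias strength.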
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1518 [SPARK-2505][MLlib] Weighted Regularizer for Generalized Linear Model

(Note: This is not ready to be merged. It needs documentation, and we must make sure it's backward compatible with the Spark 1.0 APIs.)

The current implementation of regularization in the linear model uses `Updater`, and this design has a couple of issues:

1) It penalizes all the weights, including the intercept. In a machine learning training process, typically, people don't penalize the intercept.

2) The `Updater` contains the logic of adaptive step size for gradient descent, and we would like to clean it up by separating the regularization logic out of the updater into a regularizer, so that in the LBFGS optimizer we don't need the trick for getting the loss and gradient of the objective function.

In this work, a weighted regularizer will be implemented, and users can exclude the intercept or any weight from regularization by setting that term's penalty to zero. Since the regularizer will return a tuple of loss and gradient, the adaptive step size logic and the soft thresholding for L1 in `Updater` will be moved to the SGD optimizer.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AlpineNow/spark SPARK-2505_regularizer

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1518.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1518

commit 2946930ec3de0e0a34e07d065c954d7aabacd4ba
Author: DB Tsai dbt...@alpinenow.com
Date: 2014-07-19T02:15:37Z

    initial work
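The soft thresholding for L1 mentioned in the PR description is the standard proximal operator for an L1 penalty; a minimal sketch (the function name is hypothetical, and this is the textbook operator rather than the PR's actual implementation) looks like this:

```scala
// Soft-thresholding, the proximal operator of w => shrinkage * |w|:
// shrink w toward zero by `shrinkage` (typically regParam * stepSize),
// clipping at zero so small weights become exactly 0.0.
def softThreshold(w: Double, shrinkage: Double): Double = {
  math.signum(w) * math.max(0.0, math.abs(w) - shrinkage)
}
```

Applied after each gradient step, this is what makes L1-regularized SGD produce exact zeros in the weight vector, which is why the PR moves it out of `Updater` and into the optimizer.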
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-49670761 QA tests have started for PR 1518. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16928/consoleFull
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1518#issuecomment-49670856 QA results for PR 1518:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
    abstract class Regularizer extends Serializable {
    class SimpleRegularizer extends Regularizer {
    class CompositeRegularizer extends Regularizer {

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16928/consoleFull