[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2015-06-30 Thread dbtsai
Github user dbtsai closed the pull request at:

https://github.com/apache/spark/pull/1518


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2015-03-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-77406914
  
I'm looking at really old PRs -- this is obsolete now, right?





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-12-22 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1518#discussion_r22171070
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/Regularizer.scala ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.optimization
+
+import scala.collection.mutable.ListBuffer
+import scala.math._
+
+import breeze.linalg.{DenseVector => BDV, Vector => BV}
+
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+
+abstract class Regularizer extends Serializable {
+  var isSmooth: Boolean = true
+
+  def add(that: Regularizer): CompositeRegularizer = {
+    (new CompositeRegularizer).add(this).add(that)
+  }
+
+  def compute(weights: Vector, cumGradient: Vector): Double
+}
+
+class SimpleRegularizer extends Regularizer {
+  isSmooth = true
+
+  override def compute(weights: Vector, cumGradient: Vector): Double = 0
+}
+
+class CompositeRegularizer extends Regularizer {
+  isSmooth = true
+
+  protected val regularizers = ListBuffer[Regularizer]()
+
+  override def add(that: Regularizer): this.type = {
+    if (this.isSmooth && !that.isSmooth) isSmooth = false
+    regularizers.append(that)
+    this
+  }
+
+  override def compute(weights: Vector, cumGradient: Vector): Double = {
+    if (regularizers.isEmpty) {
+      0.0
+    } else {
+      regularizers.foldLeft(0.0)((loss: Double, x: Regularizer) =>
+        loss + x.compute(weights, cumGradient)
+      )
+    }
+  }
+}
+
+class L1Regularizer(private val regParam: BV[Double]) extends Regularizer {
+  isSmooth = false
+
+  def this(regParam: Double) = this(new BDV[Double](Array[Double](regParam)))
+
+  def this(regParam: Vector) = this(regParam.toBreeze)
+
+  def compute(weights: Vector, cumGradient: Vector): Double = {
+    val brzWeights = weights.toBreeze
+    val brzCumGradient = cumGradient.toBreeze
+
+    if (regParam.length > 1) require(brzWeights.length == regParam.length)
+
+    if (regParam.length == 1 && regParam(0) == 0.0) {
+      0.0
+    } else {
+      var loss: Double = 0.0
+      brzWeights.activeIterator.foreach {
+        case (_, 0.0) => // Skip explicit zero elements.
--- End diff --

Won't the case statement affect performance?





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-12-22 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1518#discussion_r22173571
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/Regularizer.scala ---
+      brzWeights.activeIterator.foreach {
+        case (_, 0.0) => // Skip explicit zero elements.
--- End diff --

This PR is not finished yet. I will replace this with the newly implemented `foreachActive` API.
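
For illustration only, here is a rough sketch of how the hot loop might look with `foreachActive` (my sketch, not this PR's final code, and assuming `foreachActive((Int, Double) => Unit)` on `Vector` is visible from this package). The loss part only; the subgradient accumulation would follow the same pattern:

```scala
import org.apache.spark.mllib.linalg.Vector
import breeze.linalg.{Vector => BV}

// Per-component L1 loss via foreachActive. (Int, Double) => Unit is a
// @specialized Function2, so unlike activeIterator no (index, value)
// tuple is allocated per element.
def l1Loss(weights: Vector, regParam: BV[Double]): Double = {
  var loss = 0.0
  weights.foreachActive { (index, value) =>
    if (value != 0.0) { // skip explicit zeros without matching on 0.0
      val lambda = if (regParam.length == 1) regParam(0) else regParam(index)
      loss += lambda * math.abs(value)
    }
  }
  loss
}
```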





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-08-04 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-51151194
  
This looks promising. FWIW, I support decoupling regularization from the 
raw gradient update and believe it is a good way to go: it will allow various 
update/learning-rate schemes (AdaGrad, normalized adaptive gradient, etc.) to be 
applied independently of the regularization.
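
For instance, a minimal sketch (illustrative names, assuming the `Regularizer` from this PR; `dataLossAndGradient` stands in for the usual sum of per-example gradients) of an objective where the data gradient and the regularization gradient are accumulated as separate, swappable pieces:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// The regularizer contributes its loss and gradient through the same
// cumGradient buffer, but as a separate component, so an adaptive update
// rule can treat the data gradient and the penalty differently.
def objective(
    weights: Vector,
    dataLossAndGradient: (Vector, Vector) => Double,
    regularizer: Regularizer): (Double, Vector) = {
  val cumGradient = Vectors.dense(new Array[Double](weights.size))
  val dataLoss = dataLossAndGradient(weights, cumGradient)
  val regLoss = regularizer.compute(weights, cumGradient)
  (dataLoss + regLoss, cumGradient)
}
```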





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-08-04 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-51151346
  
It's too late to get this into 1.1, but I'll try to make it happen in 1.2. We'll 
use this in the Alpine implementation first.





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-30 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-50663418
  
I tried making the bias really big so that the corresponding intercept weight 
becomes small and is effectively less regularized. The result is still quite 
different from R, and it is very sensitive to the strength of the bias.

Users may re-scale the features to improve the convergence of the optimization 
process, and in order to get the same coefficients as in the unscaled problem, each 
component has to be penalized differently (see the toy example below). Also, users 
may know that a particular feature is less important, and want to penalize it more. 

As a result, I still want to implement the full weighted regularizer, and 
decouple the adaptive learning rate from the updater. Let's talk in detail when we 
meet tomorrow. Thanks. 
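
A toy numeric check of the re-scaling point (my illustrative values, not from this PR): if feature j is divided by s_j, the equivalent weight in the scaled problem is w'_j = s_j * w_j, so reproducing the unscaled solution requires penalizing component j by lambda / s_j instead of a uniform lambda:

```scala
val lambda = 0.5
val s = Array(2.0, 10.0)  // per-feature scaling factors
val w = Array(1.5, -0.3)  // weights of the unscaled problem
val wScaled = w.zip(s).map { case (wj, sj) => wj * sj } // w'_j = s_j * w_j

// The uniform L1 penalty on w equals the weighted penalty (lambda / s_j) on w'.
val uniformPenalty  = w.map(wj => lambda * math.abs(wj)).sum
val weightedPenalty = wScaled.zip(s).map { case (wpj, sj) => (lambda / sj) * math.abs(wpj) }.sum
assert(math.abs(uniformPenalty - weightedPenalty) < 1e-12)
```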




[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-30 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-50691925
  
I think this is the approach LIBLINEAR uses. Yes, let's discuss tomorrow.




[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-50441485
  
@dbtsai I thought of another way to do this and want to know your opinion. We 
can add an optional argument to `appendBias`: `appendBias(bias: Double = 1.0)`. 
If this is used when adding the intercept, we can pass a large bias so the 
corresponding weight gets less regularized.
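
A dense-only sketch of the proposed signature (the real `MLUtils.appendBias` would also have to handle sparse vectors):

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Sketch only: append a configurable bias term instead of the fixed 1.0.
def appendBias(vector: Vector, bias: Double = 1.0): Vector =
  Vectors.dense(vector.toArray :+ bias)
```

With a bias feature of value b, the intercept is w_bias * b, so a large b yields a small w_bias for the same intercept and hence a smaller penalty on it under uniform regularization.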




[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-21 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/1518

[SPARK-2505][MLlib] Weighted Regularizer for Generalized Linear Model 

(Note: this is not ready to be merged. It needs documentation, and we have to 
make sure it's backward compatible with the Spark 1.0 APIs.) 

The current implementation of regularization in the linear models uses 
`Updater`, and this design has a couple of issues:
1) It penalizes all the weights, including the intercept. In a typical machine 
learning training process, people don't penalize the intercept. 
2) The `Updater` also contains the adaptive step size logic for gradient 
descent. We would like to clean this up by separating the regularization logic out 
of the updater into a regularizer, so that in the L-BFGS optimizer we don't need 
the trick for getting the loss and gradient of the objective function.
In this work, a weighted regularizer will be implemented, and users can 
exclude the intercept or any weight from regularization by giving that term a 
zero penalty weight (see the usage sketch below). Since the regularizer will 
return a tuple of loss and gradient, the adaptive step size logic and the soft 
thresholding for L1 in `Updater` will be moved to the SGD optimizer.
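
A hypothetical usage sketch (illustrative values; the classes are from the diff in this PR, with the intercept as the last weight component):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.L1Regularizer

val weights = Vectors.dense(0.3, -1.2, 0.8, 2.0)  // last component is the intercept
val regParam = Vectors.dense(0.1, 0.1, 0.5, 0.0)  // heavier penalty on the 3rd feature, none on the intercept

val l1 = new L1Regularizer(regParam)
val cumGradient = Vectors.dense(new Array[Double](weights.size))
val regLoss = l1.compute(weights, cumGradient) // returns the L1 loss, accumulates its subgradient
```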


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/AlpineNow/spark SPARK-2505_regularizer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1518


commit 2946930ec3de0e0a34e07d065c954d7aabacd4ba
Author: DB Tsai dbt...@alpinenow.com
Date:   2014-07-19T02:15:37Z

initial work






[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-49670761
  
QA tests have started for PR 1518. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16928/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-49670856
  
QA results for PR 1518:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  abstract class Regularizer extends Serializable {
  class SimpleRegularizer extends Regularizer {
  class CompositeRegularizer extends Regularizer {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16928/consoleFull

