[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-20 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46694507
  
Merged. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1104


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread BaiGang
Github user BaiGang commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46403426
  
Thanks, @dbtsai .
Just did a git rebase. Will wait for the conclusion of consolidated 
interfaces.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46411430
  
Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13905413
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala ---
@@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with 
LocalSparkContext with Matchers {
 assert(lossLBFGS3.length == 6)
 assert((lossLBFGS3(4) - lossLBFGS3(5)) / lossLBFGS3(4)  
convergenceTol)
   }
+
--- End diff --

Sorry I didn't see this error either ... Let's keep both in tests for 
better test coverage.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13905479
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala ---
@@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with 
LocalSparkContext with Matchers {
 assert(lossLBFGS3.length == 6)
 assert((lossLBFGS3(4) - lossLBFGS3(5)) / lossLBFGS3(4)  
convergenceTol)
   }
+
+  test(Optimize via class LBFGS.) {
+val regParam = 0.2
+
+// Prepare another non-zero weights to compare the loss in the first 
iteration.
+val initialWeightsWithIntercept = Vectors.dense(0.3, 0.12)
+val convergenceTol = 1e-12
+val maxNumIterations = 10
+
+val lbfgsOptimizer = new LBFGS(gradient, squaredL2Updater)
+.setNumCorrections(numCorrections)
+.setConvergenceTol(convergenceTol)
+.setMaxNumIterations(maxNumIterations)
+.setRegParam(regParam)
+
+val weightLBFGS = lbfgsOptimizer.optimize(dataRDD, 
initialWeightsWithIntercept)
+
+val numGDIterations = 50
+val stepSize = 1.0
+val (weightGD, _) = GradientDescent.runMiniBatchSGD(
+  dataRDD,
+  gradient,
+  squaredL2Updater,
+  stepSize,
+  numGDIterations,
+  regParam,
+  miniBatchFrac,
+  initialWeightsWithIntercept)
+
+// for class LBFGS and the optimize method, we only look at the weights
+assert(compareDouble(weightLBFGS(0), weightGD(0), 0.02) 
+  compareDouble(weightLBFGS(1), weightGD(1), 0.02),
+  The weight differences between LBFGS and GD should be within 2%.)
+   
--- End diff --

Remove empty line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13905461
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala ---
@@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with 
LocalSparkContext with Matchers {
 assert(lossLBFGS3.length == 6)
 assert((lossLBFGS3(4) - lossLBFGS3(5)) / lossLBFGS3(4)  
convergenceTol)
   }
+
+  test(Optimize via class LBFGS.) {
+val regParam = 0.2
+
+// Prepare another non-zero weights to compare the loss in the first 
iteration.
+val initialWeightsWithIntercept = Vectors.dense(0.3, 0.12)
+val convergenceTol = 1e-12
+val maxNumIterations = 10
+
+val lbfgsOptimizer = new LBFGS(gradient, squaredL2Updater)
+.setNumCorrections(numCorrections)
--- End diff --

two space indentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46411830
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46411812
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46411779
  
The change looks good to me. Let us wait for Jenkins and MIMA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13905548
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala ---
@@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with 
LocalSparkContext with Matchers {
 assert(lossLBFGS3.length == 6)
 assert((lossLBFGS3(4) - lossLBFGS3(5)) / lossLBFGS3(4)  
convergenceTol)
   }
+
--- End diff --

We may add the same test to SGD as well. My bad. Our internal one is right. 
Probably when I copy and paste, I don't do thing right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46412293
  
I think it will be a problem for MIMA to change the signature. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46415414
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15870/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46415413
  
Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread BaiGang
Github user BaiGang commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13910039
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala ---
@@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with 
LocalSparkContext with Matchers {
 assert(lossLBFGS3.length == 6)
 assert((lossLBFGS3(4) - lossLBFGS3(5)) / lossLBFGS3(4)  
convergenceTol)
   }
+
+  test(Optimize via class LBFGS.) {
+val regParam = 0.2
+
+// Prepare another non-zero weights to compare the loss in the first 
iteration.
+val initialWeightsWithIntercept = Vectors.dense(0.3, 0.12)
+val convergenceTol = 1e-12
+val maxNumIterations = 10
+
+val lbfgsOptimizer = new LBFGS(gradient, squaredL2Updater)
+.setNumCorrections(numCorrections)
+.setConvergenceTol(convergenceTol)
+.setMaxNumIterations(maxNumIterations)
+.setRegParam(regParam)
+
+val weightLBFGS = lbfgsOptimizer.optimize(dataRDD, 
initialWeightsWithIntercept)
+
+val numGDIterations = 50
+val stepSize = 1.0
+val (weightGD, _) = GradientDescent.runMiniBatchSGD(
+  dataRDD,
+  gradient,
+  squaredL2Updater,
+  stepSize,
+  numGDIterations,
+  regParam,
+  miniBatchFrac,
+  initialWeightsWithIntercept)
+
+// for class LBFGS and the optimize method, we only look at the weights
+assert(compareDouble(weightLBFGS(0), weightGD(0), 0.02) 
+  compareDouble(weightLBFGS(1), weightGD(1), 0.02),
+  The weight differences between LBFGS and GD should be within 2%.)
+   
--- End diff --

Thanks @mengxr , I'll correct the indentation and remove the empty line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46503707
  
Jenkins, add to white list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46503735
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46503726
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46506731
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15884/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46506729
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread BaiGang
GitHub user BaiGang opened a pull request:

https://github.com/apache/spark/pull/1104

[SPARK-2163] class LBFGS optimize with Double tolerance instead of Int

https://issues.apache.org/jira/browse/SPARK-2163

This pull request includes the change for **[SPARK-2163]**:

* Changed the convergence tolerance parameter from type `Int` to type 
`Double`.
* Added types for vars in `class LBFGS`, making the style consistent with 
`class GradientDescent`.
* Added associated test to check that optimizing via `class LBFGS` produces 
the same results as via calling `runLBFGS` from `object LBFGS`.

This is a very minor change but it will solve the problem in my 
implementation of a regression model for count data, where I make use of LBFGS 
for parameter estimation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BaiGang/spark fix_int_tol

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1104


commit 30cd381b9cc465ddbc4fd4877464016a26dcb551
Author: Gang Bai m...@baigang.net
Date:   2014-06-17T09:51:16Z

Changed setConvergenceTol'' to specify tolerance with a parameter of type 
Double. For the reason and the problem caused by an Int parameter, please check 
https://issues.apache.org/jira/browse/SPARK-2163. Added the types for the 
private vars in class LBFGS. Added a test in LBFGSSuite for validating that 
optimizing via class LBFGS produces the same results as calling runLBFGS from 
object LBFGS.

commit d9c2d14e46c1ad7153eec51874ddeff4926948f7
Author: Gang Bai m...@baigang.net
Date:   2014-06-17T09:54:39Z

Merge remote-tracking branch 'origin/master' into tol_int




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46289980
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13897737
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala ---
@@ -38,10 +38,10 @@ import org.apache.spark.mllib.linalg.{Vectors, Vector}
 class LBFGS(private var gradient: Gradient, private var updater: Updater)
   extends Optimizer with Logging {
 
-  private var numCorrections = 10
-  private var convergenceTol = 1E-4
-  private var maxNumIterations = 100
-  private var regParam = 0.0
+  private var numCorrections: Int = 10
+  private var convergenceTol: Double = 1E-4
+  private var maxNumIterations: Int = 100
+  private var regParam: Double = 0.0
 
--- End diff --

In most of the mllib codebase, we don't specify the type of variable. Can 
you remove them? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1104#discussion_r13897825
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/optimization/LBFGSSuite.scala ---
@@ -195,4 +195,39 @@ class LBFGSSuite extends FunSuite with 
LocalSparkContext with Matchers {
 assert(lossLBFGS3.length == 6)
 assert((lossLBFGS3(4) - lossLBFGS3(5)) / lossLBFGS3(4)  
convergenceTol)
   }
+
--- End diff --

The bug isn't found because we only test the static runLBFGS method instead 
of the class. We probably can change all the existing tests to use the one in 
class, so we don't need to add another test.  

@mengxr what do you think? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread BaiGang
Github user BaiGang commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46393686
  
@dbtsai Thanks for looking into this PR. I've updated the code, removing 
the unnecessary type specifications for the vars in class LBFGS. 

Also I have a question. If I externally (outside mllib) make uses of LBFGS, 
should I use the object method LBFGS.runLBFGS(.) or create an instance of class 
LBFGS and run the optimize(.) method? Which one of the is the supposed proper 
usage of the LBFGS optimizer? As for the tests, I think we should just test 
against that usage.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2163] class LBFGS optimize with Double ...

2014-06-17 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1104#issuecomment-46393840
  
I think it's legacy reason to have two different way to access the API. As 
far as I know, @mengxr is working on consolidating the interface. He probably 
can talk about more on this topic. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---