Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
Oh, I meant to say that cdh5.1.3, used by Jakub's company, is based on 2.3. You can see it from the first part of Cloudera's version number: 2.3.0-cdh5.1.3. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https
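A minimal sketch of reading the base version off a CDH-style version string, as described in the message above. The helper name `hadoopBase` is hypothetical, and the sketch assumes the format `<base-version>-cdh<cdh-version>`, with the leading part being the upstream release the distribution is built on.

```scala
object CdhVersion {
  // Assumes the format "<base-version>-cdh<cdh-version>", e.g. "2.3.0-cdh5.1.3".
  // Returns the part before "-cdh", or None if the marker is absent.
  def hadoopBase(version: String): Option[String] = {
    val idx = version.indexOf("-cdh")
    if (idx > 0) Some(version.substring(0, idx)) else None
  }
}
```

For example, `CdhVersion.hadoopBase("2.3.0-cdh5.1.3")` yields `Some("2.3.0")`.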

Re: [mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread DB Tsai
, there are different strategies for feature scaling in linear regression and logistic regression; as a result, we don't want to naively make it a public API without addressing the different use cases. Sincerely, DB Tsai --- My Blog: https

Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-10-09 Thread DB Tsai
, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, Sep 29, 2014 at 11:45 AM, Yanbo Liang yanboha...@gmail.com wrote: Thank you for all your patient response. I can conclude that if the data

Re: Fwd: Breeze Library usage in Spark

2014-10-03 Thread DB Tsai
You don't have to include the breeze jar, which is already in the Spark assembly jar. The native one is optional. Sent from my Google Nexus 5 On Oct 3, 2014 8:04 PM, Priya Ch learnings.chitt...@gmail.com wrote: yes. I have included breeze-0.9 in build.sbt file. I'll change this to 0.7. Apart from

Re: [mllib] LogisticRegressionWithLBFGS interface is not consistent with LogisticRegressionWithSGD

2014-09-13 Thread DB Tsai
. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Sat, Sep 13, 2014 at 2:12 AM, Yanbo Liang yanboha...@gmail.com wrote: Hi All, I found that LogisticRegressionWithLBFGS interface

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-06 Thread DB Tsai
One related question: is the mllib jar independent of the hadoop version (i.e. it doesn't use the hadoop API directly)? Can I use an mllib jar compiled for one version of hadoop with another version of hadoop? Sent from my Google Nexus 5 On Aug 6, 2014 8:29 AM, Debasish Das debasish.da...@gmail.com wrote: Hi

Re: Buidling spark in Eclipse Kepler

2014-08-06 Thread DB Tsai
After sbt gen-idea, you can open the IntelliJ project directly without going through pom.xml. If you want to compile inside IntelliJ, you have to remove one of the mesos jars. This is an open issue, and you can find the details in JIRA. Sent from my Google Nexus 5 On Aug 6, 2014 8:54 PM, Ron Gonzalez

Re: OWLQN

2014-07-18 Thread DB Tsai
I'm working on it with weighted regularization. The problem is that OWLQN doesn't work nicely with Updater now since all the L1 logic should be in OWLQN instead of L1Updater. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn

Re: [VOTE] Release Apache Spark 0.9.2 (RC1)

2014-07-17 Thread DB Tsai
+1 Tested with my Ubuntu Linux. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Thu, Jul 17, 2014 at 6:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac, verified

SBT gen-idea doesn't work well after merging SPARK-1776

2014-07-14 Thread DB Tsai
) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-05 Thread DB Tsai
+1 On Jul 5, 2014 1:39 PM, Michael Armbrust mich...@databricks.com wrote: +1 I tested sql/hive functionality. On Sat, Jul 5, 2014 at 9:30 AM, Mark Hamstra m...@clearstorydata.com wrote: +1 On Fri, Jul 4, 2014 at 12:40 PM, Patrick Wendell pwend...@gmail.com wrote: I'll start

Re: Int tolerance in LBFGS.setConvergenceTol causes problems

2014-06-17 Thread DB Tsai
Hi Gang, This is a bug, and I'm the one who did it :) Just add the comment to your PR. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Jun 17, 2014 at 7:13 PM, Gang Bai

Re: Standard preprocessing/scaling

2014-05-28 Thread DB Tsai
Sometimes in this case, I just standardize without centering. I still get good results. Sent from my Google Nexus 5 On May 28, 2014 7:03 PM, Xiangrui Meng men...@gmail.com wrote: RowMatrix has a method to compute column summary statistics. There is a trade-off here because centering
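A small plain-Scala sketch of standardizing without centering, as mentioned above: each column is divided by its sample standard deviation, and the mean is never subtracted, so zero entries stay zero. The object and method names are hypothetical, this is not MLlib's API, and leaving zero-variance columns unchanged is a choice made for the sketch. It assumes at least two rows of equal length.

```scala
object ScaleOnly {
  // Sample standard deviation of each column (assumes at least two rows).
  def columnStd(rows: Seq[Array[Double]]): Array[Double] = {
    val n = rows.length
    val dim = rows.head.length
    Array.tabulate(dim) { j =>
      val mean = rows.map(_(j)).sum / n
      val variance = rows.map(r => math.pow(r(j) - mean, 2)).sum / (n - 1)
      math.sqrt(variance)
    }
  }

  // Divide by the standard deviation only; the mean is NOT subtracted,
  // so a zero input stays zero. Zero-variance columns are left unchanged.
  def scale(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
    val std = columnStd(rows)
    rows.map(_.zipWithIndex.map { case (v, j) =>
      if (std(j) == 0.0) v else v / std(j)
    })
  }
}
```

Because zeros map to zeros, a sparse representation of the data would keep its sparsity, which is the usual reason to skip centering.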

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
the protected method `addURL` which will not work and throw exception if the code is wrapped in security manager. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, May 21, 2014 at 1:13 PM, Sandy

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
the primary jar is not in the system loader but custom one, so when we reference those jars added dynamically, we can find it without reflection. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
environment. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, May 21, 2014 at 2:57 PM, Koert Kuipers ko...@tresata.com wrote: db tsai, i do not think userClassPathFirst is working

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
: Method = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL]) method.setAccessible(true) method.invoke(loader, url) } catch { case t: Throwable => { throw new IOException("Error, could not add URL to system classloader") } } } Sincerely, DB

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The reflection actually works. But you need to get the loader by `val loader = Thread.currentThread.getContextClassLoader` which is set by Spark executor. Our team verified this, and uses it as workaround. Sincerely, DB Tsai --- My Blog
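A minimal sketch of the workaround described above: look the class up through the thread's context class loader and instantiate it reflectively. The object and method names are hypothetical, and the sketch assumes the target class has a public no-argument constructor. Whether the context loader actually sees jars added at runtime depends on the environment that set it.

```scala
object ContextLoading {
  // Resolve the class through the thread's context class loader rather than
  // the loader that loaded the calling code, then call its no-arg constructor.
  def newInstance(className: String): Any = {
    val loader = Thread.currentThread().getContextClassLoader
    val clazz = Class.forName(className, true, loader)
    clazz.getDeclaredConstructor().newInstance()
  }
}
```

For instance, `ContextLoading.newInstance("java.util.ArrayList")` returns a fresh `java.util.ArrayList`. A class from a dynamically added jar would be requested the same way, by its fully qualified name.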

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The jars are included in my driver, and I can successfully use them in the driver. I'm working on a patch, and it's almost working. Will submit a PR soon. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https

Fwd: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
the customClassloader to create a wrapped class, and in this wrapped class, the classloader is inherited from the customClassloader so that users don't need to do reflection in the wrapped class. I'm working on this now. Sincerely, DB Tsai --- My Blog: https

Calling external classes added by sc.addJar needs to be through reflection

2014-05-16 Thread DB Tsai
will not be seen. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Multinomial Logistic Regression

2014-05-13 Thread DB Tsai
_ = math.log(numerators(math.round(y - 1).toInt) / denominator) } (loglike, predicted) } Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, May 13, 2014 at 4:08 AM, Debasish Das

Re: mllib vector templates

2014-05-05 Thread DB Tsai
+1 It would be nice if we could use different types in Vector. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:41 PM, Debasish Das debasish.da...@gmail.com wrote: Hi

Re: mllib vector templates

2014-05-05 Thread DB Tsai
Breeze could take any type (Int, Long, Double, and Float) in the matrix template. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da

Re: reduce, transform, combine

2014-05-04 Thread DB Tsai
You could easily achieve this with mapPartitions. However, it seems that it cannot be done with an aggregate-type operation. I can see that it's a generally useful operation. For now, you could use mapPartitions. Sincerely, DB Tsai --- My Blog

Code Review for SPARK-1516: Throw exception in yarn client instead of System.exit

2014-04-29 Thread DB Tsai
straightforward, we wonder if this can be reviewed and have this in 1.0. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-29 Thread DB Tsai
in Spark before we have a deeper understanding of how stochastic LBFGS works. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 29, 2014 at 9:50 PM, David Hall d...@cs.berkeley.edu wrote

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-28 Thread DB Tsai
Also, how many failure of rejection will terminate the optimization process? How is it related to numberOfImprovementFailures? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-28 Thread DB Tsai
, miniBatchFraction, lbfgs, miniBatchSize) val states = lbfgs.iterations(new CachedDiffFunction(costFun), initialWeights.toBreeze.toDenseVector) Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai
. The difference you saw is actually from dense feature or sparse feature vector. For LBFGS and GD dense feature, you can see the first iteration takes the same time. It's true for GD. I'm going to run rcv1.binary which only has 0.15% non-zero elements to verify the hypothesis. Sincerely, DB Tsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai
, Vectors.fromBreeze(gradientSum / miniBatchSize), stepSize, i, regParam) weights = update._1 regVal = update._2 timeStamp.append(System.nanoTime() - startTime) } Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai
rcv1.binary is too sparse (0.15% non-zero elements), so dense format will not run due to out of memory. But sparse format runs really well. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai
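A back-of-the-envelope sketch of why the dense format can run out of memory at this sparsity while the sparse format does not. The object name and the row and column counts in the usage note are hypothetical; only the 0.15% density comes from the message above. The byte counts assume 8-byte doubles and 4-byte indices and ignore JVM object overhead.

```scala
object FormatMemory {
  // Dense storage: one 8-byte double for every feature of every row.
  def denseBytes(rows: Long, cols: Long): Long = rows * cols * 8L

  // Sparse storage: an 8-byte value plus a 4-byte index per non-zero entry.
  def sparseBytes(rows: Long, cols: Long, density: Double): Long =
    math.round(rows * cols * density) * 12L
}
```

With a hypothetical 100,000 rows by 50,000 columns at 0.15% density, `denseBytes` gives 40,000,000,000 bytes (about 40 GB), while `sparseBytes` gives 90,000,000 bytes (about 90 MB).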

Re: Jekyll documentation generation error

2014-04-23 Thread DB Tsai
/dbtsai/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/commander-4.1.6/lib/commander/import.rb:10:in `block in top (required)' Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 22

Jekyll documentation generation error

2014-04-23 Thread DB Tsai
But what does SKIP_SCALADOC=1 mean? export SKIP_SCALADOC=1? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Jekyll documentation generation error

2014-04-23 Thread DB Tsai
Matei, thanks. It works with kramdown. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 22, 2014 at 11:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Try doing “gem install

MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
result. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
a 1% (or so) sparse dataset to experiment with? On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote: Hi all, I'm benchmarking Logistic Regression in MLlib using the newly added optimizer LBFGS and GD. I'm using the same dataset and the same methodology in this paper, http

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
The figure showing the Log-Likelihood vs Time can be found here. https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf Let me know if you can not open it. Sincerely, DB Tsai --- My Blog

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
ps, it doesn't make sense to have weight and gradient sparse unless with strong L1 penalty. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai dbt

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
Not yet since it's running in the cluster. Will run locally with profiler. Thanks for help. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, Apr 23, 2014 at 10:22 PM, David Hall d

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
The figure showing the Log-Likelihood vs Time can be found here. https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf Let me know if you can not open it. Thanks. Sincerely, DB Tsai

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread DB Tsai
stepSize: Double, var numIterations: Int, var regParam: Double, var miniBatchFraction: Double Xiangrui, what do you think? For now, you can use my L-BFGS solver by copying and pasting the LogisticRegressionWithSGD code, and changing the optimizer to L-BFGS. Sincerely, DB Tsai

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread DB Tsai
computation per RDD is done on each of the workers... This miniBatchFraction is also a heuristic which I don't think makes sense for LogisticRegressionWithBFGS...does it ? On Tue, Apr 8, 2014 at 3:44 PM, DB Tsai dbt...@stanford.edu wrote: Hi Debasish, The L-BFGS solver will be in the master like

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread DB Tsai
I haven't experimented with it. That's the use case I could think of in theory. ^^ However, from what I saw, BFGS converges really fast, so I only need 20~30 iterations in general. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-07 Thread DB Tsai
Hi guys, The latest PR uses Breeze's L-BFGS implementation, which was introduced by Xiangrui's sparse input format work in SPARK-1212. https://github.com/apache/spark/pull/353 Now, it works with the new sparse framework! Any feedback would be greatly appreciated. Thanks. Sincerely, DB Tsai

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-07 Thread DB Tsai
Hi Xiangrui, I think it doesn't matter whether we use Fortran/Breeze/RISO for optimizers since optimization only takes 1% of time. Most of the time is in gradientSum and lossSum parallel computation. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-06 Thread DB Tsai
Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502 Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Wed, Mar 5, 2014 at 2:00 PM, David Hall d...@cs.berkeley.edu wrote: On Wed, Mar 5, 2014 at 1:57 PM

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread DB Tsai
for you to investigate the issue? Or do I need to make it as a standalone test? Will send you the test later today. Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-04 Thread DB Tsai
, let's have a design discussion around this. It may be more effective, since we can design an architecture that works for both cases in the codebase, and it will be easier to think about the edge cases. Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread DB Tsai
this? Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Sun, Mar 2, 2014 at 10:23 AM, Debasish Das debasish.da...@gmail.com wrote: Hi DB, 1. Could you point to the BFGS repositories used to publish artifacts

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread DB Tsai
. Is this getting merged to the master or there will be revisions on it ? https://github.com/apache/spark/pull/53 Thanks. Deb Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread DB Tsai
to fix the L-BFGS in breeze, and we can get OWL-QN and L-BFGS-B. What do you think? Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Mon, Mar 3, 2014 at 3:52 PM, DB Tsai dbt...@alpinenow.com wrote: Hi Deb

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-02 Thread DB Tsai
. Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/
