Re: Jekyll documentation generation error

2014-04-23 Thread DB Tsai
This is the trace. Conversion error: There was an error converting 'docs/cluster-overview.md '. /Users/dbtsai/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/jekyll-1.5.1/lib/jekyll/converters/markdown/maruku_parser.rb:45:in `print_errors_and_fail': MaRuKu encountered problem(s) while

Jekyll documentation generation error

2014-04-23 Thread DB Tsai
Hi guys, I'm trying to update the LBFGS documentation, so I need to generate the HTML docs to check that everything looks right. However, I get the following error. Conversion error: There was an error converting 'docs/cluster-overview.md'. error: MaRuKu encountered problem(s) while converting your

Re: Jekyll documentation generation error

2014-04-23 Thread Matei Zaharia
Try doing “gem install kramdown”. The maruku gem for Markdown throws these errors, but Kramdown doesn’t. Matei On Apr 22, 2014, at 11:31 PM, DB Tsai dbt...@dbtsai.com wrote: This is the trace. Conversion error: There was an error converting 'docs/cluster-overview.md '.

Re: Jekyll documentation generation error

2014-04-23 Thread DB Tsai
Matei, thanks. It works with kramdown. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 22, 2014 at 11:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Try doing “gem install

Sharing RDDs

2014-04-23 Thread Saumitra Shahapure (Vizury)
Hello, is it possible in Spark to reuse cached RDDs generated in an earlier run? Specifically, I am trying to have a setup where a first Scala script generates cached RDDs. If another Scala script tries to perform the same operations on the same dataset, it should be able to get results from cache generated

Re: Spark on wikipedia dataset

2014-04-23 Thread Mayur Rustagi
Huge joins would be interesting. I do all my demos on the Wikipedia dataset for Shark. Joins are typically a pain to show off :) Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Wed, Apr 23, 2014 at 10:33 AM, Ajay Nair

get -101 error code when running select query

2014-04-23 Thread qingyang li
Hi, I have started a sharkserver2 instance and am using Java code to send a query to it over Hive JDBC, but I got this error: -- FAILED: Execution Error, return code -101 from shark.execution.SparkTask org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED:

Re: ArrayIndexOutOfBoundsException in ALS.implicit

2014-04-23 Thread Xiangrui Meng
Hi bearrito, this issue was fixed by Tor in https://github.com/apache/spark/pull/407. You can either try the master branch or wait for the 1.0 release. -Xiangrui On Fri, Mar 28, 2014 at 12:19 AM, Xiangrui Meng men...@gmail.com wrote: Hi bearrito, This is a known issue

Re: get -101 error code when running select query

2014-04-23 Thread Madhu
I have seen a similar error message when connecting to Hive through JDBC. This is just a guess on my part, but check your query. The error occurs if you have a select that includes a null literal with an alias, like this: select a, b, null as c, d from foo. In my case, rewriting the query to use

Re: [jira] [Commented] (SPARK-1576) Passing of JAVA_OPTS to YARN on command line

2014-04-23 Thread Nishkam Ravi
It would probably be best to retain support for SPARK_JAVA_OPTS in ClientBase, though, for developers who may have been using it. On Wed, Apr 23, 2014 at 6:26 PM, Nishkam Ravi nr...@cloudera.com wrote: Bit of a race condition here it seems. Patrick made a few changes yesterday around the same

MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
Hi all, I'm benchmarking Logistic Regression in MLlib using the newly added LBFGS optimizer and GD. I'm using the same dataset and the same methodology as in this paper, http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf I want to know how Spark scales while adding workers, and how optimizers and input
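Both optimizers in this benchmark work on the same objective, and the thread later plots log-likelihood against time. As a rough illustration only, here is a minimal pure-Python sketch of the logistic-regression log-likelihood and a plain fixed-step gradient method on sparse rows. It assumes 0/1 labels, no regularization, and rows stored as hypothetical `{feature_index: value}` dicts; it is not MLlib's implementation, and an L-BFGS optimizer would replace the fixed-step update with a quasi-Newton search direction.

```python
import math


def log1pexp(z):
    # Computes log(1 + exp(z)) without overflow for large |z|.
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sparse_dot(w, x):
    # x is a sparse row {feature_index: value}; only stored entries are visited.
    return sum(w[j] * v for j, v in x.items())


def log_likelihood(w, rows, labels):
    # Average of log p(y | x) = y*z - log(1 + exp(z)), with z = w.x and y in {0, 1}.
    total = 0.0
    for x, y in zip(rows, labels):
        z = sparse_dot(w, x)
        total += y * z - log1pexp(z)
    return total / len(rows)


def gradient(w, rows, labels):
    # Gradient of the average log-likelihood: mean of (y - sigmoid(w.x)) * x.
    g = [0.0] * len(w)
    for x, y in zip(rows, labels):
        r = y - sigmoid(sparse_dot(w, x))
        for j, v in x.items():
            g[j] += r * v
    return [gj / len(rows) for gj in g]


def gradient_ascent(w, rows, labels, step=1.0, iters=50):
    # Plain fixed-step, full-batch gradient ascent on the log-likelihood.
    for _ in range(iters):
        g = gradient(w, rows, labels)
        w = [wj + step * gj for wj, gj in zip(w, g)]
    return w


if __name__ == "__main__":
    rows = [{0: 1.0}, {1: 1.0}, {0: 1.0, 2: 0.5}, {1: 1.0, 2: 0.5}]
    labels = [1, 0, 1, 0]
    w0 = [0.0, 0.0, 0.0]
    print(round(log_likelihood(w0, rows, labels), 3))  # -0.693, i.e. -log 2
    w = gradient_ascent(w0, rows, labels)
    print(log_likelihood(w, rows, labels))  # higher than the starting value
```

Maximizing the log-likelihood is equivalent to minimizing the logistic loss; the sketch uses ascent so the reported number moves in the same direction as a log-likelihood curve.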

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread Evan Sparks
What is the number of non zeroes per row (and number of features) in the sparse case? We've hit some issues with breeze sparse support in the past but for sufficiently sparse data it's still pretty good. On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote: Hi all, I'm

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread Evan Sparks
Sorry - just saw the 11% number. That is around the spot where dense data is usually faster (blocking, cache coherence, etc.). Is there any chance you have a 1% (or so) sparse dataset to experiment with? On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote: Hi all, I'm

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread Shivaram Venkataraman
I don't think the attachment came through on the list. Could you upload the results somewhere and link to them? On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai dbt...@dbtsai.com wrote: 123 features per row, and on average, 89% are zeros. On Apr 23, 2014 9:31 PM, Evan Sparks evan.spa...@gmail.com
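The numbers quoted here (123 features per row, 89% zeros) allow a quick back-of-the-envelope comparison. A small Python sketch, assuming 8-byte double values and a 4-byte integer index per stored nonzero (a generic index/value layout, not necessarily Breeze's exact in-memory representation) and ignoring per-object overhead:

```python
def row_storage_bytes(num_features, zero_fraction, value_bytes=8, index_bytes=4):
    """Approximate per-row storage for dense vs. index/value sparse layouts.

    Assumes every stored value is a double (value_bytes) and every sparse
    entry also stores an int index (index_bytes); object overhead is ignored.
    """
    nnz = num_features * (1.0 - zero_fraction)  # expected nonzeros per row
    dense = num_features * value_bytes  # every slot is stored
    sparse = nnz * (value_bytes + index_bytes)  # only nonzeros, plus their indices
    return nnz, dense, sparse


def break_even_density(value_bytes=8, index_bytes=4):
    # Density above which the sparse layout stops saving memory.
    return value_bytes / (value_bytes + index_bytes)


if __name__ == "__main__":
    nnz, dense, sparse = row_storage_bytes(123, 0.89)
    print(f"nonzeros/row ~ {nnz:.1f}, dense {dense} B, sparse ~ {sparse:.0f} B")
    print(f"memory break-even density ~ {break_even_density():.2f}")
```

This prints about 13.5 nonzeros per row, 984 bytes dense versus roughly 162 bytes sparse, and a break-even density of about 0.67. Under these assumptions, sparse rows at 11% density take roughly a sixth of the dense storage and touch about 9x fewer values per dot product, so if dense still wins at this density, as suggested in the thread, the advantage would have to come from memory-access effects such as blocking and cache behavior rather than from the raw amount of data.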

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
Any suggestion for sparser dataset? Will test more tomorrow in the office. On Apr 23, 2014 9:33 PM, Evan Sparks evan.spa...@gmail.com wrote: Sorry - just saw the 11% number. That is around the spot where dense data is usually faster (blocking, cache coherence, etc) is there any chance you have

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
On Wed, Apr 23, 2014 at 9:30 PM, Evan Sparks evan.spa...@gmail.com wrote: What is the number of non zeroes per row (and number of features) in the sparse case? We've hit some issues with breeze sparse support in the past but for sufficiently sparse data it's still pretty good. Any chance you

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
The figure showing the Log-Likelihood vs Time can be found here: https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf Let me know if you cannot open it.

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
Was the weight vector sparse? The gradients? Or just the feature vectors? On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai dbt...@dbtsai.com wrote: The figure showing the Log-Likelihood vs Time can be found here.

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
PS: it doesn't make sense for the weight and gradient to be sparse unless there is a strong L1 penalty. On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai
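The point about sparse weights and gradients can be made concrete. For logistic loss, each example adds its residual times its feature vector to the gradient, and for 0/1 labels that residual sigmoid(w.x) - y is never exactly zero, so the summed gradient's nonzero pattern is the union of all rows' patterns. Exact zeros in the weights typically come from an L1 penalty, for instance via a soft-thresholding (proximal) step; that step is one common way to handle L1 and is assumed here, not taken from MLlib. A minimal Python sketch with hypothetical helper names:

```python
import math


def gradient_support(rows):
    # Each row x_i adds r_i * x_i to the gradient, and r_i = sigmoid(w.x_i) - y_i
    # is nonzero for 0/1 labels, so (barring exact cancellation) the gradient's
    # nonzero pattern is the union of the rows' nonzero patterns.
    support = set()
    for x in rows:
        support.update(j for j, v in x.items() if v != 0.0)
    return support


def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each weight toward zero by t,
    # setting any weight with |w_j| <= t exactly to zero.
    return [math.copysign(max(abs(wj) - t, 0.0), wj) for wj in w]


if __name__ == "__main__":
    # Six features, but each row has only 2 nonzeros (about 67% zeros per row)...
    rows = [{0: 1.0, 1: 2.0}, {2: 1.0, 3: 0.5}, {4: 1.0, 5: 3.0}]
    print(sorted(gradient_support(rows)))  # [0, 1, 2, 3, 4, 5]: the gradient is dense
    # ...yet an L1 proximal step sets small weights exactly to zero.
    print(soft_threshold([0.3, -0.05, 0.02, -0.8], 0.1))
```

With many rows, even very sparse ones, the union of supports quickly covers every feature, which is why only the feature vectors, not the weights or gradients, usually benefit from a sparse representation.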

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
On Wed, Apr 23, 2014 at 10:18 PM, DB Tsai dbt...@dbtsai.com wrote: ps, it doesn't make sense to have weight and gradient sparse unless with strong L1 penalty. Sure, I was just checking the obvious things. Have you run it through a profiler to see where the problem is? Sincerely, DB

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
Not yet, since it's running on the cluster. I will run it locally with a profiler. Thanks for the help. On Wed, Apr 23, 2014 at 10:22 PM, David Hall

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
The figure showing the Log-Likelihood vs Time can be found here: https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf Let me know if you cannot open it. Thanks.

Re: get -101 error code when running select query

2014-04-23 Thread qingyang li
Thanks for sharing. My case is different from yours: I set hive.server2.enable.doAs to false in hive-site.xml, and then the -101 error code disappeared. 2014-04-24 9:26 GMT+08:00 Madhu ma...@madhu.com: I have seen a similar error message when connecting to Hive through JDBC. This is