Re: Running Spark on a single machine

2014-03-17 Thread goi cto
Sorry, I did not explain myself correctly. I know how to run Spark; the question is how to instruct Spark to do all of the computation on a single machine. I was trying to convert the code to Scala, but I'm missing some of Spark's methods, like reduceByKey. Eran On Mon, Mar 17, 2014 at 7:25 AM,
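For reference, a minimal sketch of both points, assuming the 0.8/0.9-era Scala API: a local[N] master URL keeps all computation on one machine, and reduceByKey becomes available once the SparkContext._ implicits are imported (names here are illustrative).

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // the implicits that expose reduceByKey on pair RDDs

    // "local[4]" keeps all computation on this machine, using 4 threads
    val sc = new SparkContext("local[4]", "SingleMachine")
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)  // resolves only with SparkContext._ in scope
    println(counts.collect().mkString(", "))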

Re: combining operations elegantly

2014-03-17 Thread Richard Siebeling
Patrick, Koert, I'm also very interested in these examples; could you please post them if you find them? Thanks in advance, Richard On Thu, Mar 13, 2014 at 9:39 PM, Koert Kuipers ko...@tresata.com wrote: not that long ago there was a nice example on here about how to combine multiple

Spark shell exits after 1 min

2014-03-17 Thread Sai Prasanna
Hi everyone!! I installed Scala 2.9.3, Spark 0.8.1, and Oracle Java 7... I launched the master and logged on to the interactive Spark shell: MASTER=spark://localhost:7077 ./spark-shell But after one minute it automatically exits from the interactive shell... Is there something I am missing... Do I

Re: Spark shell exits after 1 min

2014-03-17 Thread Sai Prasanna
Solved... but I don't know what the difference is... just running ./spark-shell fixes it all... but I don't know why!! On Mon, Mar 17, 2014 at 1:32 PM, Sai Prasanna ansaiprasa...@gmail.com wrote: Hi everyone!! I installed Scala 2.9.3, Spark 0.8.1, and Oracle Java 7... I launched the master and logged on to
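A plausible explanation, sketched below (the master URL is taken from the earlier message): with no MASTER set, spark-shell runs in local mode and needs no running master; with MASTER set, the shell must reach a standalone master whose advertised host:port matches the URL exactly, and a mismatch can drop the connection and exit the shell.

    $ MASTER=spark://localhost:7077 ./spark-shell   # attaches to the standalone master
    $ ./spark-shell                                 # no MASTER set: plain local mode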

Re: possible bug in Spark's ALS implementation...

2014-03-17 Thread Xiangrui Meng
The factor matrix Y is used twice in the implicit ALS computation: once to compute the global Y^T Y, and again to compute the local Y_i^T C_i Y_i. -Xiangrui On Sun, Mar 16, 2014 at 1:18 PM, Matei Zaharia matei.zaha...@gmail.com wrote: On Mar 14, 2014, at 5:52 PM, Michael Allman m...@allman.ms wrote: I
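For context, a sketch of the implicit-feedback ALS update being referred to, following Hu, Koren, and Volinsky's formulation (which Spark's implicit ALS is based on); Y is the item-factor matrix, C^u the confidence matrix for user u, and p(u) the binary preference vector:

    x_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u p(u)
    Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y

The Y^T Y term is global and shared across all users, while the Y^T (C^u - I) Y correction only touches the items user u interacted with — the two uses of Y mentioned above.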

Re: Running spark examples

2014-03-17 Thread Chengi Liu
Hi, Thanks for the quick response. Is there a simple way to write and deploy apps on Spark? import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ object HelloWorld { def main(args: Array[String]) { println("Hello, world!") val sc = new
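A hedged completion of the truncated snippet above (the "local" master URL and the trivial job are illustrative assumptions, just enough to try the app before pointing it at a cluster):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object HelloWorld {
      def main(args: Array[String]) {
        println("Hello, world!")
        val sc = new SparkContext("local", "HelloWorld")  // in-process master for a first test
        println(sc.parallelize(1 to 100).reduce(_ + _))   // trivial job to prove the context works
        sc.stop()
      }
    }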

Re: example of non-line oriented input data?

2014-03-17 Thread Matei Zaharia
Hi Diana, Non-text input formats are only supported in Java and Scala right now, where you can use sparkContext.hadoopFile or .hadoopDataset to load data with any InputFormat that Hadoop MapReduce supports. In Python, you unfortunately only have textFile, which gives you one record per line.
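For the Java/Scala route, a hedged sketch assuming an existing SparkContext named sc (TextInputFormat is a stand-in — substitute whatever InputFormat matches your data — and the path is illustrative):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // load with an arbitrary Hadoop InputFormat; key/value types come from the format
    val records = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///path/to/data")
    val strings = records.map { case (_, text) => text.toString }  // copy out of the reused Text object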

Re: example of non-line oriented input data?

2014-03-17 Thread Matei Zaharia
Here's an example of getting together all lines in a file as one string: $ cat dir/a.txt Hello world! $ cat dir/b.txt What's up?? $ bin/pyspark files = sc.textFile("dir") files.collect() [u'Hello', u'world!', u"What's", u'up??'] # one element per line, not what we want

Re: example of non-line oriented input data?

2014-03-17 Thread Diana Carroll
There's also mapPartitions, which gives you an iterator for each partition instead of an array; you can then return an iterator or list of objects to produce from it. I confess, I was hoping for an example of just that, because I've not yet been able to figure out how to use mapPartitions. No

is collect exactly-once?

2014-03-17 Thread Adrian Mocanu
Hi, Quick question here: I know that .foreach is not idempotent. I am wondering if collect() is idempotent? Meaning that once I've collect()-ed, if a Spark node crashes, I can't get the same values from the stream ever again. Thanks -Adrian

Re: sbt assembly fails

2014-03-17 Thread Chengi Liu
I have set it up... still it fails. Question: https://oss.sonatype.org/content/repositories/snapshots/io/netty/netty-all/4.0.13.Final/netty-all-4.0.13.Final.pom 4.0.13 is not there? Instead 4.0.18 is there?? Is this a bug?
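Not a fix for the repository itself, but a hedged sbt workaround sketch: if only 4.0.18 is published at that resolver, the build can be forced onto that version (version number taken from the message above).

    // in the sbt build definition (illustrative)
    libraryDependencies += "io.netty" % "netty-all" % "4.0.18.Final"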

Re: is collect exactly-once?

2014-03-17 Thread Matei Zaharia
Yup, it only returns each value once. Matei On Mar 17, 2014, at 1:14 PM, Adrian Mocanu amoc...@verticalscope.com wrote: Hi, Quick question here: I know that .foreach is not idempotent. I am wondering if collect() is idempotent? Meaning that once I've collect()-ed, if a Spark node crashes, I

Re: example of non-line oriented input data?

2014-03-17 Thread Matei Zaharia
Oh, I see, the problem is that the function you pass to mapPartitions must itself return an iterator or a collection. This is used so that you can return multiple output records for each input record. You can implement most of the existing map-like operations in Spark, such as map, filter,
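A minimal sketch of that contract, assuming an RDD[String] named lines:

    // the function receives an Iterator over one partition and must return an Iterator
    val onePerPartition = lines.mapPartitions { iter =>
      Iterator(iter.mkString("\n"))  // whole partition concatenated into one record
    }

    // ordinary map expressed through the same mechanism
    val upper = lines.mapPartitions(iter => iter.map(_.toUpperCase))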

inexplicable exceptions in Spark 0.7.3

2014-03-17 Thread Walrus theCat
Hi, I'm getting this stack trace using Spark 0.7.3. There are no references to anything in my code, and I've never experienced anything like this before. Any ideas what is going on? java.lang.ClassCastException: spark.SparkContext$$anonfun$9 cannot be cast to scala.Function2 at

Re: java.lang.NullPointerException met when computing new RDD or use .count

2014-03-17 Thread Ian O'Connell
I'm guessing the other result was wrong, or just never evaluated here. The RDD transforms being lazy may have let it be expressed, but it wouldn't work: nested RDDs are not supported. On Mon, Mar 17, 2014 at 4:01 PM, anny9699 anny9...@gmail.com wrote: Hi Andrew, Thanks for the reply.
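A sketch of the anti-pattern and one common workaround (rddA and rddB are illustrative names; assumes rddB is small enough to collect to the driver):

    // fails at runtime: rddB cannot be referenced inside rddA's closure
    // val broken = rddA.map(a => rddB.filter(_ == a).count())

    // workaround: materialize the small RDD locally, then close over the plain collection
    val local = rddB.collect()
    val counts = rddA.map(a => local.count(_ == a))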

Re: links for the old versions are broken

2014-03-17 Thread Matei Zaharia
Thanks for reporting this, looking into it. On Mar 17, 2014, at 2:44 PM, Walrus theCat walrusthe...@gmail.com wrote: ping On Thu, Mar 13, 2014 at 11:05 AM, Aaron Davidson ilike...@gmail.com wrote: Looks like everything from 0.8.0 and before errors similarly (though Spark 0.3 for Scala

Trouble getting hadoop and spark run along side on my vm

2014-03-17 Thread Shivani Rao
From what I understand, getting Spark to run alongside a Hadoop cluster requires the following: a) a working Hadoop, b) a compiled Spark, and c) configuration parameters that point Spark to the right Hadoop conf files. i) Can you let me know the specific steps to take after Spark was compiled (via sbt
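Not a full answer, but a hedged sketch of (c), assuming the standard conf/spark-env.sh mechanism (the path is an assumption for your cluster):

    # conf/spark-env.sh
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # assumed path; lets Spark pick up core-site.xml / hdfs-site.xml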

Re: Problem when execute spark-shell

2014-03-17 Thread Yexi Jiang
Thanks all! I figured it out... I thought sbt package was enough... 2014-03-17 21:46 GMT-04:00 Debasish Das debasish.da...@gmail.com: You need the Spark assembly jar to run the Spark shell. Please do sbt assembly to generate the jar. On Mar 17, 2014 2:11 PM, Yexi Jiang
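For anyone hitting the same thing, the distinction sketched:

    $ sbt/sbt package    # compiles the modules, but does not produce the jar spark-shell looks for
    $ sbt/sbt assembly   # builds the assembly jar that spark-shell actually needs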

Apache Spark 0.9.0 Build Error

2014-03-17 Thread wapisani
Good morning! I'm attempting to build Apache Spark 0.9.0 on Windows 8. I've installed all prerequisites (except Hadoop) and ran sbt/sbt assembly while in the root directory. I'm getting an error after the line "Set current project to root in build file:C:/.../spark-0.9.0-incubating/". The error is:

Re: possible bug in Spark's ALS implementation...

2014-03-17 Thread Xiangrui Meng
Hi Michael, I made a couple of changes to implicit ALS. One gives faster construction of YtY (https://github.com/apache/spark/pull/161), which was merged into master. The other caches intermediate matrix factors properly (https://github.com/apache/spark/pull/165). They should give you the same result