Re: Questions about the files that Spark will produce during its running

2013-10-29 Thread Matei Zaharia
The error is from a worker node -- did you check that /data2 is set up properly on the worker nodes too? In general that should be the only directory used.

Matei

On Oct 28, 2013, at 6:52 PM, Shangyu Luo lsy...@gmail.com wrote:
> Hello, I have some questions about the files that Spark will …
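For context, the scratch location discussed here was controlled by the spark.local.dir property in the Spark 0.8 era, set through spark-env.sh. A minimal sketch of the relevant line, assuming that property and taking the /data2 path from the thread:

    # spark-env.sh -- a sketch; spark.local.dir was the Spark 0.8-era property
    # for shuffle/spill scratch space. The /data2 path comes from the thread.
    # Per Matei's reply, this must be set the same way on every worker node.
    export SPARK_JAVA_OPTS="-Dspark.local.dir=/data2"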

Re: Questions about the files that Spark will produce during its running

2013-10-29 Thread Shangyu Luo
Yes, I broadcast the spark-env.sh file to all worker nodes before I run my program, and then execute bin/stop-all.sh and bin/start-all.sh. I have also checked the size of the /data2 directory on each worker node, and it is also about 800G. Thanks!

2013/10/29 Matei Zaharia matei.zaha...@gmail.com
> The …
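One way to do that broadcast-and-restart, sketched under the assumption of the standard Spark layout with a conf/slaves file listing worker hostnames:

    # Push the config to each worker and restart the cluster -- a sketch;
    # conf/slaves listing worker hostnames is the standard Spark layout.
    for host in $(cat conf/slaves); do
      scp conf/spark-env.sh "$host:$PWD/conf/spark-env.sh"
    done
    bin/stop-all.sh && bin/start-all.sh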

Re: Task output before a shuffle

2013-10-29 Thread Ufuk Celebi
On 29 Oct 2013, at 02:47, Matei Zaharia matei.zaha...@gmail.com wrote:
> Yes, we still write out data after these tasks in Spark 0.8, and it needs to be written out before any stage that reads it can start. The main reason is simplicity when there are faults, as well as more flexible scheduling …
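To make the stage boundary concrete, here is a sketch (not from the thread) assuming the spark-shell's predefined sc; groupByKey introduces the kind of shuffle being discussed:

    // A sketch: groupByKey forces a shuffle, so the map-side stage that
    // computes `pairs` materializes its output before the stage that
    // computes `grouped` can start reading it.
    val pairs = sc.textFile("input.txt").map(line => (line.length, line))
    val grouped = pairs.groupByKey()  // shuffle boundary between two stages
    grouped.count()                   // second stage starts only after the
                                      // first stage's output is written out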

Re: compare/contrast Spark with Cascading

2013-10-29 Thread Koert Kuipers
Hey Prashant, I assume you mean steps to reproduce the OOM. I do not have any currently; I just ran into the OOMs when porting some jobs from map-red. I never turned it into a reproducible test, and I cannot rule out that it was my own poor programming that caused it. However, it happened with a bunch of jobs, and …

Re: met a problem while running a streaming example program

2013-10-29 Thread Patrick Wendell
If you just add the extends Serializable changes from here it should work.

On Tue, Oct 29, 2013 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote:
> This was fixed on 0.8 branch and master: https://github.com/apache/incubator-spark/pull/63/files
> - Patrick
> On Tue, Oct 29, 2013 at 9:17 AM, …
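The shape of that fix, sketched with a hypothetical class name (the linked pull request has the actual changes): anything captured by a closure that Spark ships to executors must be serializable.

    import org.apache.spark.SparkContext

    // Hypothetical sketch of the kind of change in the linked PR: a helper
    // captured by an RDD closure must extend Serializable, or the job fails
    // with NotSerializableException when the closure is shipped to executors.
    class WordParser extends Serializable {
      def parse(line: String): Array[String] = line.split(" ")
    }

    val sc = new SparkContext("local", "example")
    val parser = new WordParser
    val words = sc.textFile("input.txt").flatMap(line => parser.parse(line))
    println(words.count())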

Re: spark-0.8.0 and hadoop-2.1.0-beta

2013-10-29 Thread Matei Zaharia
I’m curious, Viren, do you have a patch you could post to build this against YARN 2.1 / 2.2? It would be nice to see how big the changes are.

Matei

On Sep 30, 2013, at 10:14 AM, viren kumar vire...@gmail.com wrote:
> I was able to get Spark 0.8.0 to compile with Hadoop/Yarn 2.1.0-beta, by …
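For reference, the stock Spark 0.8 sbt build against a YARN-enabled Hadoop looked like the following; the 2.1.0-beta version string comes from the thread, and per Viren's message the stock sources need his changes before this version actually compiles:

    # Spark 0.8-era build flags for a YARN-enabled Hadoop -- a sketch.
    SPARK_HADOOP_VERSION=2.1.0-beta SPARK_YARN=true sbt/sbt assembly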

executor failures w/ scala 2.10

2013-10-29 Thread Imran Rashid
We've been testing out the 2.10 branch of Spark, and we're running into some issues where Akka disconnects from the executors after a while. We ran some simple tests first, and all was well, so we started upgrading our whole codebase to 2.10. Everything seemed to be working, but then we noticed …

Re: met a problem while running a streaming example program

2013-10-29 Thread dachuan
Yes, it works after checking out branch-0.8. Thanks.

On Tue, Oct 29, 2013 at 12:51 PM, Patrick Wendell pwend...@gmail.com wrote:
> If you just add the extends Serializable changes from here it should work.
> On Tue, Oct 29, 2013 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote: This was …
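The steps the reply implies, sketched (the repository URL is taken from the pull-request link above; sbt/sbt assembly is the standard 0.8 build command):

    git clone https://github.com/apache/incubator-spark.git
    cd incubator-spark
    git checkout branch-0.8
    sbt/sbt assembly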

Re: Getting exception org.apache.spark.SparkException: Job aborted: Task 1.0:37 failed more than 4 times

2013-10-29 Thread Sergey Soldatov
You may check the 'out' files in the logs directory for the failure details.

On Wed, Oct 30, 2013 at 12:17 AM, Soumya Simanta soumya.sima...@gmail.com wrote:
> I'm using a pretty recent version of Spark (> 0.8) from Github and it's failing with the following exception for a very simple task on the …
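A sketch of where to look, assuming the standard Spark layout (daemon logs as *.out files under logs/, per-application executor output under work/):

    # Daemon logs from the master/worker processes:
    tail -n 100 $SPARK_HOME/logs/*.out
    # Per-application executor output on each worker:
    grep -ri "exception" $SPARK_HOME/work/ | head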

Re: Spark cluster memory configuration for spark-shell

2013-10-29 Thread Aaron Davidson
You are correct. If you are just using spark-shell in local mode (i.e., without a cluster), you can set the SPARK_MEM environment variable to give the driver more memory, e.g.:

    SPARK_MEM=24g ./spark-shell

Otherwise, if you're using a real cluster, the driver shouldn't require a significant amount …
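To confirm the setting took effect, a quick check from inside the shell (a sketch; maxMemory reports the JVM heap ceiling in bytes, so the result is slightly under the requested size):

    scala> Runtime.getRuntime.maxMemory / (1024L * 1024 * 1024)
    // roughly 24 (GB) if SPARK_MEM=24g was picked up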

RE: spark-0.8.0 and hadoop-2.1.0-beta

2013-10-29 Thread Liu, Raymond
I am also working on porting the trunk code onto 2.2.0. There seem to be quite a few API changes, but many of them are just renames. YARN 2.1.0-beta also adds some client APIs for easier interaction with the YARN framework, but there are not many examples of how to use them (the API and wiki docs are both …