Dear all,
As a Spark newbie, I need some help understanding how saving an RDD to a file
behaves. After reading the post on saving a single file efficiently
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-save-as-a-single-file-efficiently-td3014.html
I understand that each partition of the RDD is written out as its own part file.
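To make sure I have it right, here is the little experiment I have in mind
(the local master, app name, and output paths are placeholders I made up):

  import org.apache.spark.SparkContext

  // saveAsTextFile writes one part-NNNNN file per RDD partition
  // under the output directory.
  val sc = new SparkContext("local[2]", "save-demo")   // placeholder master/name
  val rdd = sc.parallelize(1 to 1000, 4)               // 4 partitions
  rdd.saveAsTextFile("out-multi")                      // part-00000 .. part-00003

  // Coalescing to one partition first yields a single output file,
  // at the cost of funnelling all the data through a single task.
  rdd.coalesce(1).saveAsTextFile("out-single")         // out-single/part-00000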
Building Spark needs a large amount of memory; I think you should make -Xmx
larger, e.g. 2g.
-Original Message-
From: Bharath Bhushan manku.ti...@outlook.com
Sent: 2014/3/22 12:50
To: user@spark.apache.org user@spark.apache.org
Subject: unable to build spark - sbt/sbt: line 50: killed
I am getting the
Hi,
Each of my worker nodes has its own unique spark.local.dir.
However, when I run spark-shell, the shuffle writes are always written to /tmp,
despite spark.local.dir being set when the worker node is started.
Specifying spark.local.dir for the driver program seems to override the
executors' setting. Is this expected behavior?
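For reference, a driver-side way to set it looks roughly like this (the master
URL and path are placeholders); this is the value I would expect the executors
to pick up:

  import org.apache.spark.{SparkConf, SparkContext}

  // Driver-side configuration; the spark.local.dir set here appears to
  // shadow the value each worker was started with.
  val conf = new SparkConf()
    .setMaster("spark://master:7077")          // placeholder cluster URL
    .setAppName("local-dir-demo")              // placeholder app name
    .set("spark.local.dir", "/data/spark-tmp") // placeholder path (not /tmp)
  val sc = new SparkContext(conf)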
Yes, it works with a smaller file; it can count and map, but not distinct.
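For reference, the operations in question look roughly like this (the input
path is a placeholder):

  val lines = sc.textFile("hdfs:///path/to/big-file") // placeholder path

  lines.count()                // works
  lines.map(_.length).count()  // works
  lines.distinct().count()     // fails on the big file

  // distinct() shuffles, so one thing still worth trying is an explicit,
  // larger partition count so each shuffle task handles less data:
  lines.distinct(512).count()  // 512 is an arbitrary example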
Thanks for the reply. It turns out that my Ubuntu-in-Vagrant VM had only 512MB
of RAM. Increasing it to 1024MB allowed the assembly to finish successfully.
Peak usage was around 780MB.
To: user@spark.apache.org
From: vboylin1...@gmail.com
Subject: Re: unable to build spark - sbt/sbt: line 50:
I have found that I am unable to build/test Spark with sbt and Java 6, but
using Java 7 works (and it compiles with Java target version 1.6, so the
binaries are usable from Java 6).
On Sat, Mar 22, 2014 at 3:11 PM, Bharath Bhushan manku.ti...@outlook.com wrote:
Thanks for the reply. It turns out that
This could be related to the hash collision bug in ExternalAppendOnlyMap in
0.9.0: https://spark-project.atlassian.net/browse/SPARK-1045
You might try setting spark.shuffle.spill to false and see if that runs any
longer (turning off shuffle spill is dangerous, though, as it may cause
Spark to OOM).
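Concretely, that experiment would look something like this (the master URL and
app name are arbitrary placeholders):

  import org.apache.spark.{SparkConf, SparkContext}

  // Disable shuffle spilling to sidestep the SPARK-1045 hash-collision
  // path in ExternalAppendOnlyMap. Per the caveat above, all shuffle
  // data must then fit in memory, so watch for OOMs.
  val conf = new SparkConf()
    .setMaster("local[2]")                 // placeholder
    .setAppName("spill-off-test")          // placeholder
    .set("spark.shuffle.spill", "false")
  val sc = new SparkContext(conf)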
Yes, that helped; at least it was able to advance a bit further.
But I was wrong: map also fails on the big file, and setting
spark.shuffle.spill doesn't help. Map fails with the same error.
I have this problem too. Eventually the job fails (on the UI) and hangs the
terminal until I hit CTRL+C. (Logs below.)
Now, the Spark docs explain that the heartbeat configuration can be tweaked to
handle GC hangs; a sketch of that tweak follows below. I'm wondering if this is
symptomatic of pushing the cluster a little too hard (we
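The heartbeat tweak I mean is roughly the following, assuming the Akka
failure-detector settings described in the docs (the values are only
illustrative):

  import org.apache.spark.{SparkConf, SparkContext}

  // Raise the Akka failure-detector tolerances so long GC pauses are
  // not mistaken for dead executors (illustrative values, in seconds).
  val conf = new SparkConf()
    .setAppName("heartbeat-tuning")                // placeholder
    .set("spark.akka.heartbeat.pauses", "600")     // acceptable heartbeat pause
    .set("spark.akka.heartbeat.interval", "1000")  // heartbeat interval
  val sc = new SparkContext(conf)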
I am having similar issues with much smaller data sets. I am using the Spark
EC2 scripts to launch clusters, but I almost always end up with straggling
executors that take over a node's CPU and memory and never finish.
On Thu, Mar 20, 2014 at 1:54 PM, Soila Pertet Kavulya wrote:
Were you able to figure out what this was?
You can try setting spark.akka.askTimeout to a larger value. That might
help.
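For example (the app name and the value are arbitrary; the point is just to go
well above the default):

  import org.apache.spark.{SparkConf, SparkContext}

  // Raise spark.akka.askTimeout (in seconds) so slow or GC-bound nodes
  // get more time to answer control messages.
  val conf = new SparkConf()
    .setAppName("ask-timeout-demo")        // placeholder
    .set("spark.akka.askTimeout", "100")   // arbitrary example value
  val sc = new SparkContext(conf)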
On Thu, Mar 20, 2014 at 10:24 PM, mohit.goyal mohit.go...@guavus.com wrote:
Hi,
I have run the Spark application to process input data of size ~14GB with
executor memory
FWIW, I've seen correctness errors with spark.shuffle.spill on 0.9.0 and have
it disabled now. The specific error behavior was that a join would consistently
return one row count with spill enabled and a different count with it disabled;
a self-contained way to check for this is sketched below.
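Something like the following shows what I mean (the data and names are made up;
a correct join must return the same count either way):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._ // pair-RDD functions such as join

  def joinCount(spill: Boolean): Long = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("join-spill-" + spill)
      .set("spark.shuffle.spill", spill.toString)
    val sc = new SparkContext(conf)
    try {
      val left  = sc.parallelize(1 to 10000).map(i => (i % 100, i))
      val right = sc.parallelize(1 to 10000).map(i => (i % 100, i))
      left.join(right).count()
    } finally sc.stop()
  }

  val withSpill    = joinCount(spill = true)
  val withoutSpill = joinCount(spill = false)
  assert(withSpill == withoutSpill,
    "counts differ: " + withSpill + " vs " + withoutSpill)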
Sent from my mobile phone
On Mar 22, 2014 1:52 PM, Kane