Hi,
What is the purpose of the taskBinary in a ShuffleMapTask? What does it
contain, and how is it used? Is it the serialized representation of all the
RDD operations that will be applied to the partition that task will be
processing? (In the case below, the task will process stage 0, partition 0.)
If it
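For what it's worth, in the Spark source the taskBinary is a broadcast variable holding the serialized stage lineage (the RDD plus, for a ShuffleMapTask, its ShuffleDependency); every task in the stage deserializes the same bytes and applies that lineage to its own partition. A toy Python sketch of the idea (all names here are made up for illustration, not Spark's API):

```python
import pickle

# Hypothetical stand-ins for the driver/executor split: the driver serializes
# the stage's chain of operations once, "broadcasts" the bytes, and every
# task for that stage deserializes the same binary and runs it against its
# own partition.

def to_lower(word):
    return word.lower()

def to_pair(word):
    return (word, 1)

def build_task_binary(operations):
    """Driver side: serialize the chained operations for one stage."""
    return pickle.dumps(operations)

def run_shuffle_map_task(task_binary, partition):
    """Executor side: deserialize the lineage, apply it to one partition."""
    records = partition
    for op in pickle.loads(task_binary):
        records = [op(r) for r in records]
    return records

# Stage 0's lineage for word count: lower-case each word, emit (word, 1).
binary = build_task_binary([to_lower, to_pair])

# Partition 0 of the input; every task gets the same binary but a different
# partition.
print(run_shuffle_map_task(binary, ["Hello", "World"]))
# [('hello', 1), ('world', 1)]
```

Broadcasting one serialized copy per stage, instead of shipping the lineage inside every task, is why the binary is shared across all of a stage's tasks.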
I'm running Spark in local mode and seeing these two log messages, which
appear to be similar. I want to understand what each one is doing:
1. [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started
service 'sparkDriver' on port 60782.
2. [main] executor.Executor
I'm trying to build Spark with IntelliJ on Windows, but I keep getting
this error:
spark-master\external\flume-sink\src\main\scala\org\apache\spark\streaming\flume\sink\SparkAvroCallbackHandler.scala
Error:(46, 66) not found: type SparkFlumeProtocol
val transactionTimeout: Int, val
Consider the classic word-count application on a 4-node cluster with a
sizable working data set. What makes Spark run faster than MapReduce,
considering that Spark also has to write to disk during the shuffle?
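One commonly cited reason: MapReduce materializes intermediate results to HDFS between jobs, whereas Spark pipelines narrow transformations in memory within a stage and only writes local shuffle files at stage boundaries. A toy illustration of that pipelining (plain Python, not Spark code):

```python
# A chain of narrow transformations over a partition. The "materialized"
# version builds a full intermediate collection after every step (think of a
# disk write/read between jobs); the "pipelined" version fuses the steps and
# streams each record through the whole chain, so storage is only needed at
# the shuffle boundary that follows.

steps = [str.lower, str.strip]

def materialized(records):
    simulated_disk_writes = 0
    for step in steps:
        records = [step(r) for r in records]  # intermediate fully written out
        simulated_disk_writes += 1
    return records, simulated_disk_writes

def pipelined(records):
    out = []
    for r in records:            # one pass; no intermediate materialization
        for step in steps:
            r = step(r)
        out.append(r)
    return out, 0                # disk only touched at the later shuffle

data = ["  Hello ", " WORLD"]
assert materialized(data)[0] == pipelined(data)[0]
```

Both versions compute the same result; the difference is how many times the intermediate data is materialized along the way.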
Spark is an in-memory engine and attempts to do computation in memory.
Tachyon is memory-centric distributed storage, OK, but how would that help
Spark run faster?
Thanks!
On Wed, Aug 5, 2015 at 5:24 PM, Saisai Shao sai.sai.s...@gmail.com wrote:
Yes, shuffle data will ultimately be written to disk for the reduce stage to
pull, no matter how large you set the shuffle memory fraction.
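The disk handoff Saisai describes can be sketched like this (a toy model, not Spark's actual shuffle code): each map task hash-partitions its output into one local file per reducer, and each reduce task then pulls its file and aggregates.

```python
import os
import pickle
import tempfile
from collections import Counter

NUM_REDUCERS = 2  # illustrative; Spark derives this from the partitioner

def map_side_write(words, shuffle_dir):
    """Map task: bucket (word, 1) pairs by reducer, write each bucket to disk."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for w in words:
        buckets[hash(w) % NUM_REDUCERS].append((w, 1))
    for rid, bucket in enumerate(buckets):  # always hits disk, even if tiny
        with open(os.path.join(shuffle_dir, f"shuffle_{rid}"), "wb") as f:
            pickle.dump(bucket, f)

def reduce_side_pull(rid, shuffle_dir):
    """Reduce task: pull this reducer's file from disk and aggregate."""
    counts = Counter()
    with open(os.path.join(shuffle_dir, f"shuffle_{rid}"), "rb") as f:
        for word, one in pickle.load(f):
            counts[word] += one
    return dict(counts)

shuffle_dir = tempfile.mkdtemp()
map_side_write(["a", "b", "a"], shuffle_dir)
merged = {}
for rid in range(NUM_REDUCERS):
    merged.update(reduce_side_pull(rid, shuffle_dir))
print(merged)
```

The point is that the map-side files exist regardless of memory settings; memory configuration affects spilling during aggregation, not whether these output files are written.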
Thanks
Saisai
On Thu, Aug 6, 2015 at 7:50 AM, Muler mulugeta.abe
Hi,
Consider I'm running WordCount with 100m of data on a 4-node cluster.
Assuming the RAM size on each node is 200g and I'm giving my executors 100g
(just enough memory for 100m of data):
1. If I have enough memory, can Spark 100% avoid writing to disk?
2. During shuffle, where results have to
sai.sai.s...@gmail.com wrote:
Hi Muler,
Shuffle data will be written to disk no matter how much memory you have;
large memory can alleviate shuffle spill, where temporary files are
generated when memory is not enough.
Yes, each node writes its shuffle data to files, and it is pulled from disk in the reduce
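The distinction drawn above, between spill files (extra temporary files created only when memory runs out) and the final shuffle output (always written to disk), can be sketched like this in toy Python, not Spark internals:

```python
import os
import pickle
import tempfile
from collections import Counter

def count_with_spill(words, max_keys_in_memory):
    """Aggregate in memory; spill partial counts to a temp file when 'full'."""
    spill_files, counts = [], Counter()
    for w in words:
        counts[w] += 1
        if len(counts) >= max_keys_in_memory:   # "memory" full: spill
            f = tempfile.NamedTemporaryFile(delete=False)
            pickle.dump(counts, f)
            f.close()
            spill_files.append(f.name)
            counts = Counter()
    for path in spill_files:                    # merge all spills at the end
        with open(path, "rb") as f:
            counts.update(pickle.load(f))
        os.remove(path)
    return dict(counts), len(spill_files)

words = ["a", "b", "a", "c", "b", "a"]
big_memory = count_with_spill(words, max_keys_in_memory=10)
tiny_memory = count_with_spill(words, max_keys_in_memory=2)
assert big_memory == ({"a": 3, "b": 2, "c": 1}, 0)  # no spill needed
assert big_memory[0] == tiny_memory[0]              # same result either way
```

With enough memory the threshold is never reached and no spill files are created, matching question 1 above for the aggregation step; but the final shuffle output for the reduce stage to pull would still go to disk either way.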