Hi,
I'm quite interested in how Spark's fault tolerance works and I'd like to
ask a question here.
According to the paper, there are two kinds of dependencies: wide
dependencies and narrow dependencies. My understanding is that if the
operations I use are all narrow, then when one machine

Hi,
I've got a huge list of key-value pairs, where the key is an integer and
the value is a long string (around 1 KB). I want to concatenate the strings
that share the same key.
Initially I did something like: pairs.reduceByKey((a, b) => a + " " + b)
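To make the intent of that reduceByKey call concrete, here is a minimal sketch on plain Scala collections, with no Spark dependency. The object name, the helper concatByKey, the sample data, and the single-space separator are all assumptions made for illustration; they are not from the original job.

```scala
// Local-collection sketch of per-key string concatenation.
// Hypothetical names and sample data; this is not Spark code.
object ConcatByKeySketch {
  // Group the pairs by key, then fold each group's values together
  // with the same (a, b) => a + sep + b function used in the reduce.
  def concatByKey(pairs: Seq[(Int, String)], sep: String = " "): Map[Int, String] =
    pairs
      .groupBy { case (key, _) => key }
      .map { case (key, group) =>
        key -> group.map(_._2).reduce((a, b) => a + sep + b)
      }

  def main(args: Array[String]): Unit = {
    // Tiny stand-in for the real data set (the real values are around 1 KB each).
    val pairs = Seq((1, "foo"), (2, "bar"), (1, "baz"))
    println(concatByKey(pairs))
  }
}
```

On a local Seq the values of each key are joined in input order. On a real RDD that order should not be relied on, because reduceByKey expects an associative and commutative function and string concatenation is not commutative.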
Then I tried to save the result to HDFS. But it was