spark fault tolerance mechanism

2015-01-15 Thread YANG Fan
Hi, I'm quite interested in how Spark's fault tolerance works, and I'd like to ask a question here. According to the paper, there are two kinds of dependencies: wide dependencies and narrow dependencies. My understanding is that if the operations I use are all narrow, then when one machine
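The narrow/wide distinction from the RDD paper can be illustrated with a toy model of which parent partitions must be recomputed when one child partition is lost. This is a hypothetical sketch in plain Scala, not Spark's API. Dependency, Narrow, Wide and parentsNeeded are made-up names. It also rests on two assumptions. First, narrow dependencies are one-to-one, as with map or filter. Second, the model ignores that Spark keeps shuffle map outputs on disk, which can spare a wide dependency from full recomputation when those outputs survive.

```scala
// Toy model (NOT Spark's API): which parent partitions are needed
// to rebuild one lost child partition, given the dependency type.
sealed trait Dependency
// Assumed one-to-one narrow dependency (e.g. map, filter):
// child partition i is computed from parent partition i only.
case object Narrow extends Dependency
// Wide (shuffle) dependency (e.g. reduceByKey, groupByKey):
// every child partition may contain records from every parent partition.
case object Wide extends Dependency

def parentsNeeded(lostChild: Int, numParents: Int, dep: Dependency): Set[Int] =
  dep match {
    case Narrow => Set(lostChild)               // recompute just the matching parent
    case Wide   => (0 until numParents).toSet   // worst case: all parents contribute
  }
```

With four parent partitions, losing child partition 2 requires only parent partition 2 under a narrow dependency, but all four parent partitions under a wide one.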

dealing with large values in kv pairs

2014-11-10 Thread YANG Fan
Hi, I've got a huge list of key-value pairs, where the key is an integer and the value is a long string (around 1 KB). I want to concatenate the strings that share the same key. Initially I did something like: pairs.reduceByKey((a, b) => a + " " + b) Then I tried to save the result to HDFS. But it was
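When many values share a key, pairwise merging can be compared with collecting the values and joining them once. Each a + " " + b allocates a new string and copies both operands, so folding n values of about 1 KB for one key copies on the order of n² KB in total. A single mkString copies each value once. The sketch below is a hypothetical, single-machine illustration that uses plain Scala collections instead of Spark. The names pairs, concatPairwise and concatOnce are made up for this example.

```scala
// Single-machine stand-in (plain Scala collections, NOT Spark RDDs)
// for concatenating all string values that share an integer key.
val pairs: Seq[(Int, String)] =
  Seq(1 -> "a", 2 -> "x", 1 -> "b", 1 -> "c", 2 -> "y")

// Pairwise merging, mirroring reduceByKey((a, b) => a + " " + b):
// every merge builds a new String, copying the growing accumulator again.
def concatPairwise(kv: Seq[(Int, String)]): Map[Int, String] =
  kv.groupBy(_._1).map { case (k, vs) =>
    k -> vs.map(_._2).reduce((a, b) => a + " " + b)
  }

// Gather-then-join: the values for a key are collected first,
// then mkString copies each value into the result exactly once.
def concatOnce(kv: Seq[(Int, String)]): Map[Int, String] =
  kv.groupBy(_._1).map { case (k, vs) =>
    k -> vs.map(_._2).mkString(" ")
  }
```

Both functions produce the same result here. In Spark, a roughly corresponding pattern would be groupByKey().mapValues(_.mkString(" ")). The trade-off is that groupByKey skips the map-side combining that reduceByKey performs, and it must hold all values for a key in memory at once.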