Hi Daniel,
Did you solve your problem?
I ran into the same problem when running reduceByKey over a massive data set on YARN.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-Spark-for-reduceByKey-on-on-massive-data-sets-tp5966p25023.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I have had a lot of success with Spark on large datasets,
both in terms of performance and flexibility.
However, I hit a wall with reduceByKey when the RDD contains billions of
items.
I am reducing with simple functions like addition for building histograms,
so the reduction process should be
://www.linkedin.com/in/msiddalingaiah
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-Spark-for-reduceByKey-on-on-massive-data-sets-tp5966p5967.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.