
Re: Configuring Spark for reduceByKey on on massive data sets

2015-10-12 Thread hotdog

Hi Daniel, did you solve your problem? I ran into the same problem when running reduceByKey on massive data on YARN.

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-18 Thread lukas nalezenec


Configuring Spark for reduceByKey on on massive data sets

2014-05-17 Thread Daniel Mahler

I have had a lot of success with Spark on large datasets, both in terms of performance and flexibility. However, I hit a wall with reduceByKey when the RDD contains billions of items. I am reducing with simple functions like addition for building histograms, so the reduction process should be
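
For readers following the thread, the pattern described above (building a histogram by reducing with addition) can be sketched without a cluster. This is a minimal pure-Python illustration, not Spark code: `reduce_by_key` and `partitions` are hypothetical names that only mimic how reduceByKey combines values inside each partition before merging the partial results across partitions.

```python
from operator import add


def reduce_by_key(partitions, func):
    """Mimic reduceByKey: combine within each partition, then merge partials.

    `partitions` is a hypothetical stand-in for an RDD's partitions:
    a list of lists of (key, value) pairs.
    """
    # Step 1: per-partition ("map-side") combine, one entry per distinct key.
    partials = []
    for part in partitions:
        combined = {}
        for key, value in part:
            combined[key] = func(combined[key], value) if key in combined else value
        partials.append(combined)

    # Step 2: merge the per-partition results (the part that needs a shuffle).
    merged = {}
    for combined in partials:
        for key, value in combined.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Toy data: each record contributes a count of 1 to its key's histogram bin.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("c", 1), ("a", 1)],
]
histogram = reduce_by_key(partitions, add)
```

Because addition is associative and commutative, each partition only has to pass along one partial count per distinct key, so the amount of data merged in the second step depends on the number of distinct keys rather than the number of records.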

Re: Configuring Spark for reduceByKey on on massive data sets

2014-05-17 Thread Madhu
