Adding a call to rdd.repartition() after randomizing the keys has no effect either. Code:
    import java.util.UUID

    //partitioning is done like partitionIdx = f(key) % numPartitions
    //we use random keys to get even partitioning
    val uniform = other_stream.transform(rdd => {
      rdd.map { case (_, v) =>
        (UUID.randomUUID().toString, v)
      }.repartition(20)  // note: repartition must be chained onto the mapped
                         // RDD; calling rdd.repartition(20) on a separate line
                         // discards the randomized keys entirely
    })

    uniform.foreachRDD(rdd => {
      rdd.foreachPartition(partition => {
        // ...
      })
    })

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Imbalanced-shuffle-read-tp18648p18791.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
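To illustrate why random keys even out the partitioning, here is a minimal, Spark-free sketch of hash partitioning as described in the comment above (partitionIdx = f(key) % numPartitions, with f = hashCode, matching the behavior of Spark's HashPartitioner). The object and names are hypothetical, for illustration only: with only a couple of distinct keys, at most that many partitions receive data; with random UUID keys, records spread across all partitions.

```scala
import java.util.UUID

object PartitionSkew {
  // Hash-partitioning sketch: hashCode modulo numPartitions,
  // adjusted to be non-negative (as Spark's nonNegativeMod does).
  def partitionIdx(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    val numPartitions = 20

    // 100 records but only 2 distinct keys: at most 2 partitions get data.
    val skewed = Seq.fill(50)("hot-key-a") ++ Seq.fill(50)("hot-key-b")
    val skewedUsed = skewed.map(partitionIdx(_, numPartitions)).distinct.size
    println(s"partitions used with skewed keys: $skewedUsed")

    // Random UUID keys: records spread across (typically all) partitions.
    val random = Seq.fill(100)(UUID.randomUUID().toString)
    val randomUsed = random.map(partitionIdx(_, numPartitions)).distinct.size
    println(s"partitions used with random keys: $randomUsed")
  }
}
```

This is why the UUID trick should work in principle; it only helps, though, if the repartition/shuffle actually operates on the RDD that carries the randomized keys.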