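How about increasing the number of partitions in the RDD, or rebalancing the data across them? saveAsTextFile writes one output file per partition, so too few (or badly skewed) partitions limit how much of the cluster takes part in the write.

On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud <mpars...@illumina.com> wrote: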
> How to improve performance of JavaRDD<String>.saveAsTextFile("hdfs://…").
> This is taking over 30 minutes on a cluster of 10 nodes.
> Running Spark on YARN.
>
> JavaRDD<String> has 120 million entries.
>
> Thank you,
> Best regards,
> Mahmoud
>
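
As a rough way to pick a partition count for the repartitioning suggested at the top of this reply, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the helper PartitionSizing.suggestPartitions is hypothetical (not a Spark API), and the 100-byte average record size, the 128 MiB target per partition, and the floor of 40 partitions (standing in for total executor cores) are guesses, not numbers from your job.

```java
// Hypothetical helper, not part of Spark: estimates a partition count from
// an assumed average record size and a target size per partition.
final class PartitionSizing {

    static int suggestPartitions(long records, long avgBytesPerRecord,
                                 long targetBytesPerPartition, int minPartitions) {
        if (records < 0 || avgBytesPerRecord <= 0
                || targetBytesPerPartition <= 0 || minPartitions <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        // Estimated total output size; multiplyExact fails loudly on overflow.
        long totalBytes = Math.multiplyExact(records, avgBytesPerRecord);
        // Ceiling division: partitions needed to stay at or under the target size.
        long bySize = totalBytes / targetBytesPerPartition
                + (totalBytes % targetBytesPerPartition == 0 ? 0 : 1);
        // Never go below the floor, so every core has at least one task.
        return Math.toIntExact(Math.max(bySize, (long) minPartitions));
    }

    public static void main(String[] args) {
        // 120 million entries (from the question); the other values are assumed.
        int n = suggestPartitions(120_000_000L, 100L, 128L * 1024 * 1024, 40);
        System.out.println(n); // prints 90
        // With Spark on the classpath, the count would be applied as:
        //   rdd.repartition(n).saveAsTextFile(outputPath);
    }
}
```

It is worth checking rdd.getNumPartitions() first to see whether the current count is far from such an estimate. Note that repartition(n) triggers a full shuffle, so it may only pay off if the existing partitions are too few or uneven.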