Hi! I would like to know the difference between the following transformations when they are executed right before writing an RDD to a file:
1. coalesce(1, shuffle = true)
2. coalesce(1, shuffle = false)

Code example:

    val input = sc.textFile(inputFile)
    val filtered = input.filter(doSomeFiltering)
    val mapped = filtered.map(doSomeMapping)

    mapped.coalesce(1, shuffle = true).saveAsTextFile(outputFile)

vs

    mapped.coalesce(1, shuffle = false).saveAsTextFile(outputFile)

And how does either compare with collect()? I'm fully aware that Spark's save methods store output in an HDFS-style directory structure, but I'm more interested in the data-partitioning aspects of collect() versus shuffled and non-shuffled coalesce().

Thanks,
Paweł

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffled-vs-non-shuffled-coalesce-in-Apache-Spark-tp23377.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
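For readers of the archive, the partitioning difference being asked about can be sketched in plain Python. This is a conceptual illustration only, not Spark's actual implementation: with shuffle = false, coalesce is a narrow dependency that merges (roughly contiguous) parent partitions locally, so with a target of 1 the upstream filter/map run in a single task; with shuffle = true, upstream stages keep their parallelism and a full shuffle then redistributes the records into the target number of partitions.

    # Conceptual sketch of coalesce(1) semantics -- plain Python,
    # NOT Spark's actual code. Partition grouping is simplified.

    partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]  # an "RDD" with 4 partitions

    def coalesce_no_shuffle(parts, n):
        # shuffle = false: merge contiguous parent partitions locally into
        # n groups (narrow dependency, no data movement across a shuffle).
        # With n = 1, everything upstream is computed by one task.
        size = -(-len(parts) // n)  # ceil division
        return [sum(parts[i:i + size], []) for i in range(0, len(parts), size)]

    def coalesce_with_shuffle(parts, n):
        # shuffle = true: a shuffle stage redistributes all records into
        # n partitions; upstream work keeps its original parallelism.
        records = [x for p in parts for x in p]
        out = [[] for _ in range(n)]
        for i, x in enumerate(records):
            out[i % n].append(x)
        return out

    print(coalesce_no_shuffle(partitions, 1))    # one merged partition
    print(coalesce_with_shuffle(partitions, 1))  # one shuffled partition

Either way the final output is a single partition; the practical difference is where the upstream computation runs (one task vs. many tasks followed by a shuffle). collect(), by contrast, pulls all records to the driver as a local array rather than writing partitioned output.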