Hi,

Cross-posting this from the users list. I'm running branch-1.1 and applying a simple transformation to a relatively small dataset (64GB). saveAsTextFile essentially hangs, with tasks stuck in the running state, when I use the following code:
// Stalls: tasks run for over an hour and none finish. The smallest partition is 10MB.
val data = sc.textFile("s3n://input")
val reformatted = data.map(t => t.replace("Test(", "").replace(")", "").replaceAll(",", "\t"))
reformatted.saveAsTextFile("s3n://transformed")

// Runs, but stalls doing GC after filling up 150% of 650GB of memory.
val data = sc.textFile("s3n://input")
val reformatted = data.map(t => t.replace("Test(", "").replace(")", "").replaceAll(",", "\t")).cache()
reformatted.saveAsTextFile("s3n://transformed")

Any idea whether this is a parameter issue, and is there something I should try?

Thanks!
- jerry
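
For anyone trying to reproduce this, the per-line rewrite can be exercised on its own, outside Spark, to rule out the string handling itself. This is only a minimal sketch: the `ReformatSketch` object, the `reformat` helper, and the sample line are assumptions for illustration, since the real input records are not shown in this thread.

```scala
// Hypothetical stand-alone check of the per-line rewrite used in the map call.
// No Spark dependency; it only exercises the String operations.
object ReformatSketch {
  // Same chain as in the post: drop the "Test(" prefix and the ")" characters,
  // then turn every comma into a tab. Note that replaceAll treats its first
  // argument as a regular expression, while replace works on literal text.
  def reformat(line: String): String =
    line.replace("Test(", "").replace(")", "").replaceAll(",", "\t")

  def main(args: Array[String]): Unit = {
    // Assumed sample record; the actual input format is a guess.
    val sample = "Test(a,b,c)"
    println(reformat(sample))
  }
}
```

If the function behaves as expected on sample lines, the stall is more likely related to the S3 read/write path or to executor memory settings than to the transformation, although that remains a guess without executor logs.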