We have a large file. We used to read it in chunks, call the parallelize method (distData = sc.parallelize(chunk)) on each one, and then run the map/reduce job chunk by chunk. Recently we read the whole file with the textFile method instead and found the map/reduce job ran much faster. Can anybody help us understand why? We have verified that reading the file is NOT the bottleneck.
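
For reference, here is a minimal sketch of the two approaches we are comparing. The file path, chunk size, and the word-count job are illustrative assumptions, not our actual workload.

# Sketch only: path, chunk size, and the word-count logic are placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="parallelize-vs-textFile")
path = "/data/large_file.txt"  # hypothetical input file

# Approach 1: read chunks on the driver, then parallelize each chunk.
# Every chunk is held in driver memory, serialized, and shipped to the
# executors before any work can start, and each chunk is a separate job.
def read_chunks(path, lines_per_chunk=100000):
    chunk = []
    with open(path) as f:
        for line in f:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                yield chunk
                chunk = []
    if chunk:
        yield chunk

for chunk in read_chunks(path):
    counts = (sc.parallelize(chunk)
                .flatMap(lambda line: line.split())
                .map(lambda w: (w, 1))
                .reduceByKey(lambda a, b: a + b)
                .collect())

# Approach 2: let Spark read the file directly. Executors read their own
# splits in parallel, so the data never funnels through the driver and
# the whole computation runs as a single job.
counts = (sc.textFile(path)
            .flatMap(lambda line: line.split())
            .map(lambda w: (w, 1))
            .reduceByKey(lambda a, b: a + b)
            .collect())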