We have a large file. We used to read it in chunks, call the parallelize method (distData = sc.parallelize(chunk)) on each one, and then run the map/reduce job chunk by chunk. Recently we read the whole file with the textFile method instead and found the map/reduce job ran much faster. Can anybody help us understand why? We have verified that reading the file is NOT the bottleneck.
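
For reference, here is a minimal sketch of the two approaches we are comparing. The file path, chunk size, and the word-count job are illustrative assumptions, not our actual workload.

# Sketch only: path, chunk size, and the word-count logic are placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="parallelize-vs-textFile")
path = "/data/large_file.txt"  # hypothetical input file

# Approach 1: read chunks on the driver, then parallelize each chunk.
# Every chunk is held in driver memory, serialized, and shipped to the
# executors before any work can start, and each chunk is a separate job.
def read_chunks(path, lines_per_chunk=100000):
    chunk = []
    with open(path) as f:
        for line in f:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                yield chunk
                chunk = []
    if chunk:
        yield chunk

for chunk in read_chunks(path):
    counts = (sc.parallelize(chunk)
                .flatMap(lambda line: line.split())
                .map(lambda w: (w, 1))
                .reduceByKey(lambda a, b: a + b)
                .collect())

# Approach 2: let Spark read the file directly. Executors read their own
# splits in parallel, so the data never funnels through the driver and
# the whole computation runs as a single job.
counts = (sc.textFile(path)
            .flatMap(lambda line: line.split())
            .map(lambda w: (w, 1))
            .reduceByKey(lambda a, b: a + b)
            .collect())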