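It is the same file on every node. Just copy the whole file to the same path on all nodes; you do not need to split it yourself. The Hadoop library that we use for splitting takes care of assigning the right split to each node.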
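To illustrate, here is a rough Python sketch of the idea. It is modeled loosely on the byte-range split computation in Hadoop's `FileInputFormat` (split size = max(minSize, min(maxSize, blockSize)), with a 10% "slop" allowed on the last split). It is not Spark's actual code path. The 32 MB split size, the node names, and the round-robin `assign` helper are hypothetical and only there for illustration.

```python
from dataclasses import dataclass

# Hadoop lets the final split be up to 10% larger than split_size
# instead of creating a tiny trailing split.
SPLIT_SLOP = 1.1


@dataclass(frozen=True)
class Split:
    path: str    # the same path on every node
    start: int   # byte offset where this split begins
    length: int  # number of bytes this split covers


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    # Clamp the block size between the configured min and max split sizes.
    return max(min_size, min(max_size, block_size))


def compute_splits(path: str, file_length: int, split_size: int) -> list[Split]:
    # Cut the file into contiguous byte ranges; nothing is copied or rewritten.
    splits = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        splits.append(Split(path, file_length - remaining, split_size))
        remaining -= split_size
    if remaining != 0:
        splits.append(Split(path, file_length - remaining, remaining))
    return splits


def assign(splits: list[Split], nodes: list[str]) -> dict[int, str]:
    # Hypothetical round-robin placement; a real scheduler also
    # considers data locality and free resources.
    return {i: nodes[i % len(nodes)] for i in range(len(splits))}


if __name__ == "__main__":
    MB = 1024 * 1024
    splits = compute_splits("/home/scalatest.txt", 100 * MB, 32 * MB)
    for i, node in assign(splits, ["node1", "node2"]).items():
        s = splits[i]
        print(f"{node} reads bytes [{s.start}, {s.start + s.length}) of {s.path}")
```

Each task opens the same full file at the same local path and reads only its own byte range. That is why an identical copy on every node is enough. Lines that straddle a split boundary are handled by Hadoop's line record reader, so no record is lost or read twice.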
Prashant Sharma

On Thu, Apr 24, 2014 at 1:36 PM, Carter <gyz...@hotmail.com> wrote:
> Thank you very much for your help Prashant.
>
> Sorry I still have another question about your answer: "however if the
> file("/home/scalatest.txt") is present on the same path on all systems it
> will be processed on all nodes."
>
> When presenting the file to the same path on all nodes, do we just simply
> copy the same file to all nodes, or do we need to split the original file
> into different parts (each part is still with the same file name
> "scalatest.txt"), and copy each part to a different node for
> parallelization?
>
> Thank you very much.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-about-how-hadoop-works-tp4638p4738.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.