Hi, I find that loading files from HDFS can incur a huge amount of network
traffic. The input size is 90 GB and the network traffic is about 80 GB. As I
understand it, the data should be read from local files, so no network
communication should be needed.

I use Spark 1.5.1, and the following is my code:

val textRDD = sc.textFile("hdfs://master:9000/inputDir")
textRDD.count
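
To check whether Spark sees the HDFS block locations at all, something like
the following could be run in spark-shell. This is only a rough sketch:
`partitionsPerHost` is a hypothetical helper name (not part of Spark), and it
assumes the predefined `sc` from spark-shell. It relies on
`RDD.preferredLocations`, which reports the hosts Spark would prefer for each
partition.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: counts how many partitions name each host
// as a preferred (data-local) location.
def partitionsPerHost[T](rdd: RDD[T]): Map[String, Int] =
  rdd.partitions
    .flatMap(p => rdd.preferredLocations(p)) // hosts holding this partition's block
    .groupBy(identity)
    .map { case (host, hits) => host -> hits.length }

// Assumes spark-shell, where `sc` is already defined.
val textRDD = sc.textFile("hdfs://master:9000/inputDir")
partitionsPerHost(textRDD).foreach { case (host, n) =>
  println(s"$host: $n partitions")
}
```

If the hosts printed here do not match the executor host names shown in the
Spark UI (for example, IP addresses on one side and host names on the other),
that might explain why tasks are not scheduled data-local.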

Jeffrey
