Re: Loading Files from HDFS Incurs Network Communication

2015-10-26 Thread Sean Owen
-dev +user

How are you measuring network traffic? It's not in general true that there will be zero network traffic, since not all executors are local to all data. That can be the situation in many cases but not always.

On Mon, Oct 26, 2015 at 8:57 AM, Jinfeng Li wrote:
> Hi,

Loading Files from HDFS Incurs Network Communication

2015-10-26 Thread Jinfeng Li
Hi,

I find that loading files from HDFS can incur a huge amount of network traffic. The input size is 90G and the network traffic is about 80G. As I understand it, local files should be read, so no network communication should be needed.

I use Spark 1.5.1, and the following is my code:

val textRDD =