Dear all,

The definition of fetch wait time (from the Spark source) reads:

    Time the task spent waiting for remote shuffle blocks. This only includes the time
    blocking on shuffle input data. For instance, if block B is being fetched while the
    task is still not finished processing block A, it is not considered to be blocking
    on block B.
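That blocking condition can be illustrated with a plain-Scala sketch (no Spark involved; `PipelineSketch` and the event names are made up for illustration): a lazily consumed iterator interleaves "fetch" and "process" per block, whereas materializing it with `toArray` forces every fetch to finish before any processing starts.

```scala
import scala.collection.mutable.ListBuffer

// Plain-Scala sketch (no Spark): compare lazy consumption of a
// simulated fetch iterator with eager materialization via toArray.
object PipelineSketch {
  // Returns the order of fetch/process events for the chosen style.
  def events(eager: Boolean): List[String] = {
    val log = ListBuffer[String]()
    // Each next() on this iterator simulates fetching one remote block.
    def fetched: Iterator[Int] =
      Iterator.tabulate(2) { i => log += s"fetch $i"; i }
    if (eager)
      fetched.toArray.foreach(i => log += s"process $i") // all fetches first
    else
      fetched.foreach(i => log += s"process $i")         // interleaved
    log.toList
  }
}
```

With lazy consumption the event order is fetch 0, process 0, fetch 1, process 1; with `toArray` it is fetch 0, fetch 1, process 0, process 1, i.e. the case where the task is genuinely blocking on shuffle input.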
By this definition, can I conclude that a task pipelines block fetching with its real work? How does Spark decide that a task can be split by blocks to do this pipelining?

If the task is something like:

    val b = a.mapPartitions { itr =>
      timeStamp
      val arr = itr.toArray
      ...
      timeStamp
      arr.toIterator
    }

can fetching the blocks of RDD a be pipelined with processing RDD b?

Here is the information from my task:

    "Launch Time":         1399882225433
    "Finish Time":         1399882252948
    "Executor Run Time":   27497
    "Shuffle Finish Time": 1399882246138
    "Fetch Wait Time":     9377

The time spent inside a.mapPartitions is 8287 (call it the mapPartitions time), and:

    Finish Time - Launch Time              = 27515
    Shuffle Finish Time - Launch Time      = 20705  (call it the total shuffle time)
    Executor Run Time - total shuffle time = 6792

The total shuffle time is 20705 and the Fetch Wait Time is 9377, so during the remaining 20705 - 9377 = 11328 ms the task is doing other work. What is that work? The mapPartitions? Or is mapPartitions executed only after the shuffle completes? But then the calculated times do not match.

I'm so confused; I need your help!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/something-about-pipeline-tp5626.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.