Dear all,

   definition of fetch wait time:
   "Time the task spent waiting for remote shuffle blocks. This only includes
   the time blocking on shuffle input data. For instance if block B is being
   fetched while the task is still not finished processing block A, it is not
   considered to be blocking on block B."

   By this definition, can I conclude that tasks pipeline block fetching with
   the real work? How does Spark decide that a task can be split by blocks so
   that this pipelining is possible?

  If the task is something like:

  val b = a.mapPartitions { itr =>
    val t0 = System.currentTimeMillis() // first timestamp
    val arr = itr.toArray               // materializes the whole partition
    ...
    val t1 = System.currentTimeMillis() // second timestamp
    arr.toIterator
  }

  Can the fetching of RDD a's blocks be pipelined with the processing that
  produces RDD b?
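My current understanding (this is an analogy in plain Scala, not actual Spark code) is that any pipelining would come from Iterator laziness: next() pulls one element at a time, while toArray forces everything to be pulled up front. A minimal sketch of that difference:

```scala
import scala.collection.mutable.ArrayBuffer

// Records which "blocks" have been pulled; the append stands in for a
// remote fetch. This is only an analogy for laziness, not Spark internals.
val fetched = ArrayBuffer[Int]()

def blockIterator: Iterator[Int] = Iterator.tabulate(3) { i =>
  fetched += i // side effect marks block i as "fetched"
  i
}

// Lazy path: mapping an Iterator and taking one element pulls only block 0
blockIterator.map(identity).next()
assert(fetched.toList == List(0))

fetched.clear()

// Eager path: toArray pulls every block before any processing can begin
val arr = blockIterator.toArray
assert(fetched.toList == List(0, 1, 2))
```

If this picture is right, then the itr.toArray in my task would force all blocks to be fetched before the first timestamp, defeating the pipeline.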

Here is the information from my task:
"Launch Time":1399882225433
"Finish Time":  1399882252948
"Executor Run Time":27497
"Shuffle Finish Time":1399882246138
"Fetch Wait Time":9377
time spent inside a.mapPartitions is 8287 ms (call this the mapPartitions time)

Finish Time - Launch Time = 27515 ms
Shuffle Finish Time - Launch Time = 20705 ms (call this the total shuffle time)
Executor Run Time - total shuffle time = 27497 - 20705 = 6792 ms
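To make the arithmetic above easy to re-check, here it is as a snippet (the raw values are copied from the metrics; the variable names are just mine):

```scala
// Raw metrics copied from the task above (epoch ms / durations in ms)
val launchTime        = 1399882225433L
val finishTime        = 1399882252948L
val executorRunTime   = 27497L
val shuffleFinishTime = 1399882246138L
val fetchWaitTime     = 9377L

val taskTime         = finishTime - launchTime            // 27515 ms
val totalShuffleTime = shuffleFinishTime - launchTime     // 20705 ms
val afterShuffle     = executorRunTime - totalShuffleTime // 6792 ms
val notWaiting       = totalShuffleTime - fetchWaitTime   // 11328 ms

assert(taskTime == 27515L)
assert(totalShuffleTime == 20705L)
assert(afterShuffle == 6792L)
assert(notWaiting == 11328L)
```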

The total shuffle time is 20705 ms but the Fetch Wait Time is only 9377 ms, so
for the remaining 20705 - 9377 = 11328 ms the task was doing something other
than waiting. What is it doing during that time? The mapPartitions work? Or is
mapPartitions executed only after the shuffle completes? Either way the
calculated times don't match (the mapPartitions time alone is 8287 ms). I'm
quite confused and would appreciate your help!


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/something-about-pipeline-tp5626.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
