Xianjin YE created SPARK-23040:
----------------------------------

             Summary: BlockStoreShuffleReader's returned Iterator isn't interruptible if aggregator or ordering is specified
                 Key: SPARK-23040
                 URL: https://issues.apache.org/jira/browse/SPARK-23040
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.2.1, 2.2.0, 2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0
            Reporter: Xianjin YE
            Priority: Minor


For example, if ordering is specified, the returned iterator is a CompletionIterator:
{code:java}
    dep.keyOrdering match {
      case Some(keyOrd: Ordering[K]) =>
        // Create an ExternalSorter to sort the data.
        val sorter =
          new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = dep.serializer)
        sorter.insertAll(aggregatedIter)
        context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
        context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
        context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
        CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
      case None =>
        aggregatedIter
    }
{code}

However, the sorter consumes the aggregatedIter (which may be interruptible) inside sorter.insertAll, and then exposes an iterator over the sorted data that isn't interruptible.
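For context, "interruptible" here means wrapped in org.apache.spark.InterruptibleIterator, which checks the task's kill flag before serving each element. A paraphrased sketch of that wrapper (an illustration, not the exact Spark source), showing what the sorter's output iterator loses:
{code:java}
import org.apache.spark.{TaskContext, TaskKilledException}

// Paraphrased sketch of org.apache.spark.InterruptibleIterator: it delegates to the
// underlying iterator, but checks for task cancellation on every hasNext so that a
// killed task stops consuming records promptly.
class SketchedInterruptibleIterator[T](context: TaskContext, delegate: Iterator[T])
  extends Iterator[T] {

  override def hasNext: Boolean = {
    // Abort the task if it has been marked as killed.
    if (context.isInterrupted()) {
      throw new TaskKilledException
    }
    delegate.hasNext
  }

  override def next(): T = delegate.next()
}
{code}
Because sorter.insertAll drains the aggregatedIter eagerly, all of these per-element checks happen during insertAll; the CompletionIterator over sorter.iterator performs no such checks afterwards.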

The problem with this is that the Spark task cannot be cancelled when its stage fails (unless interruptThread is enabled, which it is not by default), so the task keeps running and wastes executor resources.
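A minimal sketch of one possible direction for a fix (an illustration, not a committed patch): have BlockStoreShuffleReader.read() re-wrap its result in an org.apache.spark.InterruptibleIterator whenever the aggregator or sorter has already consumed the interruptible one. Here resultIter stands for the value of the dep.keyOrdering match quoted above:
{code:java}
    resultIter match {
      case _: InterruptibleIterator[Product2[K, C]] =>
        // Already interruptible (no aggregation/ordering consumed it), return as-is.
        resultIter
      case _ =>
        // The aggregator and/or sorter drained the interruptible iterator eagerly, so
        // the iterator they expose no longer checks the kill flag; wrap it again so
        // downstream consumption can still be cancelled.
        new InterruptibleIterator[Product2[K, C]](context, resultIter)
    }
{code}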





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
