[ 
https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Yang updated SPARK-43221:
-------------------------------
    Description: 
Spark on Yarn Cluster

When multiple executors run on one node and the same block is held by more than 
one of them, with some copies in memory and some on disk, an executor 
intermittently fails to obtain the block and throws an exception:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

 

The following steps replay how the problem occurs:

step 1:

The executor asks the driver for the block's location and status information 
(locationsAndStatusOption). The request parameters are the BlockId and the host 
of the executor's own node. Note that the request does not carry port information.
{code:java}
private[spark] def getRemoteBlock[T](
    blockId: BlockId,
    bufferTransformer: ManagedBuffer => T): Option[T] = {
  logDebug(s"Getting remote block $blockId")
  require(blockId != null, "BlockId is null")

  // Because all the remote blocks are registered in driver, it is not necessary to ask
  // all the storage endpoints to get block status.
  // Only the host is passed here; the port is not.
  val locationsAndStatusOption = master.getLocationsAndStatus(blockId, blockManagerId.host)
  // (remainder of the method not quoted)
{code}
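
To illustrate why the missing port may matter, here is a minimal sketch of a host-only lookup. It is a simplified model and not Spark's implementation: ManagerId, StorageKind, holdersOnHost, holderAt, the host names and the ports are all hypothetical. It shows only that a host by itself cannot tell apart two executors on the same node that hold the same block in different storage forms. It does not claim to be the confirmed root cause of the exception.

```scala
// Hypothetical, simplified model; these are NOT Spark's real classes.
final case class ManagerId(executorId: String, host: String, port: Int)

sealed trait StorageKind
case object InMemory extends StorageKind
case object OnDisk extends StorageKind

object HostOnlyLookupSketch {
  // Assumed driver-side view: which block manager holds the block, and how.
  val holders: Map[ManagerId, StorageKind] = Map(
    ManagerId("1", "node-a", 40001) -> InMemory,
    ManagerId("2", "node-a", 40002) -> OnDisk,
    ManagerId("3", "node-b", 40001) -> InMemory
  )

  // Lookup by host only, mirroring a request that carries no port.
  def holdersOnHost(host: String): Map[ManagerId, StorageKind] =
    holders.filter { case (id, _) => id.host == host }

  // Lookup by host and port, which identifies a single block manager.
  def holderAt(host: String, port: Int): Option[(ManagerId, StorageKind)] =
    holders.find { case (id, _) => id.host == host && id.port == port }

  def main(args: Array[String]): Unit = {
    val sameHost = holdersOnHost("node-a")
    println(s"Matches for host only: ${sameHost.size}")
    println(s"Storage kinds on that host: ${sameHost.values.toSet}")
    println(s"Match for host and port: ${holderAt("node-a", 40002)}")
  }
}
```

In this model the host-only lookup for node-a matches two executors, one holding the block in memory and one on disk, while the host-and-port lookup matches exactly one.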
step 2:

 


> Executor obtained error information 
> ------------------------------------
>
>                 Key: SPARK-43221
>                 URL: https://issues.apache.org/jira/browse/SPARK-43221
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 3.1.1, 3.2.0, 3.3.0
>            Reporter: Qiang Yang
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h



