[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Qiang Yang updated SPARK-43221:
-------------------------------
    Description:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both executors, with some copies in memory and some on disk, an executor intermittently fails to obtain the block and throws:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will replay how the problem occurs.

step 1:
The executor asks the driver for the block information (locationsAndStatusOption). The input parameters are the BlockId and the host of the executor's own node. Please note that the request does not carry port information.
line: 1092
!image-2023-04-21-00-24-22-059.png!

step 2:
On the driver side, the driver obtains all BlockManagers holding the block based on the BlockId. For non-remote-shuffle scenarios, the driver takes the first BlockManager from the locations for that BlockId.
Assume that two BlockManagers on this node hold the BlockId: BM-1 holds the block in memory, and BM-2 holds the block on disk.
line: 852, 856
!image-2023-04-21-00-30-41-851.png!

step 3:

  was:
Spark on Yarn Cluster

When multiple executors exist on a node and the same block exists on both executors, with some copies in memory and some on disk, an executor intermittently fails to obtain the block and throws:

java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)

Next, I will replay how the problem occurs.

step 1:
The executor asks the driver for the block information (locationsAndStatusOption). The input parameters are the BlockId and the host of the executor's own node. Please note that the request does not carry port information.
code:
!image-2023-04-21-00-19-58-021.png!
step 2:

> Executor obtained error information
> ------------------------------------
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
> Issue Type: Bug
> Components: Block Manager
> Affects Versions: 3.1.1, 3.2.0, 3.3.0
> Reporter: Qiang Yang
> Priority: Major
> Attachments: image-2023-04-21-00-19-58-021.png, image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png
>
> Original Estimate: 24h
> Remaining Estimate: 24h
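
To make steps 1 and 2 of the description easier to follow, here is a minimal Python sketch of the lookup as the report describes it. It is a toy model, not Spark's code: the class and function names (BlockManager, LocationRequest, make_request, first_holder), the host name, the ports, and the block id are all hypothetical and chosen only for illustration. It assumes only what the report states: the request carries the BlockId and the host but no port, and the driver takes the first BlockManager from the locations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BlockManager:
    """Hypothetical stand-in for one block manager registered with the driver."""
    name: str
    host: str
    port: int
    storage: str  # "memory" or "disk"


@dataclass(frozen=True)
class LocationRequest:
    """Step 1: the request carries the block id and the host, but no port."""
    block_id: str
    host: str


def make_request(block_id: str, requester: BlockManager) -> LocationRequest:
    # The requester's port is deliberately dropped, as in the report.
    return LocationRequest(block_id, requester.host)


def first_holder(
    locations: Dict[str, List[BlockManager]], request: LocationRequest
) -> Optional[BlockManager]:
    """Step 2: take the first block manager listed for the block id."""
    holders = locations.get(request.block_id, [])
    return holders[0] if holders else None


# Two block managers on the same node hold the same block (assumed values).
bm1 = BlockManager("BM-1", "node-a", 40001, "memory")
bm2 = BlockManager("BM-2", "node-a", 40002, "disk")
locations = {"broadcast_0_piece0": [bm1, bm2]}

req1 = make_request("broadcast_0_piece0", bm1)
req2 = make_request("broadcast_0_piece0", bm2)

print(req1 == req2)                           # True
print(first_holder(locations, req2).storage)  # memory
```

In this model the driver cannot tell the two executors on node-a apart, because their requests are identical, and a request originating from BM-2 (block on disk) is answered with BM-1 (block in memory). Whether this mismatch is what leads to the exception is not established here, since step 3 of the report is still empty.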