Manu Zhang created SPARK-43510:
----------------------------------

             Summary: Spark application hangs when YarnAllocator processing completed containers before updating internal state
                 Key: SPARK-43510
                 URL: https://issues.apache.org/jira/browse/SPARK-43510
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 3.4.0
            Reporter: Manu Zhang

I see the application hang when containers are preempted immediately after allocation, as shown in the following log.

{code:java}
23/05/14 09:11:33 INFO YarnAllocator: Launching container container_e3812_1684033797982_57865_01_000382 on host hdc42-mcc10-01-0910-4207-015-tess0028.stratus.rno.ebay.com for executor with ID 277 for ResourceProfile Id 0
23/05/14 09:11:33 WARN YarnAllocator: Cannot find executorId for container: container_e3812_1684033797982_57865_01_000382
23/05/14 09:11:33 INFO YarnAllocator: Completed container container_e3812_1684033797982_57865_01_000382 (state: COMPLETE, exit status: -102)
23/05/14 09:11:33 INFO YarnAllocator: Container container_e3812_1684033797982_57865_01_000382 was preempted.
{code}

Note the warning where YarnAllocator cannot find the executorId for the container while processing completed containers. The only plausible cause is that YarnAllocator processes the completed container before it updates its internal state and records the executorId. That update happens in a separate thread after the executor launch.
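
To make the suspected ordering problem concrete, here is a minimal sketch of it. This is not Spark's YarnAllocator code: the class AllocatorRaceSketch, its methods, and the shortened container ID are hypothetical names chosen for illustration. The interleaving is replayed sequentially so the result is deterministic, and the note about the stale entry is an assumption about how a hang could follow, not something confirmed by the log above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the bookkeeping described above; NOT Spark's YarnAllocator.
class AllocatorRaceSketch {
    // containerId -> executorId, standing in for the allocator's internal state.
    private final Map<String, String> containerToExecutor = new ConcurrentHashMap<>();

    // The state update that, per the report, runs on a separate thread after launch.
    void recordLaunched(String containerId, String executorId) {
        containerToExecutor.put(containerId, executorId);
    }

    // Completion handling: removes and returns the executorId if it is known.
    // An empty result corresponds to the "Cannot find executorId" warning.
    Optional<String> processCompleted(String containerId) {
        return Optional.ofNullable(containerToExecutor.remove(containerId));
    }

    int trackedExecutors() {
        return containerToExecutor.size();
    }

    public static void main(String[] args) {
        String container = "container_01_000382"; // shortened, illustrative ID

        // Suspected ordering: the completion is processed before the state update.
        AllocatorRaceSketch racy = new AllocatorRaceSketch();
        Optional<String> missed = racy.processCompleted(container);
        racy.recordLaunched(container, "277"); // late update leaves an entry behind
        System.out.println("racy: found=" + missed.isPresent()
                + ", entries left=" + racy.trackedExecutors());

        // Alternative ordering: the state is recorded before the completion is handled.
        AllocatorRaceSketch ordered = new AllocatorRaceSketch();
        ordered.recordLaunched(container, "277");
        Optional<String> found = ordered.processCompleted(container);
        System.out.println("ordered: found=" + found.isPresent()
                + ", entries left=" + ordered.trackedExecutors());
    }
}
```

In the first ordering the lookup misses and the late update leaves an entry for a container that has already completed. If the real allocator counts such an entry as a running executor (an assumption), it would not request a replacement, which would be consistent with the hang.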