[ 
https://issues.apache.org/jira/browse/SPARK-34154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272867#comment-17272867
 ] 

Attila Zsolt Piros commented on SPARK-34154:
--------------------------------------------

I tried to reproduce this issue by running it locally 100 times but all of them 
was successful. 
So my next idea is to reproduce it by the PR builder (and getting the stack 
trace of the thread which got stuck):
https://github.com/apache/spark/pull/31363



> Flaky Test: LocalityPlacementStrategySuite.handle large number of containers 
> and tasks (SPARK-18750)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34154
>                 URL: https://issues.apache.org/jira/browse/SPARK-34154
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 3.0.2, 3.2.0, 3.1.1
>            Reporter: Dongjoon Hyun
>            Priority: Major
>
> `LocalityPlacementStrategySuite` hangs sometimes like the following. We can 
> retriever, but it takes our resource significantly because it hangs until the 
> timeout (6 hours) occurs.
> [https://github.com/apache/spark/runs/1719480243]
> [https://github.com/apache/spark/runs/1724459002]
> [https://github.com/apache/spark/runs/1717958874]
> [https://github.com/apache/spark/runs/1731673955] (branch-3.0)
> {code:java}
> [info] LocalityPlacementStrategySuite:
> 17299[info] *** Test still running after 3 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17300[info] *** Test still running after 8 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17301[info] *** Test still running after 13 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17302[info] *** Test still running after 18 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17303[info] *** Test still running after 23 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17304[info] *** Test still running after 28 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17305[info] *** Test still running after 33 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17306[info] *** Test still running after 38 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17307[info] *** Test still running after 43 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17308[info] *** Test still running after 48 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17309[info] *** Test still running after 53 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17310[info] *** Test still running after 58 minutes, 6 seconds: suite name: 
> LocalityPlacementStrategySuite, test name: handle large number of containers 
> and tasks (SPARK-18750). 
> 17311[info] *** Test still running after 1 hour, 3 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17312[info] *** Test still running after 1 hour, 8 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17313[info] *** Test still running after 1 hour, 13 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17314[info] *** Test still running after 1 hour, 18 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17315[info] *** Test still running after 1 hour, 23 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17316[info] *** Test still running after 1 hour, 28 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17317[info] *** Test still running after 1 hour, 33 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17318[info] *** Test still running after 1 hour, 38 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17319[info] *** Test still running after 1 hour, 43 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17320[info] *** Test still running after 1 hour, 48 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17321[info] *** Test still running after 1 hour, 53 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17322[info] *** Test still running after 1 hour, 58 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17323[info] *** Test still running after 2 hours, 3 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17324[info] *** Test still running after 2 hours, 8 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17325[info] *** Test still running after 2 hours, 13 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17326[info] *** Test still running after 2 hours, 18 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17327[info] *** Test still running after 2 hours, 23 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17328[info] *** Test still running after 2 hours, 28 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17329[info] *** Test still running after 2 hours, 33 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17330[info] *** Test still running after 2 hours, 38 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17331[info] *** Test still running after 2 hours, 43 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17332[info] *** Test still running after 2 hours, 48 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17333[info] *** Test still running after 2 hours, 53 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17334[info] *** Test still running after 2 hours, 58 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17335[info] *** Test still running after 3 hours, 3 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17336[info] *** Test still running after 3 hours, 8 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17337[info] *** Test still running after 3 hours, 13 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17338[info] *** Test still running after 3 hours, 18 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17339[info] *** Test still running after 3 hours, 23 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17340[info] *** Test still running after 3 hours, 28 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17341[info] *** Test still running after 3 hours, 33 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17342[info] *** Test still running after 3 hours, 38 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17343[info] *** Test still running after 3 hours, 43 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17344[info] *** Test still running after 3 hours, 48 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17345[info] *** Test still running after 3 hours, 53 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17346[info] *** Test still running after 3 hours, 58 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17347[info] *** Test still running after 4 hours, 3 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17348[info] *** Test still running after 4 hours, 8 minutes, 6 seconds: suite 
> name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17349[info] *** Test still running after 4 hours, 13 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750). 
> 17350[info] *** Test still running after 4 hours, 18 minutes, 6 seconds: 
> suite name: LocalityPlacementStrategySuite, test name: handle large number of 
> containers and tasks (SPARK-18750).  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to