[
https://issues.apache.org/jira/browse/TEZ-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010666#comment-18010666
]
László Bodor commented on TEZ-4565:
-----------------------------------
I'm tempted to merge this PR. I've linked TEZ-4553, which caused this behavior
change, although the change is not actually a bug.
With the YARN task scheduler service ([^syslog_dag_with_old_scheduler.log]),
there is only a single preemption, for the expected v3 vertex:
{code}
2025-07-29 14:51:43,315 [INFO] [AMRM Callback Handler Thread]
|rm.YarnTaskSchedulerService|: Preempting container:
container_1753793491126_0001_01_000003 currently allocated to a task.
2025-07-29 14:51:43,318 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1753793491126_0001_1][Event:TASK_ATTEMPT_FINISHED]:
vertexName=v3, taskAttemptId=attempt_1753793491126_0001_1_02_000000_0,
creationTime=1753793502634, allocationTime=1753793502634,
startTime=1753793502644, finishTime=1753793503318, timeTaken=674,
status=KILLED, errorEnum=INTERNAL_PREEMPTION, diagnostics=Container
container_1753793491126_0001_01_000003 finished with diagnostics set to
[Container preempted internally], nodeHttpAddress=localhost:61240,
counters=Counters: 0
{code}
With the DAG-aware scheduler, whether the test fails
([^syslog_dag_1753791202801_0001_1.failed.txt])
or passes ([^syslog_dag_1753792102822_0001_1.passed.txt]),
there are 2 preemptions (note "DagAwareYarnTaskScheduler" as the scheduler,
due to TEZ-4553):
{code}
2025-07-29 14:13:34,010 [INFO] [AMRM Callback Handler Thread]
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000005 currently allocated to task
attempt_1753791202801_0001_1_01_000001_1
2025-07-29 14:13:34,010 [INFO] [AMRM Callback Handler Thread]
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000005 currently allocated to a task
2025-07-29 14:13:34,335 [INFO] [AMRM Callback Handler Thread]
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000002 currently allocated to task
attempt_1753791202801_0001_1_02_000000_0
2025-07-29 14:13:34,335 [INFO] [AMRM Callback Handler Thread]
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000002 currently allocated to a task
{code}
Assuming the test always detects the preemption of the v3 vertex's task, we
need to make it more robust: it should pass whether or not a task of the v2
vertex has also been preempted.
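As a rough illustration of the kind of check meant here (not the actual TestAnalyzer code), the sketch below scans scheduler log text for the "Preempting container ... allocated to task attempt_..." entries and only requires that a v3 attempt is among them, ignoring any other preempted vertices. The helper names are hypothetical, and it assumes the fourth numeric field of the attempt ID is the vertex index, with 02 being v3 as in the history event above.

```python
import re

# Hypothetical helper, not part of Tez. Matches the DagAwareYarnTaskScheduler
# entries that name the preempted attempt; whitespace may include line breaks.
PREEMPT_RE = re.compile(
    r"Preempting container\s+container_\w+\s+currently allocated to task\s+"
    r"attempt_\d+_\d+_\d+_(\d+)_\d+_\d+"
)


def preempted_vertex_indices(log_text):
    """Return the set of vertex indices whose task attempts were preempted."""
    return {int(m.group(1)) for m in PREEMPT_RE.finditer(log_text)}


def v3_was_preempted(log_text, v3_index=2):
    # Assumption: v3 has vertex index 2 (the "_02_" field in the attempt ID).
    # Extra preemptions, e.g. of a v2 task, do not affect the result.
    return v3_index in preempted_vertex_indices(log_text)


# Excerpt taken from the DAG-aware scheduler log quoted above.
SAMPLE = """
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000005 currently allocated to task
attempt_1753791202801_0001_1_01_000001_1
|rm.DagAwareYarnTaskScheduler|: Preempting container
container_1753791202801_0001_01_000002 currently allocated to task
attempt_1753791202801_0001_1_02_000000_0
"""

print(sorted(preempted_vertex_indices(SAMPLE)))
print(v3_was_preempted(SAMPLE))
```

The point of the design is that the assertion is "v3 is in the preempted set" rather than "exactly one preemption happened", so an additional v2 preemption no longer changes the outcome. The old scheduler's log entry does not include the attempt ID, so for that case the vertex would have to come from the TASK_ATTEMPT_FINISHED history event instead.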
> TestAnalyzer subtest testInternalPreemption is flaky
> ----------------------------------------------------
>
> Key: TEZ-4565
> URL: https://issues.apache.org/jira/browse/TEZ-4565
> Project: Apache Tez
> Issue Type: Test
> Reporter: Jonathan Turner Eagles
> Assignee: Jonathan Turner Eagles
> Priority: Major
> Fix For: 0.10.4
>
> Attachments: syslog_dag_1753791202801_0001_1.failed.txt,
> syslog_dag_1753792102822_0001_1.passed.txt, syslog_dag_with_old_scheduler.log
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>