[
https://issues.apache.org/jira/browse/FLINK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653560#comment-17653560
]
Gabor Somogyi commented on FLINK-24169:
---------------------------------------
YARN tests behave differently locally than on CI. Both log file creation and
log file search are implemented with relative directories, which makes all of
those tests flaky. When the log files are generated in a directory other than
the one the tests search, the tests time out. I think it would be good to use
full paths instead of relative ones to make the tests stable.
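
As a rough illustration of the idea (not the actual Flink test code), the
sketch below resolves the log directory to an absolute path once and has the
search refuse anything relative. The class name AbsoluteLogSearch, the
methods resolveLogDir and anyLogContains, the directory "target/logs" and
the log message are all hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: writer and searcher share one absolute path.
public class AbsoluteLogSearch {

    // Resolve a possibly relative log directory against an explicit base directory,
    // so the result no longer depends on the working directory of the JVM.
    static Path resolveLogDir(Path baseDir, String logDir) {
        return baseDir.toAbsolutePath().resolve(logDir).normalize();
    }

    // Return true if any regular file below the directory contains the given text.
    static boolean anyLogContains(Path absoluteLogDir, String needle) throws IOException {
        if (!absoluteLogDir.isAbsolute()) {
            throw new IllegalArgumentException("Expected an absolute path: " + absoluteLogDir);
        }
        if (!Files.isDirectory(absoluteLogDir)) {
            return false; // nothing has been written yet
        }
        try (Stream<Path> files = Files.walk(absoluteLogDir)) {
            return files.filter(Files::isRegularFile).anyMatch(f -> contains(f, needle));
        }
    }

    private static boolean contains(Path file, String needle) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            return new String(bytes, StandardCharsets.UTF_8).contains(needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout for the demo: a temporary base directory with target/logs below it.
        Path base = Files.createTempDirectory("yarn-test");
        Path logDir = resolveLogDir(base, "target/logs");
        Files.createDirectories(logDir);
        Files.write(logDir.resolve("jobmanager.log"),
                "Job has been submitted".getBytes(StandardCharsets.UTF_8));
        System.out.println(anyLogContains(logDir, "Job has been submitted"));
    }
}
```

With something like this, the code that creates the logs and the code that
searches them receive the same absolute directory, so a different working
directory on local machines vs CI would no longer change where the tests look.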
> Flaky local YARN tests relying on log files
> -------------------------------------------
>
> Key: FLINK-24169
> URL: https://issues.apache.org/jira/browse/FLINK-24169
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Reporter: Matthias Pohl
> Assignee: Zsombor Chikán
> Priority: Major
> Labels: pull-request-available, stale-assigned, test-stability
> Fix For: 1.17.0
>
>
> While working on [PR #16989|https://github.com/apache/flink/pull/16989] for
> FLINK-23611, we experienced some flakiness when running
> {{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
> [~dmvk] discovered a bug in log4j (see
> [LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug
> affects the tests because they check the log files for specific log messages.
> The log messages end up in the wrong log file if the rolling update
> mechanism is triggered. This does not seem to be an issue on AzureCI due to the
> slower hardware used for the worker machines.
> A solution to overcome this issue would be to add a custom log4j
> configuration that disables the {{appender.main.policies.startup.type =
> OnStartupTriggeringPolicy}} setting, which is present in {{flink-dist}}'s log4j
> configuration.