[ 
https://issues.apache.org/jira/browse/FLINK-40070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-40070:
-----------------------------------
    Labels: pull-request-available test-stability  (was: test-stability)

> DynamicParameterITCase hangs when the JobManager startup banner rolls to a 
> numbered log file
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-40070
>                 URL: https://issues.apache.org/jira/browse/FLINK-40070
>             Project: Flink
>          Issue Type: Bug
>          Components: Test Infrastructure, Tests
>    Affects Versions: 2.4.0
>            Reporter: Martijn Visser
>            Assignee: Martijn Visser
>            Priority: Major
>              Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76627&view=results
>  (leg: e2e_4_ci); previously seen in build 75992.
> {{DynamicParameterITCase}} starts a JobManager via jobmanager.sh and reads 
> the startup banner (program arguments, classpath) back from the distribution 
> logs to assert how dynamic parameters were passed through. The distribution 
> log4j configuration rolls the log file on startup 
> ({{OnStartupTriggeringPolicy}}), so the banner frequently lands in a rolled 
> file (for example {{flink-...-standalonesession-1-host.log.1}}), which 
> {{FlinkDistribution.searchAllLogs}} deliberately skips.
> Two failure modes of the same race:
> - The readiness loop waits for a "Classpath:" line that never appears in the 
> live .log, spinning with no upper bound until the CI watchdog kills the leg. 
> In build 76627, {{testWithHostAndPort}} started and never finished.
> - When rotation happens between the readiness check and the read, the test 
> parses an incomplete arguments block and fails with "Missing required option: 
> c".
> Proposed fix: let {{FlinkDistribution.searchAllLogs}} optionally include 
> rolled log files when looking for the startup banner (other callers 
> unchanged), and bound the readiness wait with {{CommonTestUtils.waitUtil}} so 
> a missing banner fails fast with a clear message instead of hanging.
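
The proposed fix above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual patch: the class BannerWait and its methods findInLogs and waitForBanner are hypothetical names, the file-name pattern for rolled logs (".log" followed by a numeric suffix) is an assumption based on the example file name in the description, and a plain deadline loop stands in for CommonTestUtils.waitUtil.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not Flink's FlinkDistribution or CommonTestUtils.
final class BannerWait {

    // Assumed naming: live logs end in ".log", rolled logs in ".log.1", ".log.2", ...
    private static final Pattern LIVE_OR_ROLLED = Pattern.compile(".*\\.log(\\.\\d+)?");

    /** Returns the first line containing the marker in the selected log files, if any. */
    static Optional<String> findInLogs(Path logDir, String marker, boolean includeRolled) {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(logDir)) {
            candidates = files
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return includeRolled
                                ? LIVE_OR_ROLLED.matcher(name).matches()
                                : name.endsWith(".log");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path file : candidates) {
            try (Stream<String> lines = Files.lines(file)) {
                Optional<String> hit = lines.filter(l -> l.contains(marker)).findFirst();
                if (hit.isPresent()) {
                    return hit;
                }
            } catch (IOException | UncheckedIOException e) {
                // The file may have been rotated away mid-read; the next poll retries.
            }
        }
        return Optional.empty();
    }

    /** Polls until the marker shows up in any log file, or fails once the timeout expires. */
    static String waitForBanner(Path logDir, String marker, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<String> hit = findInLogs(logDir, marker, true);
            if (hit.isPresent()) {
                return hit.get();
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(
                        "No line containing '" + marker + "' found in " + logDir
                                + " within " + timeout);
            }
            Thread.sleep(50);
        }
    }
}
```

With includeRolled set to false the search behaves like the existing live-log-only lookup, so other callers would see no change. The bounded loop turns a missing banner into an immediate, descriptive failure instead of a hang that only the CI watchdog ends.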



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
