Martijn Visser created FLINK-40070:
--------------------------------------

             Summary: DynamicParameterITCase hangs when the JobManager startup 
banner rolls to a numbered log file
                 Key: FLINK-40070
                 URL: https://issues.apache.org/jira/browse/FLINK-40070
             Project: Flink
          Issue Type: Bug
          Components: Test Infrastructure, Tests
    Affects Versions: 2.4.0
            Reporter: Martijn Visser
            Assignee: Martijn Visser


Observed in 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=76627&view=results
 (leg: e2e_4_ci); previously seen in build 75992.

{{DynamicParameterITCase}} starts a JobManager via {{jobmanager.sh}} and reads the 
startup banner (program arguments, classpath) back from the distribution logs 
to assert how dynamic parameters were passed through. The distribution log4j 
configuration rolls the log file on startup ({{OnStartupTriggeringPolicy}}), so 
the banner frequently lands in a rolled file (for example 
{{flink-...-standalonesession-1-host.log.1}}), which 
{{FlinkDistribution.searchAllLogs}} deliberately skips.
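To illustrate the distinction between the live log and a rolled one, here is a minimal sketch. It assumes rolled files differ from the live file only by a numeric suffix after {{.log}}, as the example name above suggests. {{LogFileNames}}, its methods, and the {{includeRolled}} flag are hypothetical names for illustration; the actual filter inside {{FlinkDistribution.searchAllLogs}} may look different.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of Flink. */
final class LogFileNames {
    // Live log file: the name ends in ".log".
    private static final Pattern LIVE = Pattern.compile(".*\\.log");
    // Rolled log file: the name ends in ".log.<number>", for example ".log.1".
    private static final Pattern ROLLED = Pattern.compile(".*\\.log\\.\\d+");

    static boolean isLive(String fileName) {
        return LIVE.matcher(fileName).matches();
    }

    static boolean isRolled(String fileName) {
        return ROLLED.matcher(fileName).matches();
    }

    /** Keeps live log files, and rolled ones too when includeRolled is set. */
    static List<String> select(List<String> fileNames, boolean includeRolled) {
        return fileNames.stream()
                .filter(n -> isLive(n) || (includeRolled && isRolled(n)))
                .collect(Collectors.toList());
    }
}
```

With a made-up directory listing of {{session.log}}, {{session.log.1}} and {{session.out}}, {{select(names, false)}} keeps only {{session.log}}, so a banner that was written before the startup rollover would not be found.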

Two failure modes of the same race:
- The readiness loop waits for a "Classpath:" line that never appears in the 
live {{.log}} file, so it spins with no upper bound until the CI watchdog kills 
the leg. In build 76627, {{testWithHostAndPort}} started and never finished.
- When rotation happens between the readiness check and the read, the test 
parses an incomplete arguments block and fails with "Missing required option: 
c".

Proposed fix: let {{FlinkDistribution.searchAllLogs}} optionally include rolled 
log files when looking for the startup banner (other callers unchanged), and 
bound the readiness wait with {{CommonTestUtils.waitUtil}} so a missing banner 
fails fast with a clear message instead of hanging.
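The bounded wait can be sketched as follows. This is a standalone stand-in, not Flink's {{CommonTestUtils.waitUtil}}: the class {{BoundedWait}}, its parameters, and the choice of {{IllegalStateException}} are assumptions made so the example compiles without Flink on the classpath.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a bounded readiness wait; not Flink's implementation. */
final class BoundedWait {
    /**
     * Polls the condition until it holds, or fails with errorMessage once the
     * timeout has elapsed, instead of spinning forever.
     */
    static void waitUntil(
            BooleanSupplier condition,
            Duration timeout,
            Duration pollInterval,
            String errorMessage) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(errorMessage);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting: " + errorMessage, e);
            }
        }
    }
}
```

In the test, the condition would be something like "the startup banner with its Classpath: line is present in the live or a rolled log file", and the message would name the missing line, so a lost banner shows up as a test failure rather than a watchdog kill.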



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
