[ https://issues.apache.org/jira/browse/FLINK-30629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728714#comment-17728714 ]

Liu commented on FLINK-30629:
-----------------------------

[~Sergey Nuyanzin] Thanks. The log shows the following events, in chronological order:
 # The dispatcher shuts down because the client's heartbeat has timed out.
 # After that, the client begins to report its heartbeat.

The reason is that the client only starts reporting its heartbeat after the method waitUntilJobInitializationFinished returns. In this method, we poll the job's status with exponentially increasing wait intervals, so it may take a while before the method returns. There are two ways to fix the test:
 # Increase the client's heartbeat timeout from 500 ms to 1 second or more.
 # Poll the job's status more frequently in the method waitUntilJobInitializationFinished.
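A rough sketch of the timing argument behind option 2 (not Flink code). The class and method names are hypothetical, the backoff parameters (50 ms initial delay, doubling, capped at 2000 ms) are assumptions rather than the values waitUntilJobInitializationFinished necessarily uses, and time is assumed to be counted from the first status check:

```java
/**
 * Hypothetical sketch, not Flink code: models when a status poller first
 * observes that a job became ready, for exponential vs. fixed-interval polling.
 * Assumed values: 50 ms initial delay, doubling per retry, capped at 2000 ms.
 */
final class PollingDelaySketch {

    /** Delay before retry number `retry` (0-based): initialMs doubled per retry, capped at maxMs. */
    static long exponentialDelay(int retry, long initialMs, long maxMs) {
        long delay = initialMs;
        for (int i = 0; i < retry && delay < maxMs; i++) {
            delay = Math.min(delay * 2, maxMs);
        }
        return delay;
    }

    /**
     * Time (ms after the first status check at t = 0) at which the poller first
     * sees a job that became ready at readyAtMs. With exponential == false the
     * poller waits a fixed initialMs between checks.
     */
    static long firstObservation(long readyAtMs, boolean exponential, long initialMs, long maxMs) {
        long t = 0;
        int retry = 0;
        // Keep checking until a check happens at or after the moment the job is ready.
        while (t < readyAtMs) {
            t += exponential ? exponentialDelay(retry, initialMs, maxMs) : initialMs;
            retry++;
        }
        return t;
    }

    public static void main(String[] args) {
        long heartbeatTimeoutMs = 500; // the test's current client heartbeat timeout
        for (long readyAt : new long[] {100, 360, 1000}) {
            long exp = firstObservation(readyAt, true, 50, 2000);
            long fixed = firstObservation(readyAt, false, 50, 2000);
            System.out.printf(
                    "readyAt=%d ms: exponential sees it at %d ms, fixed at %d ms (timeout %d ms)%n",
                    readyAt, exp, fixed, heartbeatTimeoutMs);
        }
    }
}
```

Under these assumptions, a job that becomes ready at 360 ms is only noticed at 750 ms with exponential backoff (checks at 0, 50, 150, 350, 750 ms), already past a 500 ms timeout, while a fixed 50 ms interval notices it at 400 ms. Option 1 instead widens the timeout so that the backoff gap fits inside it.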

What do you think? cc [~xtsong] 

> ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat is unstable
> ---------------------------------------------------------------------
>
>                 Key: FLINK-30629
>                 URL: https://issues.apache.org/jira/browse/FLINK-30629
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.17.0, 1.18.0
>            Reporter: Xintong Song
>            Assignee: Weijie Guo
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.17.0
>
>         Attachments: ClientHeartbeatTestLog.txt, logs-cron_azure-test_cron_azure_core-1685497478.zip
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44690&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=10819
> {code:java}
> Jan 11 04:32:39 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.02 s <<< FAILURE! - in org.apache.flink.client.ClientHeartbeatTest
> Jan 11 04:32:39 [ERROR] org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat  Time elapsed: 9.157 s  <<< ERROR!
> Jan 11 04:32:39 java.lang.IllegalStateException: MiniCluster is not yet running or has already been shut down.
> Jan 11 04:32:39       at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
> Jan 11 04:32:39       at org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:1044)
> Jan 11 04:32:39       at org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:917)
> Jan 11 04:32:39       at org.apache.flink.runtime.minicluster.MiniCluster.getJobStatus(MiniCluster.java:841)
> Jan 11 04:32:39       at org.apache.flink.runtime.minicluster.MiniClusterJobClient.getJobStatus(MiniClusterJobClient.java:91)
> Jan 11 04:32:39       at org.apache.flink.client.ClientHeartbeatTest.testJobRunningIfClientReportHeartbeat(ClientHeartbeatTest.java:79)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
