[jira] [Commented] (FLINK-32663) RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList fails on AZP

Dong Lin (Jira) Tue, 01 Aug 2023 05:53:17 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-32663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749719#comment-17749719
 ]


Dong Lin commented on FLINK-32663:
----------------------------------

Hi [~fanrui], FLINK-28386 will let the Flink runtime trigger a checkpoint 
immediately after all tasks have received end-of-data, instead of after all 
sources have finished. I have updated its JIRA title/description to clarify this.

And yes, I believe it is reasonable to trigger a checkpoint immediately after 
all tasks have received end-of-data. If we don't do this, the streaming job 
will waste time waiting for the next periodic checkpoint to be triggered 
before it can shut down, for no good reason.
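
To make the cost concrete, here is a rough sketch of the arithmetic. It is not 
Flink code: the class and method names are hypothetical, and it assumes 
periodic checkpoints fire at exact multiples of the interval, ignoring 
checkpoint duration and any minimum pause between checkpoints.

```java
// Hypothetical illustration only; none of these names exist in Flink.
public class FinalCheckpointDelaySketch {

    /**
     * Time (ms) a job that has received end-of-data would wait for its final
     * checkpoint if only periodic checkpoints are available.
     * Assumption: checkpoints fire at t = 0, interval, 2 * interval, ...
     */
    static long waitForPeriodicCheckpoint(long intervalMs, long endOfDataAtMs) {
        if (intervalMs <= 0 || endOfDataAtMs < 0) {
            throw new IllegalArgumentException("interval must be > 0 and time must be >= 0");
        }
        long sinceLastCheckpoint = endOfDataAtMs % intervalMs;
        // If end-of-data lands exactly on a checkpoint tick, there is no wait.
        return sinceLastCheckpoint == 0 ? 0 : intervalMs - sinceLastCheckpoint;
    }

    /** With a checkpoint triggered right at end-of-data, the wait is zero. */
    static long waitForImmediateCheckpoint() {
        return 0;
    }

    public static void main(String[] args) {
        long intervalMs = 10_000L;    // assumed checkpoint interval
        long endOfDataAtMs = 3_000L;  // assumed time at which all tasks saw end-of-data

        System.out.println("periodic only: "
                + waitForPeriodicCheckpoint(intervalMs, endOfDataAtMs) + " ms");
        System.out.println("immediate:     " + waitForImmediateCheckpoint() + " ms");
    }
}
```

Under these assumptions, a job whose tasks all reach end-of-data 3 seconds into 
a 10-second interval would sit idle for the remaining 7 seconds before it could 
take its final checkpoint and shut down.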

By "it caused this bug", I guess what you mean is that the test failure is 
caused by a checkpoint triggered by FLINK-28386. However, it does not 
necessarily mean that FLINK-28386 is the root cause of this bug. For example, 
if the test can fail due to an immediate checkpoint triggered by FLINK-28386, 
there is also a chance that the test can fail due to a periodically triggered 
checkpoint.

So we probably need to understand the root cause before determining how to fix 
it.

 

> RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList fails on 
> AZP
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-32663
>                 URL: https://issues.apache.org/jira/browse/FLINK-32663
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.0
>            Reporter: Sergey Nuyanzin
>            Assignee: Yanfei Lei
>            Priority: Blocker
>              Labels: test-stability
>         Attachments: screenshot-1.png
>
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51501&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8665
> fails as
> {noformat}
> Jul 21 01:24:54 01:24:54.146 [ERROR] 
> RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList  Time 
> elapsed: 1.485 s  <<< FAILURE!
> Jul 21 01:24:54 java.lang.AssertionError: expected:<530> but was:<30>
> Jul 21 01:24:54       at org.junit.Assert.fail(Assert.java:89)
> Jul 21 01:24:54       at org.junit.Assert.failNotEquals(Assert.java:835)
> Jul 21 01:24:54       at org.junit.Assert.assertEquals(Assert.java:647)
> Jul 21 01:24:54       at org.junit.Assert.assertEquals(Assert.java:633)
> Jul 21 01:24:54       at 
> org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingPartitionedOperatorState(RescalingITCase.java:621)
> Jul 21 01:24:54       at 
> org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList(RescalingITCase.java:508)
> Jul 21 01:24:54       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Jul 21 01:24:54       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 21 01:24:54       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:4
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
