[
https://issues.apache.org/jira/browse/FLINK-32663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749719#comment-17749719
]
Dong Lin commented on FLINK-32663:
----------------------------------
Hi [~fanrui], FLINK-28386 will let the Flink runtime trigger a checkpoint
immediately after all tasks have received end-of-data, instead of after all
sources have finished. I have updated its JIRA title/description to clarify this.
And yes, I believe it is reasonable to trigger a checkpoint immediately after
all tasks have received end-of-data. If we don't do this, the streaming job
will waste time waiting for the next periodic checkpoint to be triggered
before it can shut down, for no good reason.
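To make the wasted time concrete, here is a rough sketch of the arithmetic.
It is a toy model, not Flink code: the class and method names are
hypothetical, and it assumes the job can shut down as soon as one checkpoint
has been triggered after end-of-data.

```java
import java.time.Duration;

public class EndOfDataCheckpointDelay {

    // Hypothetical helper (not a Flink API): how long a job that has reached
    // end-of-data waits until its final checkpoint is triggered.
    static Duration waitForFinalCheckpoint(
            Duration checkpointInterval,
            Duration sinceLastTrigger,
            boolean triggerAtEndOfData) {
        if (triggerAtEndOfData) {
            // Assumed FLINK-28386 behavior: trigger right away.
            return Duration.ZERO;
        }
        // Otherwise wait for the remainder of the current periodic interval.
        long intervalMs = checkpointInterval.toMillis();
        long elapsedMs = sinceLastTrigger.toMillis() % intervalMs;
        return Duration.ofMillis(intervalMs - elapsedMs);
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(60);
        Duration sinceLast = Duration.ofSeconds(5);
        // Periodic only: the job idles for the rest of the interval.
        System.out.println(waitForFinalCheckpoint(interval, sinceLast, false).getSeconds()); // 55
        // Immediate trigger at end-of-data: no idle wait.
        System.out.println(waitForFinalCheckpoint(interval, sinceLast, true).getSeconds()); // 0
    }
}
```

With a 60-second interval and end-of-data arriving 5 seconds after the last
trigger, the job would idle for 55 seconds under this model if it relied only
on the periodic checkpoint.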
By "it caused this bug", I guess you mean that the test failure is caused by
a checkpoint triggered by FLINK-28386. However, that does not necessarily
mean that FLINK-28386 is the root cause of this bug. For example, if the test
can fail due to an immediate checkpoint triggered by FLINK-28386, there is
also a chance that it can fail due to a periodically triggered checkpoint.
So we probably need to understand the root cause before determining how to fix
it.
> RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList fails on
> AZP
> ---------------------------------------------------------------------------------
>
> Key: FLINK-32663
> URL: https://issues.apache.org/jira/browse/FLINK-32663
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.18.0
> Reporter: Sergey Nuyanzin
> Assignee: Yanfei Lei
> Priority: Blocker
> Labels: test-stability
> Attachments: screenshot-1.png
>
>
> This build
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51501&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8665
> fails as
> {noformat}
> Jul 21 01:24:54 01:24:54.146 [ERROR] RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList Time elapsed: 1.485 s <<< FAILURE!
> Jul 21 01:24:54 java.lang.AssertionError: expected:<530> but was:<30>
> Jul 21 01:24:54 at org.junit.Assert.fail(Assert.java:89)
> Jul 21 01:24:54 at org.junit.Assert.failNotEquals(Assert.java:835)
> Jul 21 01:24:54 at org.junit.Assert.assertEquals(Assert.java:647)
> Jul 21 01:24:54 at org.junit.Assert.assertEquals(Assert.java:633)
> Jul 21 01:24:54 at org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingPartitionedOperatorState(RescalingITCase.java:621)
> Jul 21 01:24:54 at org.apache.flink.test.checkpointing.RescalingITCase.testSavepointRescalingInPartitionedOperatorStateList(RescalingITCase.java:508)
> Jul 21 01:24:54 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> Jul 21 01:24:54 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Jul 21 01:24:54 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:4
> ...
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)