[ 
https://issues.apache.org/jira/browse/FLINK-24163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410504#comment-17410504
 ] 

Yun Gao commented on FLINK-24163:
---------------------------------

This one should be called by the test configuration. There are two problems:
 # The checkpoint timeout 5s is still not large enough, it might be caused when 
the unaligned checkpoint is enabled, the checkpoint performance id bounded by 
disk IO.
 # We also met with 
{code}
java.util.concurrent.ExecutionException: java.io.IOException: Size of the state 
is larger than the maximum permitted memory-backed state. Size=8641168, 
maxSize=5242880. Consider using a different checkpoint storage, like the 
FileSystemCheckpointStorage.
    at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 
~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) 
~[?:1.8.0_292]
    at 
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:66)
 ~[flink-streaming-java_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:177)
 ~[flink-streaming-java_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124)
 [flink-streaming-java_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_292]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: java.io.IOException: Size of the state is larger than the maximum 
permitted memory-backed state. Size=8641168, maxSize=5242880. Consider using a 
different checkpoint storage, like the FileSystemCheckpointStorage.
    at 
org.apache.flink.runtime.state.memory.MemCheckpointStreamFactory.checkSize(MemCheckpointStreamFactory.java:63)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.state.memory.MemCheckpointStreamFactory$MemoryCheckpointOutputStream.closeAndGetBytes(MemCheckpointStreamFactory.java:140)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.state.memory.MemCheckpointStreamFactory$MemoryCheckpointOutputStream.closeAndGetHandle(MemCheckpointStreamFactory.java:120)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.finishWriteAndResult(ChannelStateCheckpointWriter.java:218)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.doComplete(ChannelStateCheckpointWriter.java:240)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.lambda$complete$5(ChannelStateCheckpointWriter.java:202)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.runWithChecks(ChannelStateCheckpointWriter.java:296)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.complete(ChannelStateCheckpointWriter.java:200)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.completeInput(ChannelStateCheckpointWriter.java:187)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.CheckpointInProgressRequest.execute(ChannelStateWriteRequest.java:212)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.dispatchInternal(ChannelStateWriteRequestDispatcherImpl.java:85)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.dispatch(ChannelStateWriteRequestDispatcherImpl.java:62)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.loop(ChannelStateWriteRequestExecutorImpl.java:96)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    at 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.run(ChannelStateWriteRequestExecutorImpl.java:75)
 ~[flink-runtime-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
    ... 1 more
{code}
This might be caused by we have channel state with unaligned checkpoints.

I'll open a PR to increase the checkpoint timeout and change to another 
statebackend type. 
    

> PartiallyFinishedSourcesITCase fails due to timeout
> ---------------------------------------------------
>
>                 Key: FLINK-24163
>                 URL: https://issues.apache.org/jira/browse/FLINK-24163
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: Xintong Song
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.14.0, 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23529&view=logs&j=4d4a0d10-fca2-5507-8eed-c07f0bdf4887&t=7b25afdf-cc6c-566f-5459-359dc2585798&l=10996
> {code}
> Sep 04 04:35:28 [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, 
> Time elapsed: 155.236 s <<< FAILURE! - in 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase
> Sep 04 04:35:28 [ERROR] test[complex graph ALL_SUBTASKS, failover: false]  
> Time elapsed: 65.999 s  <<< ERROR!
> Sep 04 04:35:28 java.util.concurrent.TimeoutException: Condition was not met 
> in given timeout.
> Sep 04 04:35:28       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:164)
> Sep 04 04:35:28       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:142)
> Sep 04 04:35:28       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:134)
> Sep 04 04:35:28       at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForSubtasksToFinish(CommonTestUtils.java:297)
> Sep 04 04:35:28       at 
> org.apache.flink.runtime.operators.lifecycle.TestJobExecutor.waitForSubtasksToFinish(TestJobExecutor.java:219)
> Sep 04 04:35:28       at 
> org.apache.flink.runtime.operators.lifecycle.PartiallyFinishedSourcesITCase.test(PartiallyFinishedSourcesITCase.java:82)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to