[ 
https://issues.apache.org/jira/browse/FLINK-22259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321282#comment-17321282
 ] 

Arvid Heise edited comment on FLINK-22259 at 4/15/21, 7:25 AM:
---------------------------------------------------------------

-I guess the test assumes that the enumerator never fails (it has transient 
state). The test should also persist the transient state.-

The enumerator state is correct
{noformat}
snapshotState EnumeratorState{unassignedSplits=[], numRestarts=5, numCompletedCheckpoints=11}
{noformat}

It's just that the sync event never reaches the reader, which is caused by 
FLINK-18071 (or FLINK-21996).
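
For context, the exception quoted in the issue description is thrown from 
UnalignedCheckpointTestBase.withHeader. Below is a minimal sketch of such a 
guard, assuming the test tags every emitted long with a marker in its upper 
32 bits. The class name HeaderSketch, the constant values and the exact 
bounds are assumptions for illustration, not the actual test code.

```java
// Hypothetical sketch; names and constants are assumptions, not Flink's code.
final class HeaderSketch {
    // Assumed marker stored in the upper 32 bits of each emitted value.
    static final long HEADER = 0xABCDEAFCL << 32;
    // Mask selecting the upper 32 bits.
    static final long HEADER_MASK = 0xFFFFFFFFL << 32;

    // Tags a counter value with the marker; fails once the counter no
    // longer fits into the lower 32 bits without touching the marker.
    static long withHeader(long value) {
        if (value < 0 || value > Integer.MAX_VALUE) {
            throw new IllegalStateException(
                    "Value too large for header, this indicates that the test is running too long.");
        }
        return value ^ HEADER;
    }

    // Strips the marker again and verifies that it was present.
    static long withoutHeader(long value) {
        if ((value & HEADER_MASK) != HEADER) {
            throw new IllegalStateException("Header missing, stream looks corrupted.");
        }
        return value ^ HEADER;
    }

    public static void main(String[] args) {
        long tagged = withHeader(42L);
        System.out.println(withoutHeader(tagged));
        try {
            withHeader(1L << 31); // one past the assumed limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the reader never receives the event that tells it to stop, its counter 
would presumably keep growing until it trips a check like this one, which 
would be consistent with the reported failure.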


was (Author: aheise):
-I guess the test assumes that the enumerator never fails (it has transient 
state). The test should also persist the transient state.-

The enumerator state is correct
{noformat}
snapshotState EnumeratorState{unassignedSplits=[], numRestarts=5, numCompletedCheckpoints=11}
{noformat}

It's just that the sync event never reaches the reader, which is caused by 
FLINK-21996.

> UnalignedCheckpointITCase fails with "Value too large for header, this indicates that the test is running too long"
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-22259
>                 URL: https://issues.apache.org/jira/browse/FLINK-22259
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Arvid Heise
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.13.0
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16419&view=logs&j=34f41360-6c0d-54d3-11a1-0292a2def1d9&t=2d56e022-1ace-542f-bf1a-b37dd63243f2&l=9672]
>  
> {code:java}
> 2021-04-13T07:37:31.9388503Z [ERROR] execute[pipeline with remote channels, p = 1, timeout = 0](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time elapsed: 1,420.285 s  <<< ERROR!
> 2021-04-13T07:37:31.9395135Z 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2021-04-13T07:37:31.9395717Z  at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
> 2021-04-13T07:37:31.9396274Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:168)
> 2021-04-13T07:37:31.9396866Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:274)
> 2021-04-13T07:37:31.9397318Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2021-04-13T07:37:31.9397723Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2021-04-13T07:37:31.9398312Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-04-13T07:37:31.9398724Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2021-04-13T07:37:31.9401916Z  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2021-04-13T07:37:31.9402764Z  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2021-04-13T07:37:31.9403756Z  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2021-04-13T07:37:31.9404222Z  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2021-04-13T07:37:31.9404624Z  at 
> org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
> 2021-04-13T07:37:31.9405008Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2021-04-13T07:37:31.9405449Z  at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> 2021-04-13T07:37:31.9405855Z  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 2021-04-13T07:37:31.9406362Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2021-04-13T07:37:31.9406774Z  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 2021-04-13T07:37:31.9407512Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 2021-04-13T07:37:31.9408202Z  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 2021-04-13T07:37:31.9408655Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2021-04-13T07:37:31.9409083Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2021-04-13T07:37:31.9409521Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2021-04-13T07:37:31.9410114Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2021-04-13T07:37:31.9410775Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2021-04-13T07:37:31.9411398Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2021-04-13T07:37:31.9411914Z  at 
> org.junit.runners.Suite.runChild(Suite.java:128)
> 2021-04-13T07:37:31.9412292Z  at 
> org.junit.runners.Suite.runChild(Suite.java:27)
> 2021-04-13T07:37:31.9412670Z  at 
> org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 2021-04-13T07:37:31.9413097Z  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 2021-04-13T07:37:31.9413538Z  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 2021-04-13T07:37:31.9413964Z  at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 2021-04-13T07:37:31.9414405Z  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 2021-04-13T07:37:31.9414834Z  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> 2021-04-13T07:37:31.9415263Z  at 
> org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 2021-04-13T07:37:31.9415661Z  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 2021-04-13T07:37:31.9416099Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 2021-04-13T07:37:31.9416773Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 2021-04-13T07:37:31.9417404Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 2021-04-13T07:37:31.9417931Z  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 2021-04-13T07:37:31.9418480Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 2021-04-13T07:37:31.9419026Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 2021-04-13T07:37:31.9419575Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 2021-04-13T07:37:31.9420060Z  at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2021-04-13T07:37:31.9420648Z Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=5, backoffTimeMS=100)
> 2021-04-13T07:37:31.9421551Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
> 2021-04-13T07:37:31.9422344Z  at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
> 2021-04-13T07:37:31.9422995Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:206)
> 2021-04-13T07:37:31.9423652Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:196)
> 2021-04-13T07:37:31.9424268Z  at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:187)
> 2021-04-13T07:37:31.9424854Z  at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:680)
> 2021-04-13T07:37:31.9425411Z  at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
> 2021-04-13T07:37:31.9425963Z  at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:435)
> 2021-04-13T07:37:31.9426401Z  at 
> sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> 2021-04-13T07:37:31.9426936Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-04-13T07:37:31.9427389Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2021-04-13T07:37:31.9427840Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
> 2021-04-13T07:37:31.9428372Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
> 2021-04-13T07:37:31.9428911Z  at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> 2021-04-13T07:37:31.9429459Z  at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> 2021-04-13T07:37:31.9429932Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2021-04-13T07:37:31.9430341Z  at 
> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2021-04-13T07:37:31.9430775Z  at 
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2021-04-13T07:37:31.9431346Z  at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2021-04-13T07:37:31.9431879Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2021-04-13T07:37:31.9432322Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-04-13T07:37:31.9432754Z  at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2021-04-13T07:37:31.9433176Z  at 
> akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2021-04-13T07:37:31.9433579Z  at 
> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2021-04-13T07:37:31.9434003Z  at 
> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2021-04-13T07:37:31.9434400Z  at 
> akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2021-04-13T07:37:31.9434777Z  at 
> akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2021-04-13T07:37:31.9435242Z  at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2021-04-13T07:37:31.9435587Z  at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2021-04-13T07:37:31.9435988Z  at 
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2021-04-13T07:37:31.9436453Z  at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2021-04-13T07:37:31.9436987Z  at 
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2021-04-13T07:37:31.9437463Z  at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2021-04-13T07:37:31.9438005Z Caused by: java.lang.IllegalStateException: Value too large for header, this indicates that the test is running too long.
> 2021-04-13T07:37:31.9438520Z  at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
> 2021-04-13T07:37:31.9439065Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.withHeader(UnalignedCheckpointTestBase.java:1111)
> 2021-04-13T07:37:31.9439735Z  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase$LongSource$LongSourceReader.pollNext(UnalignedCheckpointTestBase.java:342)
> 2021-04-13T07:37:31.9440378Z  at 
> org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:294)
> 2021-04-13T07:37:31.9440941Z  at 
> org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:69)
> 2021-04-13T07:37:31.9441727Z  at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> 2021-04-13T07:37:31.9442375Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:419)
> 2021-04-13T07:37:31.9442943Z  at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> 2021-04-13T07:37:31.9443491Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:657)
> 2021-04-13T07:37:31.9443999Z  at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
> 2021-04-13T07:37:31.9444448Z  at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:776)
> 2021-04-13T07:37:31.9444874Z  at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
> 2021-04-13T07:37:31.9445249Z  at java.lang.Thread.run(Thread.java:748) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
