Chesnay Schepler created FLINK-22229:
----------------------------------------
Summary: Noisy stack traces if checkpoints fail because sources are not ready
Key: FLINK-22229
URL: https://issues.apache.org/jira/browse/FLINK-22229
Project: Flink
Issue Type: Sub-task
Components: Runtime / Checkpointing, Runtime / Task
Reporter: Chesnay Schepler
Reported by Stephan Ewen:
When checkpoints are being skipped because not all sources are ready, we log a
big stack trace.
This can happen multiple times during job startup (while TMs are starting,
pulling artifacts, tasks are being scheduled), and it creates the impression
that something is wrong, when this is really a totally normal and expected
situation.
We can probably reduce this to an INFO level statement, because this isn't
something bad or unexpected, and the stack trace doesn't add any information
(there are really only RPC worker methods in the trace).
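A minimal sketch of the proposed behavior, not the actual Flink change: failures that are expected during startup are logged at INFO with only the message, while everything else keeps the WARN level and the full stack trace. The class `CheckpointFailureLogging`, the `FailureReason` enum, its `expectedDuringStartup` flag, and the `CheckpointFailure` exception are hypothetical stand-ins for illustration, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class CheckpointFailureLogging {

    /** Hypothetical stand-in for the checkpoint failure reasons; not Flink's real enum. */
    enum FailureReason {
        NOT_ALL_REQUIRED_TASKS_RUNNING(true),
        IO_EXCEPTION(false);

        // Assumption: each reason knows whether it is a normal startup condition.
        final boolean expectedDuringStartup;

        FailureReason(boolean expectedDuringStartup) {
            this.expectedDuringStartup = expectedDuringStartup;
        }
    }

    /** Hypothetical stand-in for CheckpointException. */
    static class CheckpointFailure extends Exception {
        final FailureReason reason;

        CheckpointFailure(FailureReason reason, String message) {
            super(message);
            this.reason = reason;
        }
    }

    private static final Logger LOG =
            Logger.getLogger(CheckpointFailureLogging.class.getName());

    /** Expected startup conditions are informational; anything else is a warning. */
    static Level levelFor(CheckpointFailure failure) {
        return failure.reason.expectedDuringStartup ? Level.INFO : Level.WARNING;
    }

    static void logTriggerFailure(String jobId, CheckpointFailure failure) {
        String message = "Failed to trigger checkpoint for job " + jobId + ".";
        if (failure.reason.expectedDuringStartup) {
            // Message only: the stack trace would contain nothing but RPC frames.
            LOG.log(levelFor(failure), message + " " + failure.getMessage());
        } else {
            // Unexpected failure: keep the throwable so the trace is logged.
            LOG.log(levelFor(failure), message, failure);
        }
    }

    public static void main(String[] args) {
        String jobId = "b7476b12c6b7da7b72fa4dfb4fc0741a";
        logTriggerFailure(jobId, new CheckpointFailure(
                FailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING,
                "Failure reason: Not all required tasks are currently running."));
        logTriggerFailure(jobId, new CheckpointFailure(
                FailureReason.IO_EXCEPTION,
                "Failure reason: An I/O error occurred."));
    }
}
```

Keeping the level decision in `levelFor` separates the policy (which reasons count as expected) from the logging call, so the list of expected reasons can be extended without touching the call site.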
{code}
7862 [Checkpoint Timer] WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Failed to trigger checkpoint for job b7476b12c6b7da7b72fa4dfb4fc0741a.)
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint triggering task Source: numbers -> Map -> Sink: Data stream collect sink (1/1) of job b7476b12c6b7da7b72fa4dfb4fc0741a has not being executed at the moment. Aborting checkpoint. Failure reason: Not all required tasks are currently running.
    at org.apache.flink.runtime.checkpoint.DefaultCheckpointPlanCalculator.checkTasksStarted(DefaultCheckpointPlanCalculator.java:152) ~[classes/:?]
    at org.apache.flink.runtime.checkpoint.DefaultCheckpointPlanCalculator.lambda$calculateCheckpointPlan$1(DefaultCheckpointPlanCalculator.java:114) ~[classes/:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[classes/:?]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[classes/:?]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[classes/:?]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[classes/:?]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) ~[scala-library-2.11.12.jar:?]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[scala-library-2.11.12.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[scala-library-2.11.12.jar:?]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[scala-library-2.11.12.jar:?]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
{code}