[ 
https://issues.apache.org/jira/browse/FLINK-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398055#comment-16398055
 ] 

vinoyang commented on FLINK-8915:
---------------------------------

Hi [~gjy] , I just analyzed this issue. The real source is 
CheckpointStatsHistory's 
{code:java}
/** Array of checkpointsArray. Writes go against this array. */
private transient AbstractCheckpointStats[] checkpointsArray;
{code}
I found the pending checkpoint could also be added to this array through method 
:
{code:java}
void addInProgressCheckpoint(PendingCheckpointStats pending) {
{code}
then when the checkpoint transfered to the terminal state (completed or 
failed), it will trigger this callback method :
{code:java}
boolean replacePendingCheckpointById(AbstractCheckpointStats completedOrFailed) 
{
{code}
and others access the history info through the *createSnapshot* method which 
return the instance of *CheckpointStatsHistory*. However, this method create 
the instance with the checkpointArray. So it may constain the 
pendingcheckpoint. 

In my opinion, we should think the sematics of the *CheckpointStatsHistory* , 
when we call the *createSnapshot* method, whether it can contains pending 
checkpoint or not. If not ,we can filter pending checkpoint by status. If yes, 
we can also judge the status and give a error log or something handle way.

What's your opinion?

 

 

 

 

 

> CheckpointingStatisticsHandler fails to return PendingCheckpointStats 
> ----------------------------------------------------------------------
>
>                 Key: FLINK-8915
>                 URL: https://issues.apache.org/jira/browse/FLINK-8915
>             Project: Flink
>          Issue Type: Bug
>          Components: REST
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Gary Yao
>            Assignee: vinoyang
>            Priority: Blocker
>              Labels: flip6
>             Fix For: 1.5.0
>
>
> {noformat}
> 2018-03-10 21:47:52,487 ERROR 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler
>   - Implementation error: Unhandled exception.
> java.lang.IllegalArgumentException: Given checkpoint stats object of type 
> org.apache.flink.runtime.checkpoint.PendingCheckpointStats cannot be 
> converted.
>       at 
> org.apache.flink.runtime.rest.messages.checkpoints.CheckpointStatistics.generateCheckpointStatistics(CheckpointStatistics.java:276)
>       at 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:146)
>       at 
> org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleRequest(CheckpointingStatisticsHandler.java:54)
>       at 
> org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$0(AbstractExecutionGraphHandler.java:81)
>       at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>       at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>       at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>       at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>       at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to