[ https://issues.apache.org/jira/browse/BEAM-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341057#comment-16341057 ]
Aljoscha Krettek edited comment on BEAM-3494 at 1/26/18 1:32 PM: ----------------------------------------------------------------- How are you enabling checkpointing? Also, could you please reformat your posting to put the code inside code tags to make it more readable? was (Author: aljoscha): How are you enabling checkpointing? Also, could you please reformat your posting to put the code between code tags, like this {code:java} {code} your code ... {code}{code} to make it more readable? > Snapshot state of aggregated data of apache beam project is not maintained in > flink's checkpointing > ---------------------------------------------------------------------------------------------------- > > Key: BEAM-3494 > URL: https://issues.apache.org/jira/browse/BEAM-3494 > Project: Beam > Issue Type: Bug > Components: runner-flink > Reporter: suganya > Priority: Major > > We have a beam project which consumes events from kafka,does a groupby in a > time window(5 mins),after window elapses it pushes the events to downstream > for merge.This project is deployed using flink ,we have enabled checkpointing > to recover from failed state. > (windowsize: 5mins , checkpointingInterval: 5mins,state.backend: filesystem) > Offsets from kafka get checkpointed every 5 > mins(checkpointingInterval).Before finishing the entire DAG(groupBy and > merge) , events offsets are getting checkpointed.So incase of any restart > from task-manager ,new task gets started from last successful checkpoint ,but > we could'nt able to get the aggregated snapshot data(data from groupBy task) > from the persisted checkpoint. > Able to retrieve the last successful checkpointed offset from kafka ,but > couldnt able to get last aggregated data till checkpointing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)