[jira] [Assigned] (BEAM-3494) Snapshot state of aggregated data of apache beam project is not maintained in flink's checkpointing
[ https://issues.apache.org/jira/browse/BEAM-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek reassigned BEAM-3494: -- Assignee: (was: Aljoscha Krettek) > Snapshot state of aggregated data of apache beam project is not maintained in > flink's checkpointing > > > Key: BEAM-3494 > URL: https://issues.apache.org/jira/browse/BEAM-3494 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: suganya >Priority: Major > > We have a beam project which consumes events from kafka,does a groupby in a > time window(5 mins),after window elapses it pushes the events to downstream > for merge.This project is deployed using flink ,we have enabled checkpointing > to recover from failed state. > (windowsize: 5mins , checkpointingInterval: 5mins,state.backend: filesystem) > Offsets from kafka get checkpointed every 5 > mins(checkpointingInterval).Before finishing the entire DAG(groupBy and > merge) , events offsets are getting checkpointed.So incase of any restart > from task-manager ,new task gets started from last successful checkpoint ,but > we could'nt able to get the aggregated snapshot data(data from groupBy task) > from the persisted checkpoint. > Able to retrieve the last successful checkpointed offset from kafka ,but > couldnt able to get last aggregated data till checkpointing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3494) Snapshot state of aggregated data of apache beam project is not maintained in flink's checkpointing
[ https://issues.apache.org/jira/browse/BEAM-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-3494: - Assignee: Aljoscha Krettek (was: Kenneth Knowles) > Snapshot state of aggregated data of apache beam project is not maintained in > flink's checkpointing > > > Key: BEAM-3494 > URL: https://issues.apache.org/jira/browse/BEAM-3494 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: suganya >Assignee: Aljoscha Krettek >Priority: Major > > We have a beam project which consumes events from kafka,does a groupby in a > time window(5 mins),after window elapses it pushes the events to downstream > for merge.This project is deployed using flink ,we have enabled checkpointing > to recover from failed state. > (windowsize: 5mins , checkpointingInterval: 5mins,state.backend: filesystem) > Offsets from kafka get checkpointed every 5 > mins(checkpointingInterval).Before finishing the entire DAG(groupBy and > merge) , events offsets are getting checkpointed.So incase of any restart > from task-manager ,new task gets started from last successful checkpoint ,but > we could'nt able to get the aggregated snapshot data(data from groupBy task) > from the persisted checkpoint. > Able to retrieve the last successful checkpointed offset from kafka ,but > couldnt able to get last aggregated data till checkpointing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)