[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Immaneni updated APEXMALHAR-2254:
----------------------------------------
    Description: 
With the file input operator, on a replay after a failure, the same data is 
output as before the failure for every window that is replayed after the 
checkpoint. To do this, the operator keeps track of the files and offsets for 
every window and replays the data based on that record. 

However, suppose that before the failure the operator finished processing a 
file and closed it just before the end of a window, and then opened and 
processed the next file in a new window. On replay, the first file is closed 
not in the earlier window but in the later one. This can cause problems if an 
operator depends on the closing of files also happening in an idempotent 
manner.

Improve the operator to save the closing and opening of files in the idempotent 
state as well, so that these events also happen in an idempotent manner.

  was:
With the file input operator, on a replay, the same data is replayed for the 
windows that are being replayed after the checkpoint. To do this, the operator 
keeps track of the files and offsets for every window and replays the data 
based on that record. 

However, suppose that before the failure the operator finished processing a 
file and closed it just before the end of a window, and then opened and 
processed the next file in a new window. On replay, the first file is closed 
not in the earlier window but in the later one. This can cause problems if an 
operator depends on the closing of files also happening in an idempotent 
manner.

Improve the operator to save the closing and opening of files in the idempotent 
state as well, so that these events also happen in an idempotent manner.


> File input operator is not idempotent with closing files on replay
> ------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2254
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2254
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>
> With the file input operator, on a replay after a failure, the same data is 
> output as before the failure for every window that is replayed after the 
> checkpoint. To do this, the operator keeps track of the files and offsets 
> for every window and replays the data based on that record. 
> However, suppose that before the failure the operator finished processing a 
> file and closed it just before the end of a window, and then opened and 
> processed the next file in a new window. On replay, the first file is closed 
> not in the earlier window but in the later one. This can cause problems if 
> an operator depends on the closing of files also happening in an idempotent 
> manner.
> Improve the operator to save the closing and opening of files in the 
> idempotent state as well, so that these events also happen in an idempotent 
> manner.
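
To illustrate the requested improvement, here is a minimal sketch of a 
per-window log that records file open and close events alongside the offsets 
read, so that a replay of a window reproduces the close in the same window 
where it originally happened. All names here (WindowEventLog, Event, 
recordOpen, and so on) are hypothetical and are not taken from the Malhar code 
base; persisting the log with the checkpoint is assumed and not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are part of Apex Malhar.
class WindowEventLog {
  enum Kind { OPEN, READ, CLOSE }

  static final class Event {
    final Kind kind;
    final String path;
    final long startOffset; // meaningful for READ only
    final long endOffset;   // meaningful for READ only

    Event(Kind kind, String path, long startOffset, long endOffset) {
      this.kind = kind;
      this.path = path;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }
  }

  // Window id -> events in the order they occurred within that window.
  private final Map<Long, List<Event>> eventsByWindow = new TreeMap<>();
  private long currentWindow = -1;

  void beginWindow(long windowId) {
    currentWindow = windowId;
    eventsByWindow.computeIfAbsent(windowId, k -> new ArrayList<>());
  }

  void recordOpen(String path) {
    add(new Event(Kind.OPEN, path, 0, 0));
  }

  void recordRead(String path, long startOffset, long endOffset) {
    add(new Event(Kind.READ, path, startOffset, endOffset));
  }

  void recordClose(String path) {
    add(new Event(Kind.CLOSE, path, 0, 0));
  }

  private void add(Event event) {
    eventsByWindow.computeIfAbsent(currentWindow, k -> new ArrayList<>()).add(event);
  }

  // On replay, the operator would re-apply exactly these events for the window,
  // including any OPEN and CLOSE, instead of deriving them from the data read.
  List<Event> replay(long windowId) {
    return Collections.unmodifiableList(
        eventsByWindow.getOrDefault(windowId, Collections.<Event>emptyList()));
  }
}
```

In the scenario from the description, the first file is read and closed in one 
window and the next file is opened in the following window. With the close 
recorded as an event of the first window, replay(firstWindowId) ends with a 
CLOSE for the first file, and replay(secondWindowId) begins with the OPEN of 
the next file.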



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
