[ https://issues.apache.org/jira/browse/APEXMALHAR-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546572#comment-15546572 ]
Munagala V. Ramanath commented on APEXMALHAR-2254:
--------------------------------------------------

Here is a list of other JIRAs related to this operator:

APEXMALHAR-2250 AbstractFileInputOperator.DirectoryScanner does not handle directories correctly.
APEXMALHAR-2270 AbstractFileInputOperator: During replay, inputStream should skip tuples
APEXMALHAR-2269 AbstractFileInputOperator: During replay, IO errors not handled
APEXMALHAR-2263 Offsets in AbstractFileInputOperator should be long rather than int
APEXMALHAR-2021 Add property to AbstractFileInputOperator to trim processedFiles and ignoredFiles
APEXMALHAR-2268 AbstractFileInputOperator: During replay, readEntity may be called without calling openFile.
APEXMALHAR-2274 AbstractFileInputOperator gets killed when there are a large number of files.

> File input operator is not idempotent with closing files on replay
> ------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2254
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2254
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>
> With the file input operator, on a replay in a failure scenario, the same
> data is output as before the failure, for every window that is being replayed
> after the checkpoint. To do this, the operator keeps track of the files and
> offsets for every window and replays the data based on that.
> However, if before the failure the processing of a file was finished and the
> file was closed just before the end of a window, and the next file was opened
> and processed in a new window, then in the replay the closing of the first
> file does not happen in the earlier window but in the later window. This can
> cause problems if an operator depends on the closing of files also happening
> in an idempotent manner.
> Improve the operator to save the closing and opening of files in the
> idempotent state as well, so that they can also happen in an idempotent manner.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
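
The improvement requested in the issue description can be illustrated with a minimal sketch, which is not the actual Malhar change. It assumes the operator records, per window, an ordered list of file events (open, read range, close) and re-emits exactly that list on replay. All names here (`WindowEventLog`, `FileEvent`, `record`, `replay`) are hypothetical, and the in-memory map stands in for whatever durable per-window state the operator really uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: these names are illustrative and are not Malhar APIs.
class WindowEventLog {

  enum Kind { OPEN, READ, CLOSE }

  static final class FileEvent {
    final Kind kind;
    final String path;
    final long startOffset; // meaningful for READ only
    final long endOffset;   // meaningful for READ only

    private FileEvent(Kind kind, String path, long startOffset, long endOffset) {
      this.kind = kind;
      this.path = path;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

    static FileEvent open(String path) {
      return new FileEvent(Kind.OPEN, path, 0L, 0L);
    }

    static FileEvent read(String path, long start, long end) {
      return new FileEvent(Kind.READ, path, start, end);
    }

    static FileEvent close(String path) {
      return new FileEvent(Kind.CLOSE, path, 0L, 0L);
    }

    @Override
    public String toString() {
      String range = kind == Kind.READ ? " [" + startOffset + ", " + endOffset + ")" : "";
      return kind + " " + path + range;
    }
  }

  // Window id -> events in the order they occurred in that window.
  // An in-memory stand-in (assumption) for durable per-window state.
  private final Map<Long, List<FileEvent>> eventsByWindow = new TreeMap<>();

  // Called during normal processing: remember what happened in this window.
  void record(long windowId, FileEvent event) {
    eventsByWindow.computeIfAbsent(windowId, id -> new ArrayList<>()).add(event);
  }

  // Called during recovery: return the same events, in the same order,
  // for the window being replayed (empty if nothing was recorded).
  List<FileEvent> replay(long windowId) {
    return Collections.unmodifiableList(
        eventsByWindow.getOrDefault(windowId, Collections.<FileEvent>emptyList()));
  }

  public static void main(String[] args) {
    WindowEventLog log = new WindowEventLog();
    // Window 1: the first file is fully read and closed before the window ends.
    log.record(1L, FileEvent.open("a.txt"));
    log.record(1L, FileEvent.read("a.txt", 0L, 100L));
    log.record(1L, FileEvent.close("a.txt"));
    // Window 2: the next file is opened and read.
    log.record(2L, FileEvent.open("b.txt"));
    log.record(2L, FileEvent.read("b.txt", 0L, 40L));

    for (long w = 1L; w <= 2L; w++) {
      System.out.println("window " + w + ": " + log.replay(w));
    }
  }
}
```

Because the close of the first file is stored with window 1, a replay of window 1 yields it there rather than deferring it to window 2, which is the behavior the issue asks for. Tracking only files and offsets, as the description says the operator does today, would lose that distinction.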