[ https://issues.apache.org/jira/browse/NIFI-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877358#comment-15877358 ]
Koji Kawamura commented on NIFI-3332:
-------------------------------------

[~jskora] I derived Table 1 from the test and log files that you attached to this JIRA. The Managed State column in Tables 1 and 2 shows the state after each processor run. Correct me if I have misunderstood, but you expected 'batch2-age3.txt' to be listed, didn't you? If not, please let me know which file is the 'missing one' in the test:

{code}
after run 4
--------------------------------------------------------------------
                      timestamp     date from timestamp     t0 delta
------------------- ------------- ----------------------- --------
current time        = 1484165117591 2017-01-11T20:05:17.591     1096
---- processor state -----------------------------------------------
processed.timestamp = 1484165111000 2017-01-11T20:05:11.000    -5495
listing.timestamp   = 1484165111000 2017-01-11T20:05:11.000    -5495
---- input folder contents -----------------------------------------
batch1-age3.txt     = 1484165109000 2017-01-11T20:05:09.000    -7495
batch1-age4.txt     = 1484165106000 2017-01-11T20:05:06.000   -10495
batch1-age5.txt     = 1484165016000 2017-01-11T20:03:36.000  -100495
batch2-age2.txt     = 1484165111000 2017-01-11T20:05:11.000    -5495
batch2-age3.txt     = 1484165109000 2017-01-11T20:05:09.000    -7495
batch2-age4.txt     = 1484165106000 2017-01-11T20:05:06.000   -10495
---- output flowfiles ----------------------------------------------
batch1-age5.txt     = 1484165016000 2017-01-11T20:03:36.000  -100495
batch1-age4.txt     = 1484165106000 2017-01-11T20:05:06.000   -10495
[pool-4-thread-1] INFO org.apache.nifi.processors.standard.ListFile - ListFile[id=ea794881-8ef4-4c79-b503-f10338978de0] Successfully created listing with 1 new objects
batch1-age3.txt     = 1484165109000 2017-01-11T20:05:09.000    -7495
batch2-age2.txt     = 1484165111000 2017-01-11T20:05:11.000    -5495
REL_SUCCESS count = 4
--------------------------------------------------------------------

java.lang.AssertionError:
Expected :5
Actual   :4
{code}

> Bug in ListXXX causes matching timestamps to be ignored on later runs
> ---------------------------------------------------------------------
>
>                 Key: NIFI-3332
>                 URL: https://issues.apache.org/jira/browse/NIFI-3332
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.7.1, 1.1.1
>            Reporter: Joe Skora
>            Assignee: Koji Kawamura
>            Priority: Critical
>         Attachments: Test-showing-ListFile-timestamp-bug.log, Test-showing-ListFile-timestamp-bug.patch
>
>
> The new state implementation for the ListXXX processors based on AbstractListProcessor creates a race condition when processor runs occur while a batch of files with the same timestamp is being written.
> The changes to state management dropped tracking of the files processed for a given timestamp. Without that record, the remainder of the batch is ignored on the next processor run, because those files' timestamp is not greater than the single timestamp stored in processor state. With the file tracking, it was possible to process files whose timestamp exactly matched the stored one while excluding the files that had already been processed.
>
> A basic timeline goes as follows:
> T0 - The system creates or receives a batch of files with timestamp Tx, where Tx is newer than the timestamp currently stored in processor state.
> T1 - The system writes the 1st half of the Tx batch to the ListFile source directory.
> T2 - ListFile runs, picks up the 1st half of the Tx batch, and stores Tx in processor state.
> T3 - The system writes the 2nd half of the Tx batch to the ListFile source directory.
> T4 - ListFile runs and ignores every file with T <= Tx, which drops the 2nd half of the Tx batch.
>
> I've attached a patch[1] for TestListFile.java that adds an instrumented unit test demonstrating the problem, and a log[2] of the output from one such run.
> The test writes 3 files in each of two batches, with a processor run after each batch.
> Batch 2 writes files with timestamps older than, equal to, and newer than the timestamp stored when batch 1 was processed, but only the newer file is picked up. The older file is correctly ignored, but the file with the matching timestamp should have been processed.
>
> [1] Test-showing-ListFile-timestamp-bug.patch
> [2] Test-showing-ListFile-timestamp-bug.log

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
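To make the race in the report concrete, the following Java sketch simulates its T0 to T4 timeline. It is a hypothetical model, not NiFi code: `ListingSimulation`, `FileEntry`, `TimestampState`, `TrackedState`, and both listing methods are illustrative names, and the "directory" is an in-memory list rather than a real file system. `listTimestampOnly` models the described bug by keeping only the newest timestamp in state and listing files strictly newer than it. `listWithTrackedNames` assumes the per-timestamp file tracking that the report says was dropped: it also remembers which names were already listed at the newest timestamp, so files that arrive later with exactly that timestamp are still picked up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the two state strategies discussed in NIFI-3332.
// None of these names are NiFi APIs.
final class ListingSimulation {

    /** A file in the simulated source directory: name plus last-modified millis. */
    record FileEntry(String name, long lastModified) {}

    /** Timestamp-only state: just the newest last-modified value listed so far. */
    static final class TimestampState {
        long latest = Long.MIN_VALUE;
    }

    /** Newest timestamp plus the names already listed at exactly that timestamp. */
    static final class TrackedState {
        long latest = Long.MIN_VALUE;
        final Set<String> namesAtLatest = new HashSet<>();
    }

    /** Lists only files strictly newer than the stored timestamp (the reported bug). */
    static List<String> listTimestampOnly(List<FileEntry> dir, TimestampState state) {
        List<String> listed = new ArrayList<>();
        long newest = state.latest;
        for (FileEntry f : dir) {
            if (f.lastModified() > state.latest) {
                listed.add(f.name());
                newest = Math.max(newest, f.lastModified());
            }
        }
        state.latest = newest;
        return listed;
    }

    /** Also lists files whose timestamp equals the stored one, unless already listed. */
    static List<String> listWithTrackedNames(List<FileEntry> dir, TrackedState state) {
        List<FileEntry> picked = new ArrayList<>();
        long newest = state.latest;
        for (FileEntry f : dir) {
            boolean newer = f.lastModified() > state.latest;
            boolean sameButUnseen = f.lastModified() == state.latest
                    && !state.namesAtLatest.contains(f.name());
            if (newer || sameButUnseen) {
                picked.add(f);
                newest = Math.max(newest, f.lastModified());
            }
        }
        if (newest != state.latest) {
            // A newer timestamp was listed, so names recorded for the old one are obsolete.
            state.namesAtLatest.clear();
            state.latest = newest;
        }
        // Remember which files have been listed at the (possibly new) newest timestamp.
        for (FileEntry f : picked) {
            if (f.lastModified() == state.latest) {
                state.namesAtLatest.add(f.name());
            }
        }
        return picked.stream().map(FileEntry::name).toList();
    }

    public static void main(String[] args) {
        long tx = 1_000L; // hypothetical timestamp shared by the whole batch
        List<FileEntry> dir = new ArrayList<>();
        TimestampState plain = new TimestampState();
        TrackedState tracked = new TrackedState();

        // T1: the first half of the Tx batch is written.
        dir.add(new FileEntry("first-1.txt", tx));
        dir.add(new FileEntry("first-2.txt", tx));
        // T2: both strategies list the first half and store Tx.
        System.out.println("T2 timestamp-only: " + listTimestampOnly(dir, plain));
        System.out.println("T2 tracked names:  " + listWithTrackedNames(dir, tracked));

        // T3: the second half arrives with the same timestamp Tx.
        dir.add(new FileEntry("second-1.txt", tx));
        dir.add(new FileEntry("second-2.txt", tx));
        // T4: timestamp-only state skips every file with T <= Tx.
        System.out.println("T4 timestamp-only: " + listTimestampOnly(dir, plain));
        System.out.println("T4 tracked names:  " + listWithTrackedNames(dir, tracked));
    }
}
```

In this sketch, running `main` shows the timestamp-only strategy returning an empty list at T4, while the tracked-names strategy returns `second-1.txt` and `second-2.txt`. The name set only has to cover the newest stored timestamp, because the comparisons against `latest` already exclude anything older.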