[ 
https://issues.apache.org/jira/browse/NIFI-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143719#comment-16143719
 ] 

ASF GitHub Bot commented on NIFI-3332:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1975
  
    @bbende Thanks for reviewing this.
    
    This PR is now rebased onto the latest master. The last commit includes 
the following changes.
    
    The failing TestFTP.basicFileList was added after I worked on this PR. 
It uses FakeFTPServer, which provides timestamps with minute precision, and 
this PR enables timestamp precision auto-detection by default. The file that 
was expected to be picked up was not, because the lag time required for minute 
precision had not yet passed. The test has been updated to use millisecond 
precision explicitly, and a thread sleep has been added. I reproduced the same 
error in my environment and confirmed that it is now fixed.
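
    As a rough illustration of why minute-precision timestamps delay 
listing, here is a minimal sketch of precision detection combined with a lag 
check. It is not the code from this PR: the class name, the method names and 
the lag values (100 ms, 1 s, 1 min) are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and lag values are assumptions, not NiFi's API.
class PrecisionSketch {

    // Infer the finest unit in which any timestamp has a non-zero component.
    // An empty list falls through to MINUTES, the coarsest unit considered here.
    static TimeUnit detectPrecision(List<Long> timestampsMillis) {
        boolean sawNonZeroSeconds = false;
        for (long ts : timestampsMillis) {
            if (ts % 1_000L != 0) {
                return TimeUnit.MILLISECONDS;
            }
            if (ts % 60_000L != 0) {
                sawNonZeroSeconds = true;
            }
        }
        return sawNonZeroSeconds ? TimeUnit.SECONDS : TimeUnit.MINUTES;
    }

    // Assumed lag per precision: coarser timestamps need a longer wait
    // before a file can safely be treated as fully written.
    static long requiredLagMillis(TimeUnit precision) {
        switch (precision) {
            case MILLISECONDS: return 100L;
            case SECONDS: return 1_000L;
            default: return 60_000L;
        }
    }

    // A file is listable only once the required lag has passed.
    static boolean isOldEnough(long fileTimestampMillis, long nowMillis, TimeUnit precision) {
        return nowMillis - fileTimestampMillis >= requiredLagMillis(precision);
    }
}
```

    Under these assumptions, a file stamped 500 ms ago is listable at 
millisecond precision but not at minute precision, which is the situation 
the test ran into.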
    
    Similarly, I found that TestAbstractListProcessor tests can fail due to 
the lack of a time unit setting, when the generated timestamp does not have a 
value in the desired time unit, e.g. '10:51:00' is generated where second 
precision is tested. This has been addressed, too.
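
    One way a test can avoid this is to generate a timestamp that is 
guaranteed to carry a non-zero seconds component. The helper below is a 
hypothetical sketch of that idea, not the change made in this PR; the class 
and method names are invented for the example, and it assumes a non-negative 
epoch-millisecond input.

```java
// Hypothetical test helper; not taken from the NiFi code base.
class TimestampFixtureSketch {

    // Returns a timestamp at or before nowMillis with zero milliseconds
    // and a non-zero seconds component, so it reads as second precision.
    static long secondPrecisionTimestamp(long nowMillis) {
        long truncated = nowMillis - (nowMillis % 1_000L); // drop milliseconds
        if (truncated % 60_000L == 0) {
            // Landed exactly on a minute boundary (e.g. 10:51:00);
            // step back one second so the seconds field is 59.
            truncated -= 1_000L;
        }
        return truncated;
    }
}
```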
    
    Finally, the reason for changing the junit dependency is 
ListProcessorTestWatcher. It resides in `nifi-processor-utils` and is used by 
that project and also by the `nifi-standard-processors` project. In order to 
share ListProcessorTestWatcher via nifi-processor-utils, I changed the junit 
scope to 'compile' because it needs to be accessible from the 'main' source, 
not just the 'test' source. But 'provided' is more reasonable in this case, so 
I've updated it to 'provided'. Without a mechanism like 
ListProcessorTestWatcher, debugging test failures will be very difficult, 
especially if they happen only occasionally in a remote environment such as 
Travis CI.


> Bug in ListXXX causes matching timestamps to be ignored on later runs
> ---------------------------------------------------------------------
>
>                 Key: NIFI-3332
>                 URL: https://issues.apache.org/jira/browse/NIFI-3332
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.7.1, 1.1.1
>            Reporter: Joe Skora
>            Assignee: Koji Kawamura
>            Priority: Critical
>         Attachments: listfiles.png, Test-showing-ListFile-timestamp-bug.log, 
> Test-showing-ListFile-timestamp-bug.patch
>
>
> The new state implementation for the ListXXX processors based on 
> AbstractListProcessor creates a race condition when processor runs occur 
> while a batch of files is being written with the same timestamp.
> The changes to state management dropped tracking of the files processed for a 
> given timestamp.  Without the record of files processed, the remainder of the 
> batch is ignored on the next processor run since their timestamp is not 
> greater than the one timestamp stored in processor state.  With the file 
> tracking it was possible to process files that matched the timestamp exactly 
> and exclude the previously processed files.
> A basic timeline goes as follows.
>   T0 - system creates or receives batch of files with Tx timestamp where Tx 
> is more than the current timestamp in processor state.
>   T1 - system writes 1st half of Tx batch to the ListFile source directory.
>   T2 - ListFile runs picking up 1st half of Tx batch and stores Tx timestamp 
> in processor state.
>   T3 - system writes 2nd half of Tx batch to ListFile source directory.
>   T4 - ListFile runs ignoring any files with T <= Tx, eliminating 2nd half Tx 
> timestamp batch.
> I've attached a patch[1] for TestListFile.java that adds an instrumented unit 
> test that demonstrates the problem and a log[2] of the output from one such 
> run.  The test writes 3 files each in two batches with processor runs after 
> each batch.  Batch 2 writes files with timestamps older than, equal to, and 
> newer than the timestamp stored when batch 1 was processed, but only the newer 
> file is picked up.  The older file is correctly ignored, but the file with the 
> matching timestamp should have been processed.
> [1] Test-showing-ListFile-timestamp-bug.patch
> [2] Test-showing-ListFile-timestamp-bug.log
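
The difference between the two behaviours described in the issue can be 
sketched as follows. This is a simplified illustration only, not the 
AbstractListProcessor code and not necessarily how the PR resolves the bug; 
the `Entry` type and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch contrasting timestamp-only state with file tracking.
class ListingSketch {

    static final class Entry {
        final String name;
        final long timestamp;

        Entry(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // Timestamp-only state: anything with T <= lastTimestamp is skipped,
    // so late arrivals sharing the stored timestamp are lost.
    static List<String> listByTimestampOnly(List<Entry> entries, long lastTimestamp) {
        List<String> picked = new ArrayList<>();
        for (Entry e : entries) {
            if (e.timestamp > lastTimestamp) {
                picked.add(e.name);
            }
        }
        return picked;
    }

    // With tracking: entries matching the stored timestamp are still picked
    // unless they were already processed at that timestamp.
    static List<String> listWithTracking(List<Entry> entries, long lastTimestamp,
                                         Set<String> processedAtLastTimestamp) {
        List<String> picked = new ArrayList<>();
        for (Entry e : entries) {
            boolean newer = e.timestamp > lastTimestamp;
            boolean sameButUnseen = e.timestamp == lastTimestamp
                    && !processedAtLastTimestamp.contains(e.name);
            if (newer || sameButUnseen) {
                picked.add(e.name);
            }
        }
        return picked;
    }
}
```

With a stored timestamp Tx and two files already processed at Tx, a third 
file that arrives later with the same Tx is dropped by the first method and 
picked up by the second.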



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
