[ https://issues.apache.org/jira/browse/NIFI-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Payne reopened NIFI-8344:
------------------------------

Re-opening because I encountered one more corner case:
* The file has rolled over
* We are tailing that rolled over file
* We encounter at least one full line
* And then we encounter a NUL byte

In that case, the {{readLines}} method will have updated the Checksum (because it found the newline), and then after that it will have thrown {{NulCharacterEncounteredException}}. As a result, {{tailRolledFile}} will remove the FlowFiles that were created and re-throw the Exception. The next time it runs, the position will be set back to the same place as it previously was, but the Checksum will already have been updated, so the two are out of sync.

> Allow TailFile to continue tailing a file for some time after it has been rolled over
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-8344
>                 URL: https://issues.apache.org/jira/browse/NIFI-8344
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.14.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TailFile makes the assumption that once a file has been rolled over, it will
> never be appended to. If the file's Last Modified timestamp changes, the
> processor assumes that it's a new file and imports the entire contents of the
> file again.
> However, one practice that I've encountered is that users have a syslog
> server that rotates periodically. To rotate, they rename the existing file,
> and then restart the server. When that happens, the server will flush out any
> data that it has buffered to the file that was just rolled over, and then
> begin writing to the new file.
> This results in the TailFile processor ingesting the entire file that has
> been rolled over. Because we can't keep state about every file that is rolled
> over, we should introduce a property that allows the user to indicate that
> upon rollover they want to continue tailing that rolled over file until it is
> no longer being written to, and then begin tailing the new file.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
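
The corner case described in the re-open comment (a checksum updated on a full line, followed by a NUL byte that aborts the read) can be illustrated with a minimal, self-contained sketch. This is not the NiFi source: the class {{ChecksumRollbackSketch}}, the exception {{NulEncounteredException}}, and the methods {{readLines}}, {{readLinesGuarded}} and {{crcOf}} below are hypothetical stand-ins, and the guard shown is only one possible approach, not necessarily the fix that was committed.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: class and method names are illustrative, not NiFi's actual code.
class ChecksumRollbackSketch {

    /** Stand-in for the NulCharacterEncounteredException named in the comment. */
    static class NulEncounteredException extends java.io.IOException {
        NulEncounteredException(int offset) {
            super("NUL byte at offset " + offset);
        }
    }

    /**
     * Mimics the problematic ordering: the checksum is updated as soon as a
     * newline is found, and a later NUL byte aborts the read with an exception.
     * Returns the offset just past the last complete line.
     */
    static int readLines(byte[] data, int start, CRC32 checksum) throws NulEncounteredException {
        int lineStart = start;
        for (int i = start; i < data.length; i++) {
            if (data[i] == 0) {
                throw new NulEncounteredException(i); // checksum may already be updated
            }
            if (data[i] == '\n') {
                checksum.update(data, lineStart, i - lineStart + 1);
                lineStart = i + 1;
            }
        }
        return lineStart;
    }

    /**
     * One possible guard (an assumption, not the committed fix): scan with a
     * scratch checksum and fold the bytes into the real one only on success.
     */
    static int readLinesGuarded(byte[] data, int start, CRC32 checksum) throws NulEncounteredException {
        int end = readLines(data, start, new CRC32()); // throws before caller state is touched
        checksum.update(data, start, end - start);
        return end;
    }

    /** CRC32 of the first {@code length} bytes, used to compare against tailing state. */
    static long crcOf(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // One full line followed by NUL bytes, as in the corner case.
        byte[] data = "line one\n\0\0".getBytes(StandardCharsets.US_ASCII);
        CRC32 unguarded = new CRC32();
        CRC32 guarded = new CRC32();
        try { readLines(data, 0, unguarded); } catch (NulEncounteredException e) { /* position stays 0 */ }
        try { readLinesGuarded(data, 0, guarded); } catch (NulEncounteredException e) { /* position stays 0 */ }
        System.out.println("unguarded checksum after failure: " + unguarded.getValue());
        System.out.println("guarded checksum after failure:   " + guarded.getValue());
    }
}
```

In the unguarded variant the caller's position is still 0 after the exception, but the checksum already covers the first line, which is the mismatch the comment describes. The guarded variant leaves both untouched, so a retry after the NUL bytes have been overwritten starts from a consistent state.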