[ https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982563#comment-14982563 ]

Mark Payne commented on NIFI-994:
---------------------------------

[~jskora] - I totally understand you're not being argumentative - the problem 
with online communication is that talking through scenarios often does feel 
argumentative. But I think I know you well enough to know you're more 
interested in making NiFi awesome than in arguing your ideas :)

I agree with the logic that you've laid out here. It won't be guaranteed to 
handle every possible corner case. However, for the 99.9% use case, it should 
get all of the data. I don't think Scenario #2 is going to happen 99.9% of the 
time - if the producer is just trashing its own data, well... not much we can 
do :) And I think Scenario #4 is possible but *extremely* rare, especially for 
a logging use case: you would have to replace an entire log file, in a tiny 
amount of time, with more content than was in the previous log file. Possible 
but rare.
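A tiny sketch of why Scenario #4 is the hard one. It *assumes* (an assumption for illustration, not necessarily how TailFile actually detects rollover) that a rollover is noticed when the file becomes shorter than the position already read. A file that is replaced quickly with more content than before never looks shorter, so a size-only check misses it. `looks_rolled_over` is a hypothetical name.

```python
def looks_rolled_over(current_size, saved_position):
    """Hypothetical size heuristic (assumption): if the file is now shorter
    than what we already consumed, it must be a new file."""
    return current_size < saved_position


saved_position = 1000  # bytes already consumed from the old log file

normal_growth = looks_rolled_over(1500, saved_position)    # data was appended
rolled_small = looks_rolled_over(200, saved_position)      # new, smaller file
replaced_bigger = looks_rolled_over(5000, saved_position)  # Scenario #4-style replacement

print(normal_growth, rolled_small, replaced_bigger)  # False True False
```

The last case is the rare one: the file really was replaced, but by size alone it is indistinguishable from normal appends.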

The checksum really serves only one purpose, as it is implemented now. If the 
Processor (or NiFi) is stopped for a while, we need to know where we left off. 
Since the filename will have changed if the log rolled over, we need to figure 
out which file is the one we had already half-consumed so that we don't 
re-consume the first half.
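To illustrate the resume step described above, here is a minimal sketch in Python. It is not the TailFile code (NiFi is Java), and it assumes the saved state is a byte position plus a CRC32 of the first `position` bytes, and that the candidate files (the current log plus its rolled-over siblings) are already known. `checksum_prefix` and `find_resume_file` are hypothetical names.

```python
import os
import tempfile
import zlib

# Hypothetical helpers: assumes saved state = (byte position, CRC32 of bytes [0, position)).


def checksum_prefix(path, length):
    """Return the CRC32 of the first `length` bytes of `path`,
    or None if the file is shorter than `length`."""
    crc = 0
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(64 * 1024, remaining))
            if not chunk:
                return None  # file ends before the saved position
            crc = zlib.crc32(chunk, crc)  # running CRC across chunks
            remaining -= len(chunk)
    return crc


def find_resume_file(candidates, saved_position, saved_checksum):
    """Return the candidate whose first `saved_position` bytes match the
    saved checksum (i.e. the half-consumed file), or None if none match."""
    for path in candidates:
        if checksum_prefix(path, saved_position) == saved_checksum:
            return path
    return None


# Usage: "app.log" rolled over to "app.log.1" while the processor was stopped.
workdir = tempfile.mkdtemp()
rolled = os.path.join(workdir, "app.log.1")
current = os.path.join(workdir, "app.log")
with open(rolled, "wb") as f:
    f.write(b"line1\nline2\nline3\n")  # old file, only partly consumed
with open(current, "wb") as f:
    f.write(b"new1\n")  # fresh file created by the rollover

saved_position = 12  # we had read b"line1\nline2\n" before stopping
saved_checksum = zlib.crc32(b"line1\nline2\n")

resume_from = find_resume_file([current, rolled], saved_position, saved_checksum)
print(os.path.basename(resume_from))  # app.log.1
```

Checksumming only the already-consumed prefix is what lets the match survive the rename: anything appended after that point does not change it, so reading can resume at `saved_position` in the matched file without re-consuming the first half.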

I expect that this Processor will undergo some iteration in the future as it is 
field-tested, and we'll make it much better over time. As simple as the 
description of this processor sounds, it's really complicated with all the 
weird edge cases that you run into when consuming data that keeps changing with 
no unique identifier :(

> Processor to tail files
> -----------------------
>
>                 Key: NIFI-994
>                 URL: https://issues.apache.org/jira/browse/NIFI-994
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>             Fix For: 0.4.0
>
>         Attachments: 0001-NIFI-994-Initial-import-of-TailFile.patch, 
> 0002-NIFI-994-Ensure-that-processor-is-not-valid-due-to-t.patch
>
>
> It's a very common data ingest situation to want to input text into the 
> system by "tailing" a file, most commonly log files. Currently we don't have 
> an easy way to do this. 
> A simple processor to tail a file would benefit many users. There would need 
> to be an option to not just tail a file but pick up where the processor left 
> off if it is interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
