Re: [jira] [Commented] (NIFI-994) Processor to tail files

Mark Payne Wed, 30 Sep 2015 08:17:16 -0700

Joe,

The problem with "tail -F" is that if NiFi is restarted and then we do
essentially "tail -F", we may have missed a lot of data that was written to
the log file while NiFi was down. The idea behind this Processor is to be
able to recover that data, even if it was written to a log file (or any other
sort of file) while NiFi was not running or while the Processor was not
running.
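
A minimal sketch of that recovery idea, written in Python for brevity (NiFi
itself is Java). The function name `read_new_data` and the `state` dictionary
are hypothetical; the sketch assumes the processor persists a byte offset
somewhere that survives a restart, and it does not handle rollover to a
differently named file.

```python
import os


def read_new_data(path, state):
    """Return the bytes appended to `path` since the offset saved in `state`.

    `state` stands in for whatever persistent store survives a restart.
    """
    offset = state.get("offset", 0)
    if os.path.getsize(path) < offset:
        # The file is shorter than where we left off, so assume it was
        # truncated or replaced and start again from the beginning.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Remember where we stopped so the next call (or the next run after a
    # restart) picks up from here.
    state["offset"] = offset + len(data)
    return data
```

Because the offset is stored rather than recomputed, data written while the
reader was stopped is still returned on the next call.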


I agree that it should be "chunk oriented" - likely would need a property
that indicates how long to tail for a single chunk. E.g., tail for 1 second
and create a FlowFile with the content received.
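
One way the time-bounded chunk could look, again as a Python sketch rather
than NiFi code. The names `tail_chunk`, `max_seconds` and `poll_interval` are
made up for illustration; `max_seconds` plays the role of the property
described above. The sketch also keeps only complete lines, as suggested in
the quoted message below, which is an extra assumption on top of the
time limit.

```python
import time


def tail_chunk(f, max_seconds=1.0, poll_interval=0.05):
    """Collect data appended to the open binary file `f` for up to `max_seconds`.

    Returns only complete lines; a trailing partial line is left in the
    file for the next call.
    """
    deadline = time.monotonic() + max_seconds
    parts = []
    while time.monotonic() < deadline:
        data = f.read()
        if data:
            parts.append(data)
        else:
            # Nothing new yet; wait briefly before polling again.
            time.sleep(poll_interval)
    chunk = b"".join(parts)
    # Cut after the last newline and rewind past any partial line.
    cut = chunk.rfind(b"\n") + 1
    f.seek(f.tell() - (len(chunk) - cut))
    return chunk[:cut]
```

Each returned chunk would become the content of one FlowFile; an empty result
means no complete line arrived during the window.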

-Mark


> On Sep 30, 2015, at 11:03 AM, Joe Skora <jsk...@gmail.com> wrote:
> 
> For a NiFi processor, I think the "tail -F" makes more sense.  As opposed
> to the normal behavior that follows existing file descriptors, "tail -F"
> follows on filename (or pattern) so it tracks the current instance of a
> file, letting it handle new files during the run, log rotations, etc.
> 
> I definitely agree that it should take a regex or a fixed filename.
> 
> I think the biggest question is granularity.  Though tail is normally a
> line oriented operation, in NiFi it should probably be "chunk" oriented
> with each pass creating a new flow file with whatever new full lines are
> available.
> 
> On Wed, Sep 30, 2015 at 10:23 AM, Mark Payne (JIRA) <j...@apache.org> wrote:
> 
>> 
>>    [
>> https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936888#comment-14936888
>> ]
>> 
>> Mark Payne commented on NIFI-994:
>> ---------------------------------
>> 
>> Agreed. I'd recommend we allow the filename to tail to contain a * so that
>> as things roll over we can still process the data. We could sort on last
>> modified time to know the ordering of the files, and if we keep an offset
>> into a file plus the timestamp when we pulled that file, that should help
>> us to know which file it came from (the one with the smallest Last Modified
>> timestamp >= our timestamp) and then we know which offset we left off at.
>> 
>> If the data rolls off then you're right - there's nothing we can do about
>> that. Would recommend we mention in the @CapabilityDescription that we
>> expect logs to be kept around long enough to recover from outages.
>> 
>> 
>>> Processor to tail files
>>> -----------------------
>>> 
>>>                Key: NIFI-994
>>>                URL: https://issues.apache.org/jira/browse/NIFI-994
>>>            Project: Apache NiFi
>>>         Issue Type: New Feature
>>>   Affects Versions: 0.4.0
>>>           Reporter: Joseph Percivall
>>>           Assignee: Joseph Percivall
>>> 
>>> It's a very common data ingest situation to want to input text into the
>> system by "tailing" a file, most commonly log files. Currently we don't
>> have an easy way to do this.
>>> A simple processor to tail a file would benefit many users. There would
>> need to be an option to not just tail a file but pick up where the
>> processor left off if it is interrupted.
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>
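
The file-selection rule from the quoted JIRA comment (sort on last modified
time, then resume from the file with the smallest Last Modified timestamp >=
the saved timestamp) could be sketched as follows. This is one reading of
that comment, in Python rather than Java; `files_since` and `saved_timestamp`
are hypothetical names, and the sketch assumes the saved timestamp is
comparable to the filesystem's modification times.

```python
import glob
import os


def files_since(pattern, saved_timestamp):
    """Return files matching `pattern` whose mtime is >= `saved_timestamp`.

    The list is ordered oldest first. Under the rule in the comment, the
    first entry is the file the saved offset applies to; later entries
    would be read from the start.
    """
    by_mtime = sorted((os.path.getmtime(p), p) for p in glob.glob(pattern))
    return [path for mtime, path in by_mtime if mtime >= saved_timestamp]
```

An empty list would correspond to the case where the data has already rolled
off and nothing can be recovered.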
