Re: ReplaceText Flow File Processing Count

Joe Witt Fri, 04 May 2018 06:19:18 -0700

Bryan's guess on the history is probably right. More to the point,
given what we have available these days with the record processors and
so on, I think we should just change it back to one. I agree with
Peter's statement on user expectation for sure. Any chance you want to
file that JIRA/PR, Peter?


On Fri, May 4, 2018 at 9:13 AM, Bryan Bende <[email protected]> wrote:
> I don't know the history of this particular processor, but I think the
> purpose of the session.get() with batches is similar to the concept of
> @SupportsBatching. Basically both of them should have better
> performance because you are handling multiple flow files in a single
> session. The supports batching concept is a bit more flexible as it is
> configurable by the user, whereas this case is hard-coded into the
> processor.
>
> I suppose if there is some reason why you need to process 1 flow file
> at a time, you could set the back-pressure threshold to 1 on the queue
> leading into ReplaceText.
>
> On Fri, May 4, 2018 at 3:50 AM, Peter Wicks (pwicks) <[email protected]> wrote:
>> Had a user notice today that a ReplaceText processor, scheduled to run every
>> 20 minutes, had processed all 14 files in the queue at once. I looked at the
>> code and see that ReplaceText does not do a standard session.get, but
>> instead calls:
>>
>> final List<FlowFile> flowFiles = 
>> session.get(FlowFileFilters.newSizeBasedFilter(1, DataUnit.MB, 100));
>>
>> Was there a design reason behind this? To us it was just really confusing
>> that we didn't have full control over how quickly FlowFiles move through
>> this processor.
>>
>> Thanks,
>>   Peter
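
As a rough illustration of why all 14 queued files went through in one
run, the sketch below models a size-and-count-bounded batch pull in plain
Java. It is not NiFi's code: SizeBasedBatchSketch and selectBatch are
hypothetical names, the 10 KB file size is made up, and the stopping rule
(take flow files in queue order until the next one would exceed the byte
limit or the count limit, always taking the first) is an assumption about
how a filter like newSizeBasedFilter(1, DataUnit.MB, 100) behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a size-based batch pull; not the NiFi implementation.
final class SizeBasedBatchSketch {

    // Returns the sizes (in bytes) of the queued items that one call would take.
    // Assumed rule: always take the first item, then keep taking items until the
    // next one would exceed maxBytes in total or maxCount in number.
    static List<Long> selectBatch(List<Long> queuedSizes, long maxBytes, int maxCount) {
        List<Long> batch = new ArrayList<>();
        long total = 0L;
        for (long size : queuedSizes) {
            boolean first = batch.isEmpty();
            if (!first && (total + size > maxBytes || batch.size() + 1 > maxCount)) {
                break; // limit reached; the rest stay in the queue
            }
            batch.add(size);
            total += size;
        }
        return batch;
    }

    public static void main(String[] args) {
        // 14 queued files of 10 KB each (made-up size for illustration).
        List<Long> queue = new ArrayList<>();
        for (int i = 0; i < 14; i++) {
            queue.add(10_240L);
        }

        // Limits of 1 MB and 100 items: the whole queue fits in one batch.
        System.out.println(selectBatch(queue, 1_048_576L, 100).size()); // prints 14

        // A count limit of 1 models one flow file per scheduled run.
        System.out.println(selectBatch(queue, 1_048_576L, 1).size()); // prints 1
    }
}
```

With small files, neither limit is hit before the queue is empty, so a
single scheduled run drains everything. A count limit of 1 is what a user
scheduling the processor every 20 minutes would likely expect.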
