OK, I didn't realize you had already tried the back-pressure settings. Can
you describe the processors a little more? Are they custom processors?

I am guessing that ProcessorA is producing all 5k flow files from a single
execution of onTrigger, which would explain why back-pressure didn't solve
the problem. Back-pressure stops the processor from being scheduled again,
but by then it's already too late, because the first execution has already
gone over the limit.
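
To make that concrete, here is a toy model in plain Java. It is not NiFi
code and uses no NiFi APIs; the class name, method name, and drain rate are
made up for illustration. It only assumes what is described above: the
threshold is checked before each execution of the source, never during one.

```java
public class BackPressureSketch {

    /**
     * Hypothetical model: returns the largest queue depth observed.
     * The threshold is consulted only BEFORE the source is triggered,
     * so one trigger may push the queue far past it.
     */
    static int peakQueueDepth(int threshold, int emittedPerTrigger,
                              int totalToEmit, int drainedPerTick) {
        int queued = 0;
        int remaining = totalToEmit;
        int peak = 0;
        while (remaining > 0 || queued > 0) {
            // Back-pressure check happens here, before the source runs.
            if (remaining > 0 && queued < threshold) {
                int emitted = Math.min(emittedPerTrigger, remaining);
                queued += emitted;          // whole batch lands at once
                remaining -= emitted;
                peak = Math.max(peak, queued);
            }
            // The downstream processor consumes a little each tick.
            queued -= Math.min(drainedPerTick, queued);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Everything emitted from one trigger: the threshold never gets a say.
        System.out.println(peakQueueDepth(2000, 5000, 5000, 100));
        // Small batches per trigger: the queue stays within one batch of 2000.
        System.out.println(peakQueueDepth(2000, 100, 5000, 50));
    }
}
```

With one big batch the peak equals the batch size, no matter what the
threshold is. With small batches the overshoot is at most one batch.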

Without knowing too much about what ProcessorA is doing, I'm wondering if
there is a way to put some indirection between the two processors. What if
ProcessorA sent its output to a PutFile processor that wrote all the chunks
out to a directory, and a separate GetFile processor concurrently picked up
the chunks from that directory and sent them to ProcessorB?

Then the back-pressure between GetFile and ProcessorB would work, because
once the queue reached 2000, GetFile wouldn't pick up any more files. The
downside is that you would need enough disk space on your NiFi node to
possibly store your whole database table, which may not be an option.

Another idea might be to have two levels of chunks. For example, with the
SplitText processor, if we want to split a file with 1 million lines in it,
rather than do one split producing 1 million flow files, we usually do a
split into 10k-line chunks, then another split down to 1 line. Maybe
ProcessorA could produce much larger chunks, say 10k or 100k records each,
and the next processor could split those further before they go to
ProcessorB. This would also allow back-pressure to work a little better
between the second split processor and ProcessorB.
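
A rough sketch of the two-level idea in plain Java (again not NiFi code;
the class and method names are hypothetical, and a list of integers stands
in for the lines of the file):

```java
import java.util.ArrayList;
import java.util.List;

public class TwoLevelSplitSketch {

    /** Splits items into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> split(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for a file with 1 million lines.
        List<Integer> lines = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            lines.add(i);
        }

        // First level: one execution emits 100 chunks of 10k lines each.
        List<List<Integer>> firstLevel = split(lines, 10_000);
        System.out.println(firstLevel.size());   // prints 100

        // Second level: each execution handles ONE chunk and emits 10k
        // single-line pieces, so no single step emits 1 million outputs.
        List<List<Integer>> secondLevel = split(firstLevel.get(0), 1);
        System.out.println(secondLevel.size());  // prints 10000
    }
}
```

The largest burst from any single step drops from 1 million outputs to
10k, which gives the queue threshold a chance to take effect between steps.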

If anyone else has ideas here, feel free to chime in.

Thanks,

Bryan

On Wed, Jun 8, 2016 at 10:51 AM, Shaine Berube <
shaine.ber...@perfectsearchcorp.com> wrote:

> I do need more information, because I tried using that option, but the
> processor just continued filling the queue anyway. I told it to only allow
> 2000 before back-pressure kicks in, but it kept going and I ended up with
> 5k files in the queue before I restarted NiFi to get the processor to stop.
>
> On Wed, Jun 8, 2016 at 8:45 AM, Bryan Bende <bbe...@gmail.com> wrote:
>
> > Hello,
> >
> > Take a look at the options available when right-clicking on a queue...
> > What you described is what NiFi calls back-pressure. You can configure a
> > queue to have an object threshold (# of flow files) or data size threshold
> > (total size of all flow files).
> > When one of these thresholds is reached, NiFi will no longer let the source
> > processor run until the condition goes back under the threshold.
> >
> > Let us know if you need any more info on this.
> >
> > Thanks,
> >
> > Bryan
> >
> > On Wed, Jun 8, 2016 at 10:40 AM, Shaine Berube <
> > shaine.ber...@perfectsearchcorp.com> wrote:
> >
> > > Hello all,
> > >
> > > I'm kind of new to developing NiFi, though I've been doing some pretty
> > > in-depth stuff and some advanced database queries.  My question is
> > > regarding the queues between processors. I want to limit a queue to...
> > > say 2000, how would I go about doing that?  Or better yet, how would I
> > > tell the processor generating the queue to only put a max of 2000 files
> > > into the queue?
> > >
> > > Allow me to explain with a scenario:
> > > We are doing data migration from one database to another.
> > > -Processor A is generating a queue consumed by Processor B
> > > -Processor A is taking configuration and generating SQL queries in 1000
> > > record chunks so that Processor B can insert them into a new database.
> > > Given the size of the source database, Processor A can potentially
> > > generate hundreds of thousands of files.
> > >
> > > Is there a way for Processor A to check its downstream queue for the
> > > queue size?  How would I get Processor A to only put 2000 files into the
> > > queue at any given time, so that Processor A can continue running but
> > > wait for room in the queue?
> > >
> > > Thank you in advance.
> > >
> > > --
> > > *Shaine Berube*
> > >
> >
>
>
>
> --
> *Shaine Berube*
>
