Long recovery time

2017-10-08 Thread Joe Gresock
I have a NiFi 1.1.0 instance whose disk nearly (but not quite) filled
up.  I noticed that some of its NiFi processors were hanging so I restarted
it, but it's taking over an hour to come back up.

My question is: how can I tell if NiFi is doing something productive (and
therefore I should just let it finish) vs. hanging (and therefore I should
try something else)?  Is it possible that NiFi could take hours to stand
back up?  My content_repository is 276GB and my flowfile_repository is
640GB.

I see the following in the logs:

o.a.n.controller.StandardFlowFileQueue Recovered 8 swap files for FlowFileQueue[...] in 51 millis
org.wali.MinimalLockingWriteAheadLog finished recovering records. Performing Checkpoint to ensure proper state of Partitions before updates
org.wali.MinimalLockingWriteAheadLog Successfully recovered 10536141 records in 38509 milliseconds

Thereafter, the only things I see in the logs are these periodic messages:
org.wali.MinimalLockingWriteAheadLog checkpointed with 2 Records and 0 Swap Files in 16 milliseconds, Max Transaction ID 31

I did a thread dump and see pretty standard stuff, including one that I
thought might be relevant:
"main" Id=1 RUNNABLE
  at java.util.HashMap.putVal(HashMap.java:641)
  at java.util.HashMap.put(HashMap.java:611)
  at org.apache.nifi.repository.schema.SchemaRecordReader.readFieldValue(SchemaRecordReader.java:154)

I took a couple dumps in a row in case it was hung here, but it appears to
be progressing to different points in the stack.
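
The comparison of consecutive dumps described above can be sketched as a
small script. This is only an illustration, not a NiFi tool: the function
names are hypothetical, and it assumes each thread section starts with a
quoted thread name followed by "at ..." frame lines (it also tolerates an
"at" that a mail client wrapped onto its own line). A changing stack
suggests, but does not prove, that the thread is making progress.

```python
from typing import List


def thread_stack(dump: str, thread_name: str = "main") -> List[str]:
    """Return the stack frames of the named thread found in one dump."""
    frames = []
    in_thread = False
    pending = False  # True when a bare "at" was seen and the frame follows
    for line in dump.splitlines():
        s = line.strip()
        if s.startswith('"'):
            # A quoted name opens a new thread section.
            in_thread = s.startswith('"' + thread_name + '"')
            pending = False
        elif not in_thread:
            continue
        elif s == "at":
            pending = True
        elif pending and s:
            frames.append(s)
            pending = False
        elif s.startswith("at "):
            frames.append(s[3:])
    return frames


def looks_progressing(dumps: List[str], thread_name: str = "main") -> bool:
    """True if the thread's stack differs between any two consecutive dumps."""
    stacks = [thread_stack(d, thread_name) for d in dumps]
    return any(a != b for a, b in zip(stacks, stacks[1:]))
```

Feeding it three or four dumps taken a few seconds apart gives a quick
yes/no signal; identical stacks across all of them would be the case worth
investigating further.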

NiFi is the only thing running on this instance, and nearly all of its 48GB
of RAM is being used. Using iostat, I also noticed that it is doing some
heavy reads but not many writes.

Thanks,
Joe
-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Re: Long recovery time

2017-10-08 Thread Mark Payne
Joe,

When you have a huge FlowFile repository like that, it can indeed take quite
a long time to recover (potentially a few hours). I say that with the caveat
that the FlowFile repository should probably never reach that size; it
generally doesn't grow beyond a couple of GB.

The two things that I have seen cause tremendous growth in the FlowFile
repository are OutOfMemoryError and "Too many open files" IOExceptions.
Either could prevent the FlowFile repository from checkpointing properly,
which would cause it to grow unbounded. The former case (OOME) is far more
likely, though: if an OOME (or any other uncaught Throwable) is thrown while
checkpointing the FlowFile repository, that background thread will die. I
did submit a fix for that, and I believe it was included in 1.4.0 (ASF JIRA
is undergoing maintenance at the moment, so I'm not able to look it up).

There are probably some things that can be done to address the super long
recovery time, though. The problem should be easy to replicate by telling
NiFi to checkpoint the FlowFile repository once an hour instead of the
default of once every 2 minutes (via the
'nifi.flowfile.repository.checkpoint.interval' property). That would let us
find out what is taking so long and address it accordingly.
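
For anyone who wants to try that reproduction, here is a minimal sketch of
changing the property in the text of nifi.properties. The helper name is
hypothetical, and the literal values "2 mins" and "1 hour" are assumptions
about how the interval is written in the file; only the property name comes
from the message above.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return the properties text with key set to value, appending if absent."""
    out = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip comment lines; match on the text before the first "=".
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == key:
            out.append(key + "=" + value)
            found = True
        else:
            out.append(line)
    if not found:
        out.append(key + "=" + value)
    return "\n".join(out) + "\n"


# Assumed original line; the value format is a guess, not taken from the thread.
original = "nifi.flowfile.repository.checkpoint.interval=2 mins\n"
print(set_property(original, "nifi.flowfile.repository.checkpoint.interval", "1 hour"))
```

NiFi reads nifi.properties at startup, so the change would only take effect
after a restart.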


Thanks
-Mark





route flow based on variable

2017-10-08 Thread 尹文才
Hi guys, I've played around with the latest NiFi 1.4.0 release for a while
and I think the new variable registry feature is great. However, I have 2
questions about it:

1. It seems that I can only add variables to a process group. Can I add a
global variable to the NiFi root process group so that it can be used
anywhere inside NiFi?

2. I want to route to different flows based on the variables I added, but
the only way I currently know to make this work is:
myprocessor -> UpdateAttribute (add the variable to a FlowFile attribute) ->
RouteOnAttribute -> different flows based on the variable value

I didn't find any RouteOnVariable processor. Is there an easier way to
implement conditional flow in NiFi? Thanks
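
To make the chain in question 2 concrete, here is a toy model of the two
steps in plain Python. It does not use any NiFi API; the function names,
the variable name "target.env", and the route names are all hypothetical,
and the real processors are configured in the UI rather than in code.

```python
from typing import Dict


def update_attribute(attributes: Dict[str, str], variables: Dict[str, str],
                     name: str) -> Dict[str, str]:
    """Model the UpdateAttribute step: copy one variable into the attributes."""
    updated = dict(attributes)
    updated[name] = variables[name]
    return updated


def route_on_attribute(attributes: Dict[str, str], name: str,
                       routes: Dict[str, str], default: str = "unmatched") -> str:
    """Model the RouteOnAttribute step: choose a route from the attribute value."""
    return routes.get(attributes.get(name), default)


# Hypothetical variable and routes, for illustration only.
variables = {"target.env": "prod"}
routes = {"prod": "prod_flow", "test": "test_flow"}

flowfile_attributes = update_attribute({"filename": "a.txt"}, variables, "target.env")
print(route_on_attribute(flowfile_attributes, "target.env", routes))
```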

/Ben


Funnel Queue Slowness

2017-10-08 Thread Peter Wicks (pwicks)
I've been running into an issue on 1.4.0 where my Funnel sometimes runs
slowly. I haven't been able to create a nice reproducible test case to pass
on. What I'm seeing is that my failure queue on the right will start to fill
up, even though there is plenty of room for the FlowFiles in the next queue.
You can see that the Tasks/Time is fairly low, only 24 in the last 5 minutes
(first image), so it's not that the FlowFiles are moving so fast that they
just appear to be in the queue.

If I stop the downstream processor, the files slowly trickle through the
funnel into the next queue. I had an Oldest FlowFile First prioritizer on
the downstream queue; I tried removing it, but there was no change in
behavior. The one time I saw this behavior in the past was when my NiFi
instance was thread starved, but there are plenty of threads available on
this instance and all other processors are running fine. I also don't
understand why it trickles the FlowFiles in; from what I've seen in the
code, Funnel grabs large batches at one time...

Thoughts?

(Sometimes my images don't make it, let me know if that happens.)