Re: NiFi Queue Monitoring

2021-07-21 Thread u...@moosheimer.com
Scott,

Check out these posts from Pierre: https://pierrevillard.com/tag/reporting-task/

We monitor all NiFi metrics via Reporting Tasks. We push everything via MQTT to 
InfluxDB and watch it in Grafana, where we trigger alerts when values reach a 
critical level.
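
For illustration, here is a minimal sketch of the receiving end of such a push 
pipeline: a small bridge that subscribes to the MQTT topic a reporting task 
publishes to and forwards each reading to InfluxDB's 1.x write endpoint. The 
broker host, topic name, payload fields and database name are assumptions for 
the example, not NiFi or InfluxDB defaults.

    #!/usr/bin/env bash
    # Hypothetical MQTT-to-InfluxDB bridge for queue metrics pushed by a reporting task.
    # Topic, payload shape (connection/queuedCount/queuedBytes) and database are assumed.
    BROKER="mqtt.example.com"
    TOPIC="nifi/metrics/connections"
    INFLUX_WRITE="http://influxdb.example.com:8086/write?db=nifi"

    mosquitto_sub -h "$BROKER" -t "$TOPIC" | while read -r msg; do
      # Convert the assumed JSON payload into InfluxDB line protocol.
      line=$(printf '%s' "$msg" | jq -r \
        '"nifi_queue,connection=\(.connection) queued_count=\(.queuedCount),queued_bytes=\(.queuedBytes)"')
      curl -s -XPOST "$INFLUX_WRITE" --data-binary "$line"
    done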

If that doesn't help, please describe your problem in more detail and maybe we 
can help.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> On 22.07.2021 at 00:04, Joe Witt wrote:
> 
> 
> Scott
> 
> NiFi supports both push and pull: push via Reporting Tasks and pull via the 
> REST API.
> 
> Do you need a particular implementation of a reporting task?
> 
> You are right that this is a common need. It is solved using one of these 
> methods.
> 
> Thanks
> 
>> On Wed, Jul 21, 2021 at 2:58 PM scott  wrote:
>> Great comments all. I agree with the architecture comment about push 
>> monitoring. I've been monitoring applications for more than 2 decades now, 
>> but sometimes you have to work around the limitations of the situation. It 
>> would be really nice if NiFi had this logic built-in, and frankly I'm 
>> surprised it is not yet. I can't be the only one who has had to deal with 
>> queues filling up, causing problems downstream. NiFi certainly knows that 
>> the queues fill up, they change color and execute back-pressure logic. If it 
>> would just do something simple like write a log/error message to a log file 
>> when this happens, I would be good. 
>> I have looked at the new metrics and reporting tasks but still haven't found 
>> the right way to get notified when any queue in my instance fills up. Are 
>> there any examples of using them for a similar task that you can share?
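
One possible shape for the pull side of this, as a starting point: poll the 
flow status endpoint and print any connection whose queue utilization is above 
a threshold. The host, token and threshold below are placeholders, and the 
field names (percentUseCount, queuedCount) are taken from the NiFi 1.x status 
DTOs, so verify them against your version.

    #!/usr/bin/env bash
    # Sketch: list connections whose queue exceeds THRESHOLD percent of the
    # back-pressure object threshold. Host, token and threshold are placeholders.
    NIFI="https://nifi.example.com:8443/nifi-api"
    TOKEN="..."   # see the token example further down for an LDAP-secured instance
    THRESHOLD=80

    curl -sk -H "Authorization: Bearer $TOKEN" \
      "$NIFI/flow/process-groups/root/status?recursive=true" \
      | jq --argjson t "$THRESHOLD" '
          [.. | objects
              | select(has("percentUseCount"))
              | select(.percentUseCount != null and .percentUseCount >= $t)
              | {name, queuedCount, percentUseCount}]'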
>> 
>> Thanks,
>> Scott
>> 
>>> On Wed, Jul 21, 2021 at 11:29 AM u...@moosheimer.com  
>>> wrote:
>>> In general, it is bad architecture to do monitoring by polling (pull); you 
>>> should always push. I recommend the book "The Art of Monitoring" by James 
>>> Turnbull.
>>> 
>>> I also recommend the very good articles by Pierre Villard on the subject of 
>>> NiFi monitoring at 
>>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/.
>>> 
>>> Hope this helps.
>>> 
>>> Mit freundlichen Grüßen / best regards
>>> Kay-Uwe Moosheimer
>>> 
>>>>> On 21.07.2021 at 16:45, Andrew Grande wrote:
>>>>> 
>>>> 
>>>> Can't you leverage some of the recent NiFi features and basically run SQL 
>>>> queries over NiFi metrics directly as part of the flow? Then act on it 
>>>> with the full flexibility of the flow. Kinda like a push design.
>>>> 
>>>> Andrew
>>>> 
>>>>> On Tue, Jul 20, 2021, 2:31 PM scott  wrote:
>>>>> Hi all,
>>>>> I'm trying to set up monitoring of all queues in my NiFi instance, to 
>>>>> catch problems before queues become full. One solution I am looking at is 
>>>>> to use the API, but because I have a secure NiFi that uses LDAP, it seems 
>>>>> to require a token that expires after 24 hours or so. I need this to be 
>>>>> an automated solution, so that is not going to work. Has anyone else 
>>>>> tackled this problem with a secure, LDAP-enabled cluster? 
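
On the token point specifically: an LDAP-secured instance can issue a fresh 
token on every run, so the expiry need not block automation. A rough sketch, 
assuming the standard /nifi-api/access/token endpoint and that the LDAP 
credentials are available to the script; the host, account and credential 
handling are placeholders, and the endpoint is worth checking against your 
NiFi version.

    #!/usr/bin/env bash
    # Request a fresh JWT from an LDAP-secured NiFi on every run, then use it as a
    # Bearer token for whatever status call the monitoring script needs to make.
    NIFI="https://nifi.example.com:8443/nifi-api"   # placeholder host
    NIFI_USER="monitor"                             # placeholder LDAP account
    NIFI_PASS="$(cat /etc/nifi-monitor/password)"   # placeholder secret location

    TOKEN=$(curl -sk -X POST "$NIFI/access/token" \
      --data-urlencode "username=$NIFI_USER" \
      --data-urlencode "password=$NIFI_PASS")

    curl -sk -H "Authorization: Bearer $TOKEN" \
      "$NIFI/flow/process-groups/root/status?recursive=true"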
>>>>> 
>>>>> Thanks,
>>>>> Scott


Re: Determine cluster primary node using REST API

2021-01-25 Thread u...@moosheimer.com
James,

It is generally bad practice to do it this way.
You should use a reverse proxy that also acts as a load balancer.
That way you only have to address a single address:port and always get a 
connection (as long as at least one node is up).
Pierre Villard has written a good blog post about this: 
https://pierrevillard.com/2017/02/10/haproxy-load-balancing-in-front-of-apache-nifi/

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> On 25.01.2021 at 20:20, James McMahon wrote:
> 
> 
> Let me follow up: so if I understand you correctly, Bryan, your point is that 
> I should direct the curl to a specific IP (or node name), and that will 
> guarantee that ListenHTTP generates only a single flowfile. Great... I agree. 
> 
> But I’d still need to verify the destination node was active and healthy to 
> ensure my hourly curl doesn’t just fail, wouldn’t I? I figured the easiest 
> way to do that was to always direct it to the primary node, because the 
> cluster always has to have a healthy primary node, else the cluster isn’t 
> much use to us. Rather than hard-code a node address that may or may not be 
> part of the cluster at a given time, I want to curl a NiFi REST API endpoint 
> that returns all the current nodes. 
> 
> I’ll then use jq or something along those lines to grab the address of the 
> node whose role value is PRIMARY. I think the controller/cluster API returns 
> a monstrous JSON object with that a few layers deep. I’ve never done any of 
> this, and was hoping to avoid reinventing the wheel if someone has already 
> done it. Anybody have an example where you’ve cherry-picked a value from the 
> JSON returned by controller/cluster to get the node address for the primary 
> node?
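
A rough sketch of that curl-plus-jq step, assuming the NiFi 1.x 
/controller/cluster response shape in which each entry under cluster.nodes 
carries a roles array containing "Primary Node" (worth verifying the exact 
strings against your version); the host and token are placeholders.

    #!/usr/bin/env bash
    # Sketch: ask the cluster endpoint for its nodes and print the address of the
    # one whose roles include "Primary Node". Host and token are placeholders.
    NIFI="https://nifi.example.com:8443/nifi-api"
    TOKEN="..."   # obtain via /access/token on a secured instance

    PRIMARY=$(curl -sk -H "Authorization: Bearer $TOKEN" "$NIFI/controller/cluster" \
      | jq -r '.cluster.nodes[]
               | select(any(.roles[]?; . == "Primary Node"))
               | .address')

    echo "Primary node: $PRIMARY"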
> 
> 
>> On Mon, Jan 25, 2021 at 1:59 PM Bryan Bende  wrote:
>> It makes sense to only run the check on one node, but it shouldn't
>> matter which node. Whatever is making the request to ListenHTTP
>> (sounds like curl) can send to any node; as long as it only sends to one
>> of them, you only go through the check once and get one email.
>> 
>> The REST API for getting the cluster info is under /controller/cluster
>> 
>> https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
>> 
>> On Mon, Jan 25, 2021 at 1:45 PM James McMahon  wrote:
>> >
>> > Hello Bryan. We run on Primary only because we are doing an end-to-end 
>> > verification that our pipeline is available at the top of each hour, 
>> > across several NiFi links in a lengthy processing chain. We only want that 
>> > done through one node, not all N nodes in the cluster. It generates an 
>> > alert email to me and others each hour, and we don’t need N email alerts. 
>> > Let me know if you have any other questions.
>> >
>> > Can you provide me with an example of the REST API call in bash via curl 
>> > where you parse the primary node out of the returned JSON structure?
>> >
>> > On Mon, Jan 25, 2021 at 1:37 PM Bryan Bende  wrote:
>> >>
>> >> I know this doesn't really answer your question, but is there a reason
>> >> you are setting ListenHTTP to run on primary node only and not on all
>> >> nodes?
>> >>
>> >> Typically you'd use "primary node only" for a source processor that
>> >> is pulling data from somewhere and you only want it to happen once,
>> >> otherwise you'd pull the same data multiple times. In this case,
>> >> ListenHTTP is just going to be sitting there waiting for something to
>> >> send data to it, so why not listen on all nodes?
>> >>
>> >> The processor is going to be started on all nodes anyway, so the
>> >> embedded Jetty is already started and listening on all nodes; the
>> >> "Primary Node Only" setting just means the onTrigger method will only be
>> >> called for the processor on the primary node, so for ListenHTTP that
>> >> just means it will only process requests on the primary.
>> >>
>> >> On Sun, Jan 24, 2021 at 6:49 PM James McMahon  
>> >> wrote:
>> >> >
>> >> > I have a NiFi cluster, NiFi version 1.8.n. I need to use curl from a 
>> >> > bash shell script on a remote host to query for the primary node of the 
>> >> > cluster at that moment. I understand there may be a NiFi REST API call 
>> >> > I can make to do this, but I have little experience integrating such a 
>> >> > call in bash. Does anyone have an example that does this?
>> >> >
>> >> > Why do I want to do this? I have a ListenHTTP running as an entry point 
>> >> > in a flow on the cluster, and that processor runs in “Primary node 
>> >> > only” configuration. Since the external ZooKeeper can change the 
>> >> > primary at any time, I need to precede this curl call with a curl that 
>> >> > returns the primary node to me.
>> >> >
>> >> > Thanks in advance for your help.