James,
Doing it this way is generally bad practice.
You should use a reverse proxy that also acts as a load balancer.
This way you only have to address a single address:port and always have a
connection (as long as at least one node is up).
Pierre Villard has written a good blog post about this:
https://pierrevillard.com/2017/02/10/haproxy-load-balancing-in-front-of-apache-nifi/
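
If you still want to look up the primary node yourself, here is a rough sketch in Python rather than bash. It assumes the JSON from /nifi-api/controller/cluster has the shape cluster -> nodes[], each node carrying "address", "status" and a "roles" list that contains "Primary Node" for the primary. That is how I understand the NiFi 1.x response, so please check it against your version. The host names, the base URL and the function names are made up for illustration, and a secured cluster would also need TLS and authentication, which this leaves out.

```python
import json
import urllib.request


def primary_node_address(cluster_payload):
    """Return the address of the connected primary node, or None if there is none."""
    nodes = cluster_payload.get("cluster", {}).get("nodes", [])
    for node in nodes:
        roles = node.get("roles") or []  # tolerate a missing or null roles field
        if node.get("status") == "CONNECTED" and "Primary Node" in roles:
            return node.get("address")
    return None


def fetch_cluster(base_url):
    """Fetch the cluster summary from one node (unsecured HTTP assumed)."""
    # base_url is hypothetical, e.g. "http://nifi-1.example.com:8080"
    url = f"{base_url}/nifi-api/controller/cluster"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hand-written sample in the assumed response shape, not real cluster output.
sample = {
    "cluster": {
        "nodes": [
            {"address": "nifi-1.example.com", "apiPort": 8080,
             "status": "CONNECTED", "roles": ["Cluster Coordinator"]},
            {"address": "nifi-2.example.com", "apiPort": 8080,
             "status": "CONNECTED", "roles": ["Primary Node"]},
            {"address": "nifi-3.example.com", "apiPort": 8080,
             "status": "DISCONNECTED", "roles": []},
        ]
    }
}

print(primary_node_address(sample))  # prints nifi-2.example.com
```

Against a live cluster you would call primary_node_address(fetch_cluster(base_url)). Note that apiPort is the port of the NiFi API, not the port ListenHTTP listens on, so only the address is returned here. You still have to reach at least one live node to ask the question, which is one more reason a load balancer in front is the simpler setup.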
Best regards
Kay-Uwe Moosheimer
> On 25.01.2021 at 20:20, James McMahon wrote:
>
>
> Let me follow up: so if I understand you correctly Bryan, your point is that
> I should direct the curl to a specific IP (or node name), and that will
> guarantee that ListenHttp generates only a single flowfile. Great... I agree.
>
> But I’d still need to verify the destination node was active and healthy to
> ensure my hourly curl doesn’t just fail, wouldn’t I? I figured the easiest
> way to do that was always direct to the Primary node because the cluster
> always has to have a healthy primary node, else the cluster isn’t much use to
> us. Rather than hard-code a node address that may or may not be part of the
> cluster at a given time, I want to do a curl of a nifi RESTful API that
> returns to me all the current nodes.
>
> I’ll then use jq or something along those lines to grab the address of the
> node that has value PRIMARY for key named role. I think the
> controller/cluster api returns a monstrous JSON object with that a few layers
> deep. I’ve never done any of this, and was hoping to avoid reinventing the
> wheel if someone already had done it. Anybody have an example where you’ve
> cherry picked a value from the JSON returned by controller/cluster to get the
> node address for the Primary node?
>
>
>> On Mon, Jan 25, 2021 at 1:59 PM Bryan Bende wrote:
>> It makes sense to only run the check on one node, but it shouldn't
>> matter which node. Whatever is making the request to ListenHTTP
>> (sounds like curl) can send to any node; as long as it only sends to one
>> of them, you only go through the check once and get one email.
>>
>> The REST API for getting the cluster info is under /controller/cluster
>>
>> https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
>>
>> On Mon, Jan 25, 2021 at 1:45 PM James McMahon wrote:
>> >
>> > Hello Bryan. We run on Primary only because we are doing an end-to-end
>> > verification that our pipeline is available at the top of each hour,
>> > across several nifi links in a lengthy processing chain. We only want that
>> > done through one Node, not all N nodes in the cluster. It generates an
>> > alert email to me and others each hour, and we don’t need N email alerts.
>> > Let me know if you have any other questions.
>> >
>> > Can you provide me with an example of the Rest API call in bash via curl
>> > where you parse the primary node out of the returned JSON structure?
>> >
>> > On Mon, Jan 25, 2021 at 1:37 PM Bryan Bende wrote:
>> >>
>> >> I know this doesn't really answer your question, but is there a reason
>> >> you are setting ListenHttp to run on primary node only and not all
>> >> nodes?
>> >>
>> >> Typically you'd use "primary node only" for a source processor that
>> >> is pulling data from somewhere and you only want it to happen once,
>> >> otherwise you'd pull the same data multiple times. In this case,
>> >> ListenHTTP is just going to be sitting there waiting for something to
>> >> send data to it, so why not listen on all nodes?
>> >>
>> >> The processor is going to be started on all nodes anyway, so the
>> >> embedded Jetty is already started and listening on all nodes, the
>> >> "Primary Node Only" just means the onTrigger method will only be
>> >> called for the processor on the primary node, so for ListenHTTP that
>> >> just means it will only process the requests on the primary.
>> >>
>> >> On Sun, Jan 24, 2021 at 6:49 PM James McMahon wrote:
>> >> >
>> >> > I have a NiFi cluster, nifi version 1.8.n. I need to use curl from a
>> >> > bash shell script on a remote host to query for the primary node of the
>> >> > cluster at that moment. I understand there may be a NiFi REST API call
>> >> > I can make to do this, but have little experience integrating such a
>> >> > call in bash. Does anyone have an example that does this?
>> >> >
>> >> > Why do I want to do this? I have a ListenHttp running as an entry point
>> >> > in a flow on the cluster, and that processor runs in “Primary node”
>> >> > only configuration. Since the external zookeeper can change the primary
>> >> > at any time, I need to precede this curl call with a curl that returns
>> >> > to me the primary node.
>> >> >
>> >> > Thanks in advance for your help.