Excellent! Glad it's all working now. And thanks for the follow-up to let us know!
-Mark

> On Apr 12, 2017, at 11:30 AM, Mark Bean <mark.o.b...@gmail.com> wrote:
>
> Mark,
>
> I believe you're right. Yesterday, I corrected a typo in the
> nifi.properties file related to the FQDN. I thought it only affected the
> site-to-site property (nifi.remote.input.host). However, when I
> intentionally introduced a typo into one of the three ZK servers in the
> nifi.zookeeper.connect.string today, I was able to reproduce the symptoms.
> I'm sure that must have been it. Without the typo, all is working well.
>
> Thanks,
> Mark
>
> On Wed, Apr 12, 2017 at 10:36 AM, Mark Payne <marka...@hotmail.com> wrote:
>
>> Mark,
>>
>> I haven't seen this behavior personally, so I can't be sure exactly why
>> it would change state to SUSPENDED and then not re-connect. In your
>> nifi.properties, do you have the "nifi.zookeeper.connect.string" property
>> set up to point to all 3 of the nodes as well? If so, it should be able
>> to connect to one of the other two nodes listed.
>>
>> Thanks
>> -Mark
>>
>>> On Apr 11, 2017, at 2:37 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
>>>
>>> Ok, will keep the standalone ZooKeeper in mind.
>>>
>>> Back to the original issue: any idea why ZooKeeper went to a SUSPENDED
>>> state, making the cluster unavailable?
>>>
>>> On Tue, Apr 11, 2017 at 2:10 PM, Mark Payne <marka...@hotmail.com> wrote:
>>>
>>>> Mark,
>>>>
>>>> Yes, 2 out of 3 should be sufficient. For testing purposes, a single
>>>> ZooKeeper instance is fine as well. For production, I would not
>>>> actually recommend using an embedded ZooKeeper at all; instead, use a
>>>> standalone ZooKeeper. ZooKeeper tends not to be very happy when running
>>>> on a box that is already under heavy resource load, so if your cluster
>>>> starts getting busy, you'll see far more stable performance from a
>>>> standalone ZooKeeper.
>>>>
>>>>> On Apr 11, 2017, at 2:06 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
>>>>>
>>>>> All 3 nodes are running embedded ZooKeeper. And the Admin Guide states
>>>>> "ZooKeeper requires a majority of nodes be active in order to
>>>>> function". So, I assumed 2/3 being active was ok. Perhaps not.
>>>>>
>>>>> Related: can a Cluster be set up with only 1 ZooKeeper node? Clearly,
>>>>> in production, one would not want to do this. But when testing, this
>>>>> should be acceptable, yes?
>>>>>
>>>>> On Tue, Apr 11, 2017 at 1:56 PM, Mark Payne <marka...@hotmail.com> wrote:
>>>>>
>>>>>> Mark,
>>>>>>
>>>>>> Are all of your nodes running an embedded ZooKeeper, or only 1 or 2
>>>>>> of them?
>>>>>>
>>>>>> Thanks
>>>>>> -Mark
>>>>>>
>>>>>>> On Apr 11, 2017, at 1:19 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
>>>>>>>
>>>>>>> I have a 3-node Cluster with each Node hosting the embedded
>>>>>>> ZooKeeper. When one Node is shut down (and that Node is not the
>>>>>>> Cluster Coordinator), the Cluster becomes unavailable. The UI
>>>>>>> indicates "Action cannot be performed because there is currently no
>>>>>>> Cluster Coordinator elected. The request should be tried again after
>>>>>>> a moment, after a Cluster Coordinator has been automatically elected."
>>>>>>>
>>>>>>> The app.log indicates "ConnectionStateManager State change:
>>>>>>> SUSPENDED". And there are an endless number of "CuratorFrameworkImpl
>>>>>>> Background retry gave up" messages; the surviving Nodes are not able
>>>>>>> to allow the Cluster to survive.
>>>>>>>
>>>>>>> I would have thought that since 2/3 Nodes are surviving, there
>>>>>>> wouldn't be a problem. In addition, since the Node that was shut
>>>>>>> down was neither the Cluster Coordinator nor the Primary node, no
>>>>>>> Cluster state changes were required.
>>>>>>>
>>>>>>> nifi.cluster.flow.election.max.wait.time=2 mins
>>>>>>> nifi.cluster.flow.election.max.candidates=
>>>>>>>
>>>>>>> The same behavior was observed when max.candidates was set to 2.
>>>>>>>
>>>>>>> NiFi 1.1.2
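For reference, the fix described above comes down to the ZooKeeper connect
string: each node's nifi.properties must list all three ZooKeeper servers
with correctly spelled FQDNs, since a typo in any one entry was enough to
reproduce the SUSPENDED / "Background retry gave up" symptoms even while
two of the servers were healthy. A minimal sketch of the relevant entries,
using node1/node2/node3.example.com as placeholder hostnames (not taken
from the thread):

    # Site-to-site input host: this node's own FQDN (watch for typos)
    nifi.remote.input.host=node1.example.com

    # ZooKeeper connect string: all three ZK servers, comma-separated;
    # every entry must resolve correctly from every node
    nifi.zookeeper.connect.string=node1.example.com:2181,node2.example.com:2181,node3.example.com:2181

    # Flow election settings quoted at the end of the thread
    nifi.cluster.flow.election.max.wait.time=2 mins
    nifi.cluster.flow.election.max.candidates=

Port 2181 is the ZooKeeper client-port default; adjust it if the embedded
ZooKeeper is configured to listen elsewhere.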
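On the quorum question earlier in the thread: with an embedded ZooKeeper
started on each of the three nodes, 2 of 3 servers should indeed be enough
to keep the ensemble available, provided each node's embedded ZooKeeper
configuration lists all three servers. A rough sketch of what the
conf/zookeeper.properties ensemble entries might look like for this setup
(hostnames and ports are illustrative, not taken from the thread):

    # Hypothetical embedded-ZooKeeper ensemble config; each of the three
    # NiFi nodes lists all three servers, and a majority (2 of 3) must be
    # reachable for the ensemble to keep serving requests.
    clientPort=2181
    dataDir=./state/zookeeper
    tickTime=2000
    initLimit=10
    syncLimit=5
    server.1=node1.example.com:2888:3888
    server.2=node2.example.com:2888:3888
    server.3=node3.example.com:2888:3888

Each node would also need a myid file under dataDir containing its server
number (1, 2, or 3), and nifi.state.management.embedded.zookeeper.start=true
in nifi.properties so the embedded server actually starts.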