OK, I'll keep the standalone ZooKeeper recommendation in mind.

Back to the original issue: any idea why ZooKeeper went into the SUSPENDED
state and made the cluster unavailable?
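
For reference, if I do move to a standalone ZooKeeper, I assume the change on
each NiFi node would look roughly like this (the hostnames are just
placeholders for my environment):

    # nifi.properties - stop starting the embedded server and point at the
    # external ensemble (all three ZooKeeper hosts, default client port 2181)
    nifi.state.management.embedded.zookeeper.start=false
    nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

    <!-- state-management.xml - the cluster provider should use the same connect string -->
    <property name="Connect String">zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</property>

In the meantime, I'll double-check the embedded setup on each node: that
conf/zookeeper.properties lists all three server.N entries and that each
node's ./state/zookeeper/myid matches its server number, in case the quorum
was never actually formed across all three nodes.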


On Tue, Apr 11, 2017 at 2:10 PM, Mark Payne <marka...@hotmail.com> wrote:

> Mark,
>
> Yes, 2 out of 3 should be sufficient. For testing purposes, a single
> ZooKeeper instance
> is fine, as well. For production, I would not actually recommend using an
> embedded
> ZooKeeper at all and instead use a standalone ZooKeeper. ZooKeeper tends
> not to be
> very happy when running on a box on which there is already heavy resource
> load, so if
> your cluster starts getting busy, you'll see far more stable performance
> from a standalone
> ZooKeeper.
>
>
> > On Apr 11, 2017, at 2:06 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
> >
> > All 3 nodes are running embedded ZooKeeper. And, the Admin Guide states
> > "ZooKeeper requires a majority of nodes be active in order to function".
> > So, I assumed 2/3 being active was ok. Perhaps not.
> >
> > Related: can a Cluster be set up with only 1 ZooKeeper node? Clearly, in
> > production, one would not want to do this. But when testing, this should
> > be acceptable, yes?
> >
> >
> >
> > On Tue, Apr 11, 2017 at 1:56 PM, Mark Payne <marka...@hotmail.com>
> wrote:
> >
> >> Mark,
> >>
> >> Are all of your nodes running an embedded ZooKeeper, or only 1 or 2 of
> >> them?
> >>
> >> Thanks
> >> -Mark
> >>
> >>> On Apr 11, 2017, at 1:19 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
> >>>
> >>> I have a 3-node Cluster with each Node hosting the embedded ZooKeeper.
> >>> When one Node is shut down (and that Node is not the Cluster
> >>> Coordinator), the Cluster becomes unavailable. The UI indicates "Action
> >>> cannot be performed because there is currently no Cluster Coordinator
> >>> elected. The request should be tried again after a moment, after a
> >>> Cluster Coordinator has been automatically elected."
> >>>
> >>> The app.log indicates "ConnectionStateManager State change: SUSPENDED".
> >>> And, there are an endless number of "CuratorFrameworkImpl Background
> >>> retry gave up" messages; the surviving Nodes are unable to keep the
> >>> Cluster available.
> >>>
> >>> I would have thought that since 2/3 Nodes are surviving, there wouldn't
> >>> be a problem. In addition, since the Node that was shut down was neither
> >>> the Cluster Coordinator nor the Primary Node, no Cluster state changes
> >>> were required.
> >>>
> >>> nifi.cluster.flow.election.max.wait.time=2 mins
> >>> nifi.cluster.flow.election.max.candidates=
> >>>
> >>> The same behavior was observed when max.candidates was set to 2.
> >>>
> >>> NiFi 1.1.2
> >>
> >>
>
>
