Mark,

I believe you're right. Yesterday, I corrected a typo in the
nifi.properties file related to the FQDN. I thought it affected only the
site-to-site property (nifi.remote.input.host). However, when I
intentionally introduced a typo into one of the three ZK servers in the
nifi.zookeeper.connect.string today, I was able to reproduce the symptoms.
I'm sure that must have been it. Without the typo, all is working well.
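
For anyone else who hits this, the two properties involved look roughly
like the following in nifi.properties (hostnames below are placeholders,
not our actual servers):

    # site-to-site FQDN of this node
    nifi.remote.input.host=nifi-node1.example.com
    # all three ZooKeeper servers, comma-separated host:port; a typo in any
    # one entry was enough to reproduce the SUSPENDED / "Background retry
    # gave up" symptoms described below
    nifi.zookeeper.connect.string=nifi-node1.example.com:2181,nifi-node2.example.com:2181,nifi-node3.example.com:2181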

Thanks,
Mark

On Wed, Apr 12, 2017 at 10:36 AM, Mark Payne <marka...@hotmail.com> wrote:

> Mark,
>
> I haven't seen this behavior personally, so I can't be sure why exactly
> it would change state to SUSPENDED and not then re-connect. In your
> nifi.properties, do you have the "nifi.zookeeper.connect.string" property
> set up to point to all 3 of the nodes, also? If so, it should be able to
> connect to one of the other two nodes listed.
>
> Thanks
> -Mark
>
> > On Apr 11, 2017, at 2:37 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
> >
> > Ok, will keep the standalone ZooKeeper in mind.
> >
> > Back to the original issue, any idea why ZooKeeper went to a SUSPENDED
> > state, making the cluster unavailable?
> >
> >
> > On Tue, Apr 11, 2017 at 2:10 PM, Mark Payne <marka...@hotmail.com> wrote:
> >
> >> Mark,
> >>
> >> Yes, 2 out of 3 should be sufficient. For testing purposes, a single
> >> ZooKeeper instance is fine, as well. For production, I would not
> >> actually recommend using an embedded ZooKeeper at all and instead use a
> >> standalone ZooKeeper. ZooKeeper tends not to be very happy when running
> >> on a box on which there is already heavy resource load, so if your
> >> cluster starts getting busy, you'll see far more stable performance
> >> from a standalone ZooKeeper.
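> >>
> >> If you do go the standalone route, the NiFi-side change is roughly the
> >> following in nifi.properties (a minimal sketch, with placeholder
> >> hostnames rather than anything from your environment):
> >>
> >>     # stop starting the embedded ZooKeeper on this node
> >>     nifi.state.management.embedded.zookeeper.start=false
> >>     # point the cluster at the external ensemble instead
> >>     nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181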
> >>
> >>
> >>> On Apr 11, 2017, at 2:06 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
> >>>
> >>> All 3 nodes are running embedded ZooKeeper. And, the Admin Guide states
> >>> "ZooKeeper requires a majority of nodes be active in order to function".
> >>> So, I assumed 2/3 being active was ok. Perhaps not.
> >>>
> >>> Related: can a Cluster be set up with only 1 ZooKeeper node? Clearly, in
> >>> production, one would not want to do this. But when testing, this should
> >>> be acceptable, yes?
> >>>
> >>>
> >>>
> >>> On Tue, Apr 11, 2017 at 1:56 PM, Mark Payne <marka...@hotmail.com> wrote:
> >>>
> >>>> Mark,
> >>>>
> >>>> Are all of your nodes running an embedded ZooKeeper, or only 1 or 2 of
> >>>> them?
> >>>>
> >>>> Thanks
> >>>> -Mark
> >>>>
> >>>>> On Apr 11, 2017, at 1:19 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
> >>>>>
> >>>>> I have a 3-node Cluster with each Node hosting the embedded
> >>>>> ZooKeeper. When one Node is shut down (and that Node is not the
> >>>>> Cluster Coordinator), the Cluster becomes unavailable. The UI
> >>>>> indicates "Action cannot be performed because there is currently no
> >>>>> Cluster Coordinator elected. The request should be tried again after
> >>>>> a moment, after a Cluster Coordinator has been automatically
> >>>>> elected."
> >>>>>
> >>>>> The app.log indicates "ConnectionStateManager State change:
> >>>>> SUSPENDED". And, there are an endless number of "CuratorFrameworkImpl
> >>>>> Background retry gave up" messages; the surviving Nodes are not able
> >>>>> to keep the Cluster available.
> >>>>>
> >>>>> I would have thought since 2/3 Nodes are surviving, there wouldn't be
> >>>>> a problem. In addition, since the Node that was shut down was neither
> >>>>> the Cluster Coordinator nor the Primary Node, no Cluster state
> >>>>> changes were required.
> >>>>>
> >>>>> nifi.cluster.flow.election.max.wait.time=2 mins
> >>>>> nifi.cluster.flow.election.max.candidates=
> >>>>>
> >>>>> The same behavior was observed when max.candidates was set to 2.
> >>>>>
> >>>>> NiFi 1.1.2
> >>>>
> >>>>
> >>
> >>
>
>
