Mark,

I haven't seen this behavior personally, so I can't be sure exactly why it would
change state to SUSPENDED and then fail to re-connect. In your nifi.properties,
do you have the "nifi.zookeeper.connect.string" property set up to point to all
3 of the nodes? If so, the node should be able to connect to one of the other
two nodes listed.
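
For example, with placeholder hostnames for the 3 nodes, it would look
something like this (2181 is the default ZooKeeper client port):

    nifi.zookeeper.connect.string=node1:2181,node2:2181,node3:2181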

Thanks
-Mark

> On Apr 11, 2017, at 2:37 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
> 
> Ok, will keep the standalone ZooKeeper in mind.
> 
> Back to the original issue, any idea why ZooKeeper went to a SUSPENDED state,
> making the cluster unavailable?
> 
> 
> On Tue, Apr 11, 2017 at 2:10 PM, Mark Payne <marka...@hotmail.com> wrote:
> 
>> Mark,
>> 
>> Yes, 2 out of 3 should be sufficient. For testing purposes, a single
>> ZooKeeper instance is fine as well. For production, though, I would not
>> recommend using an embedded ZooKeeper at all; instead, use a standalone
>> ZooKeeper. ZooKeeper tends not to be very happy when running on a box that
>> is already under heavy resource load, so if your cluster starts getting
>> busy, you'll see far more stable performance from a standalone ZooKeeper.
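>> 
>> For example (hostnames and paths here are just placeholders), a minimal
>> zoo.cfg for a standalone 3-node ensemble might look something like this:
>> 
>>     tickTime=2000
>>     initLimit=10
>>     syncLimit=5
>>     dataDir=/var/lib/zookeeper
>>     clientPort=2181
>>     server.1=zk1.example.com:2888:3888
>>     server.2=zk2.example.com:2888:3888
>>     server.3=zk3.example.com:2888:3888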
>> 
>> 
>>> On Apr 11, 2017, at 2:06 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
>>> 
>>> All 3 nodes are running embedded ZooKeeper. And, the Admin Guide states
>>> "ZooKeeper requires a majority of nodes be active in order to function".
>>> So, I assumed 2/3 being active was ok. Perhaps not.
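>>> (For reference: a ZooKeeper quorum is a strict majority, i.e. floor(N/2) + 1
>>> servers, which for N = 3 is 2. So 2 of 3 active should satisfy that
>>> requirement on paper.)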
>>> 
>>> Related: can a Cluster be set up with only 1 ZooKeeper node? Clearly, in
>>> production, one would not want to do this. But when testing, this should
>>> be acceptable, yes?
>>> 
>>> 
>>> 
>>> On Tue, Apr 11, 2017 at 1:56 PM, Mark Payne <marka...@hotmail.com> wrote:
>>> 
>>>> Mark,
>>>> 
>>>> Are all of your nodes running an embedded ZooKeeper, or only 1 or 2 of
>>>> them?
>>>> 
>>>> Thanks
>>>> -Mark
>>>> 
>>>>> On Apr 11, 2017, at 1:19 PM, Mark Bean <mark.o.b...@gmail.com> wrote:
>>>>> 
>>>>> I have a 3-node Cluster with each Node hosting the embedded ZooKeeper.
>>>>> When one Node is shut down (and the Node is not the Cluster Coordinator),
>>>>> the Cluster becomes unavailable. The UI indicates "Action cannot be
>>>>> performed because there is currently no Cluster Coordinator elected. The
>>>>> request should be tried again after a moment, after a Cluster Coordinator
>>>>> has been automatically elected."
>>>>> 
>>>>> The app.log indicates "ConnectionStateManager State change: SUSPENDED".
>>>>> And there is an endless stream of "CuratorFrameworkImpl Background retry
>>>>> gave up" messages; the surviving Nodes are not able to keep the Cluster
>>>>> available.
>>>>> 
>>>>> I would have thought that since 2/3 Nodes are surviving, there wouldn't
>>>>> be a problem. In addition, since the Node that was shut down was neither
>>>>> the Cluster Coordinator nor the Primary Node, no Cluster state changes
>>>>> were required.
>>>>> 
>>>>> nifi.cluster.flow.election.max.wait.time=2 mins
>>>>> nifi.cluster.flow.election.max.candidates=
>>>>> 
>>>>> The same behavior was observed when max.candidates was set to 2.
>>>>> 
>>>>> NiFi 1.1.2
>>>> 
>>>> 
>> 
>> 
