[
https://issues.apache.org/jira/browse/KAFKA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron Dagostino resolved KAFKA-14890.
-----------------------------------
Resolution: Duplicate
Duplicate of https://issues.apache.org/jira/browse/KAFKA-14887
> Kafka initiates shutdown due to connectivity problem with Zookeeper and
> FatalExitError from ChangeNotificationProcessorThread
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-14890
> URL: https://issues.apache.org/jira/browse/KAFKA-14890
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 3.3.2
> Reporter: Denis Razuvaev
> Priority: Major
>
> Hello,
> We have hit a deadlock in Kafka several times; a similar issue is
> https://issues.apache.org/jira/browse/KAFKA-13544
> The question: is it expected behavior for Kafka to shut down because of
> connectivity problems with Zookeeper? It seems to be related to the
> inability to read data from the */feature* ZK node and to the
> _ZooKeeperClientExpiredException_ thrown from the _ZooKeeperClient_ class.
> This exception is caught only in the catch block of the _doWork()_ method
> in _ChangeNotificationProcessorThread_, where it leads to
> _FatalExitError_.
> This shutdown problem also reproduces in newer versions of Kafka (which
> already include the deadlock fix from KAFKA-13544).
> It is hard to write a synthetic test that reproduces the problem, but it can
> be reproduced locally in debug mode with the following steps:
> 1) Start Zookeeper and start Kafka in debug mode.
> 2) Emulate a connectivity problem between Kafka and Zookeeper; for example,
> the connection can be closed with the Netcrusher library.
> 3) Put a breakpoint in the _updateLatestOrThrow()_ method of the
> _FeatureCacheUpdater_ class, before the line
> _zkClient.getDataAndVersion(featureZkNodePath)_ executes.
> 4) Restore the connection between Kafka and Zookeeper after the session
> expires. Kafka execution should stop at the breakpoint.
> 5) Resume execution until Kafka starts to execute the line
> _zooKeeperClient.handleRequests(remainingRequests)_ in the
> _retryRequestsUntilConnected_ method of the _KafkaZkClient_ class.
> 6) Emulate the connectivity problem between Kafka and Zookeeper again and
> wait until the session expires.
> 7) Restore the connection between Kafka and Zookeeper.
> 8) Kafka begins its shutdown process because of:
> _ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK
> node change event. The broker will eventually exit.
> (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)_
>
> In a real environment, this problem can be caused by network issues that
> make the broker repeatedly disconnect from and reconnect to Zookeeper
> within a short time period.
> I started a mail thread at
> [https://lists.apache.org/thread/gbk4scwd8g7mg2tfsokzj5tjgrjrb9dw] regarding
> this problem, but have received no answers.
> To me this looks like a defect that should be fixed, because Kafka initiates
> shutdown after the connection between Kafka and Zookeeper has been restored.
> Thank you.
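
The failure path described in the report can be sketched as a small, language-neutral model. This is not Kafka code. It is a minimal Python illustration that assumes, as the report states, that a retry loop keeps retrying while the client is disconnected, that a session-expiry exception escapes the loop, and that the processing thread's only catch block turns any failure into a fatal exit. All class and function names below (FakeZkClient, retry_until_connected, do_work, and the two exception types) are hypothetical stand-ins.

```python
class SessionExpiredError(Exception):
    """Hypothetical stand-in for ZooKeeperClientExpiredException."""


class FatalExit(Exception):
    """Hypothetical stand-in for FatalExitError; a real broker would exit here."""


class FakeZkClient:
    """Hypothetical client whose connection state is scripted in advance."""

    def __init__(self, outcomes):
        # Each entry is "ok", "disconnected", or "expired"; one is consumed per request.
        self._outcomes = list(outcomes)

    def handle_request(self, path):
        outcome = self._outcomes.pop(0)
        if outcome == "expired":
            # Session expired while the request was in flight.
            raise SessionExpiredError(path)
        if outcome == "disconnected":
            # No response yet; the caller is expected to retry.
            return None
        return ("data", 0)


def retry_until_connected(client, path):
    """Retry while disconnected; session expiry is NOT handled here and propagates."""
    while True:
        response = client.handle_request(path)
        if response is not None:
            return response


def do_work(client, path="/feature"):
    """Model of the processing thread: any exception is escalated to a fatal exit."""
    try:
        return retry_until_connected(client, path)
    except Exception as exc:
        raise FatalExit("Failed to process feature ZK node change event") from exc
```

In this model, a plain disconnect followed by a successful read returns normally, while a session expiry in the middle of the retry loop (the situation that steps 5 and 6 provoke) ends in FatalExit even though a later attempt might have succeeded.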
--
This message was sent by Atlassian Jira
(v8.20.10#820010)