[
https://issues.apache.org/jira/browse/ZOOKEEPER-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ConfX updated ZOOKEEPER-5012:
-----------------------------
Description:
The `QuorumPeer.shutdown()` method throws a `NullPointerException` when called
on a QuorumPeer instance that hasn't fully initialized its `zkDb` field. This
can happen during restart scenarios when a server is being shutdown before it
finishes initializing.
{*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
{code:java}
public void shutdown() {
running = false;
x509Util.close();
if (leader != null) { // Line 1625 - NULL CHECK EXISTS
leader.shutdown("quorum Peer shutdown");
}
if (follower != null) { // Line 1628 - NULL CHECK EXISTS
follower.shutdown();
}
shutdownServerCnxnFactory();
if (udpSocket != null) { // Line 1632 - NULL CHECK EXISTS
udpSocket.close();
}
if (jvmPauseMonitor != null) { // Line 1635 - NULL CHECK EXISTS
jvmPauseMonitor.serviceStop();
} try {
adminServer.shutdown();
} catch (AdminServerException e) {
LOG.warn("Problem stopping AdminServer", e);
} if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
this.interrupt();
getElectionAlg().shutdown();
}
try {
zkDb.close(); // Line 1650 - NO NULL CHECK! BUG HERE
} catch (IOException ie) {
LOG.warn("Error closing logs ", ie);
}
} {code}
h2. *Why zkDb Can Be Null*
1. Default Constructor Does NOT Initialize zkDb:
{code:java}
// QuorumPeer.java line 1069
public QuorumPeer() throws SaslException {
super("QuorumPeer");
quorumStats = new QuorumStats(this);
jmxRemotePeerBean = new HashMap<>();
adminServer = AdminServerFactory.createAdminServer();
x509Util = createX509Util();
initialize();
reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
// NOTE: zkDb is NOT initialized here!
} {code}
2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
{code:java}
// QuorumPeerMain.java lines 177-193
quorumPeer = getQuorumPeer(); // Creates QuorumPeer with default constructor
quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(),
config.getDataDir()));
// ... many other setter calls ...
quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); //
Line 193 - zkDb is set here {code}
h2. The Race Condition Scenario
1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the QuorumPeer
(*{*}zkDb is null at this point{*}*)
4. *{*}BEFORE{*}* line 193 `setZKDatabase()` is reached, if the node restart
triggers a shutdown...
5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which calls
`quorumPeer.shutdown()`
6. Line 1650 `zkDb.close()` throws NullPointerException
h2. Stacktrace
{code:java}
Caused by: java.lang.NullPointerException
at
org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
at
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
at
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
at
org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
... {code}
h2. Proposed Fix
Add a null check for `zkDb` before calling `close()`, similar to how other
fields are handled in the same method:
{code:java}
// Current buggy code (line 1649-1653):
try {
zkDb.close();
} catch (IOException ie) {
LOG.warn("Error closing logs ", ie);
}
// Fixed code:
if (zkDb != null) {
try {
zkDb.close();
} catch (IOException ie) {
LOG.warn("Error closing logs ", ie);
}
} {code}
I'm happy to send a PR for this issue.
was:
The `QuorumPeer.shutdown()` method throws a `NullPointerException` when called
on a QuorumPeer instance that hasn't fully initialized its `zkDb` field. This
can happen during restart scenarios when a server is being shutdown before it
finishes initializing.
{*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
{code:java}
public void shutdown() {
running = false;
x509Util.close();
if (leader != null) { // Line 1625 - NULL CHECK EXISTS
leader.shutdown("quorum Peer shutdown");
}
if (follower != null) { // Line 1628 - NULL CHECK EXISTS
follower.shutdown();
}
shutdownServerCnxnFactory();
if (udpSocket != null) { // Line 1632 - NULL CHECK EXISTS
udpSocket.close();
}
if (jvmPauseMonitor != null) { // Line 1635 - NULL CHECK EXISTS
jvmPauseMonitor.serviceStop();
} try {
adminServer.shutdown();
} catch (AdminServerException e) {
LOG.warn("Problem stopping AdminServer", e);
} if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
this.interrupt();
getElectionAlg().shutdown();
}
try {
zkDb.close(); // Line 1650 - NO NULL CHECK! BUG HERE
} catch (IOException ie) {
LOG.warn("Error closing logs ", ie);
}
} {code}
h2. *Why zkDb Can Be Null*
1. Default Constructor Does NOT Initialize zkDb:
{code:java}
// QuorumPeer.java line 1069
public QuorumPeer() throws SaslException {
super("QuorumPeer");
quorumStats = new QuorumStats(this);
jmxRemotePeerBean = new HashMap<>();
adminServer = AdminServerFactory.createAdminServer();
x509Util = createX509Util();
initialize();
reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
// NOTE: zkDb is NOT initialized here!
} {code}
2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
{code:java}
// QuorumPeerMain.java lines 177-193
quorumPeer = getQuorumPeer(); // Creates QuorumPeer with default constructor
quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(),
config.getDataDir()));
// ... many other setter calls ...
quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); //
Line 193 - zkDb is set here {code}
h2. The Race Condition Scenario
1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the QuorumPeer
(**zkDb is null at this point**)
4. **BEFORE** line 193 `setZKDatabase()` is reached, if the restart framework
triggers a shutdown...
5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which calls
`quorumPeer.shutdown()`
6. Line 1650 `zkDb.close()` throws NullPointerException
h2. Stacktrace
{code:java}
Caused by: java.lang.NullPointerException
at
org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
at
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
at
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
at
org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
... {code}
h2. Proposed Fix
Add a null check for `zkDb` before calling `close()`, similar to how other
fields are handled in the same method:
{code:java}
// Current buggy code (line 1649-1653):
try {
zkDb.close();
} catch (IOException ie) {
LOG.warn("Error closing logs ", ie);
}
// Fixed code:
if (zkDb != null) {
try {
zkDb.close();
} catch (IOException ie) {
LOG.warn("Error closing logs ", ie);
}
} {code}
I'm happy to send a PR for this issue.
> NPE in QuorumPeer shutdown under restart
> ----------------------------------------
>
> Key: ZOOKEEPER-5012
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5012
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.9.4
> Reporter: ConfX
> Priority: Critical
>
> The `QuorumPeer.shutdown()` method throws a `NullPointerException` when
> called on a QuorumPeer instance that hasn't fully initialized its `zkDb`
> field. This can happen during restart scenarios when a server is being
> shutdown before it finishes initializing.
>
> {*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
> {code:java}
> public void shutdown() {
> running = false;
> x509Util.close();
> if (leader != null) { // Line 1625 - NULL CHECK EXISTS
> leader.shutdown("quorum Peer shutdown");
> }
> if (follower != null) { // Line 1628 - NULL CHECK EXISTS
> follower.shutdown();
> }
> shutdownServerCnxnFactory();
> if (udpSocket != null) { // Line 1632 - NULL CHECK EXISTS
> udpSocket.close();
> }
> if (jvmPauseMonitor != null) { // Line 1635 - NULL CHECK EXISTS
> jvmPauseMonitor.serviceStop();
> } try {
> adminServer.shutdown();
> } catch (AdminServerException e) {
> LOG.warn("Problem stopping AdminServer", e);
> } if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
> this.interrupt();
> getElectionAlg().shutdown();
> }
> try {
> zkDb.close(); // Line 1650 - NO NULL CHECK! BUG HERE
> } catch (IOException ie) {
> LOG.warn("Error closing logs ", ie);
> }
> } {code}
> h2. *Why zkDb Can Be Null*
> 1. Default Constructor Does NOT Initialize zkDb:
> {code:java}
> // QuorumPeer.java line 1069
> public QuorumPeer() throws SaslException {
> super("QuorumPeer");
> quorumStats = new QuorumStats(this);
> jmxRemotePeerBean = new HashMap<>();
> adminServer = AdminServerFactory.createAdminServer();
> x509Util = createX509Util();
> initialize();
> reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
> // NOTE: zkDb is NOT initialized here!
> } {code}
> 2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
> {code:java}
> // QuorumPeerMain.java lines 177-193
> quorumPeer = getQuorumPeer(); // Creates QuorumPeer with default
> constructor
> quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(),
> config.getDataDir()));
> // ... many other setter calls ...
> quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); //
> Line 193 - zkDb is set here {code}
> h2. The Race Condition Scenario
> 1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
> 2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
> 3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the
> QuorumPeer (*{*}zkDb is null at this point{*}*)
> 4. *{*}BEFORE{*}* line 193 `setZKDatabase()` is reached, if the node restart
> triggers a shutdown...
> 5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which
> calls `quorumPeer.shutdown()`
> 6. Line 1650 `zkDb.close()` throws NullPointerException
> h2. Stacktrace
> {code:java}
> Caused by: java.lang.NullPointerException
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
> at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
> at
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
> at
> org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
> ... {code}
> h2. Proposed Fix
> Add a null check for `zkDb` before calling `close()`, similar to how other
> fields are handled in the same method:
> {code:java}
> // Current buggy code (line 1649-1653):
> try {
> zkDb.close();
> } catch (IOException ie) {
> LOG.warn("Error closing logs ", ie);
> }
> // Fixed code:
> if (zkDb != null) {
> try {
> zkDb.close();
> } catch (IOException ie) {
> LOG.warn("Error closing logs ", ie);
> }
> } {code}
> I'm happy to send a PR for this issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)