[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ConfX updated ZOOKEEPER-5012:
-----------------------------
    Description: 
The `QuorumPeer.shutdown()` method throws a `NullPointerException` when called 
on a QuorumPeer instance that hasn't fully initialized its `zkDb` field. This 
can happen during restart scenarios when a server is being shutdown before it 
finishes initializing.
 
{*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
{code:java}
public void shutdown() {
    running = false;
    x509Util.close();
    if (leader != null) {           // Line 1625 - NULL CHECK EXISTS
        leader.shutdown("quorum Peer shutdown");
    }
    if (follower != null) {         // Line 1628 - NULL CHECK EXISTS
        follower.shutdown();
    }
    shutdownServerCnxnFactory();
    if (udpSocket != null) {        // Line 1632 - NULL CHECK EXISTS
        udpSocket.close();
    }
    if (jvmPauseMonitor != null) {  // Line 1635 - NULL CHECK EXISTS
        jvmPauseMonitor.serviceStop();
    }    try {
        adminServer.shutdown();
    } catch (AdminServerException e) {
        LOG.warn("Problem stopping AdminServer", e);
    }    if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
        this.interrupt();
        getElectionAlg().shutdown();
    }
    try {
        zkDb.close();               // Line 1650 - NO NULL CHECK! BUG HERE
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
} {code}
h2. *Why zkDb Can Be Null*

1. Default Constructor Does NOT Initialize zkDb:
{code:java}
   // QuorumPeer.java line 1069
   public QuorumPeer() throws SaslException {
       super("QuorumPeer");
       quorumStats = new QuorumStats(this);
       jmxRemotePeerBean = new HashMap<>();
       adminServer = AdminServerFactory.createAdminServer();
       x509Util = createX509Util();
       initialize();
       reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
       // NOTE: zkDb is NOT initialized here!
   } {code}
2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
{code:java}
   // QuorumPeerMain.java lines 177-193
   quorumPeer = getQuorumPeer();  // Creates QuorumPeer with default constructor
   quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(), 
config.getDataDir()));
   // ... many other setter calls ...
   quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); // 
Line 193 - zkDb is set here {code}
h2. The Race Condition Scenario

1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the QuorumPeer 
(zkDb is null at this point)
4. BEFORE line 193 `setZKDatabase()` is reached, if the node restart triggers a 
shutdown...
5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which calls 
`quorumPeer.shutdown()`
6. Line 1650 `zkDb.close()` throws NullPointerException
h2. Stacktrace
{code:java}
Caused by: java.lang.NullPointerException
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
    at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
    at 
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
    at 
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
    at 
org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
    ... {code}
h2. Proposed Fix

Add a null check for `zkDb` before calling `close()`, similar to how other 
fields are handled in the same method:
{code:java}
// Current buggy code (line 1649-1653):
try {
    zkDb.close();
} catch (IOException ie) {
    LOG.warn("Error closing logs ", ie);
}

// Fixed code:
if (zkDb != null) {
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
} {code}
I'm happy to send a PR for this issue.

  was:
The `QuorumPeer.shutdown()` method throws a `NullPointerException` when called 
on a QuorumPeer instance that hasn't fully initialized its `zkDb` field. This 
can happen during restart scenarios when a server is being shutdown before it 
finishes initializing.
 
{*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
{code:java}
public void shutdown() {
    running = false;
    x509Util.close();
    if (leader != null) {           // Line 1625 - NULL CHECK EXISTS
        leader.shutdown("quorum Peer shutdown");
    }
    if (follower != null) {         // Line 1628 - NULL CHECK EXISTS
        follower.shutdown();
    }
    shutdownServerCnxnFactory();
    if (udpSocket != null) {        // Line 1632 - NULL CHECK EXISTS
        udpSocket.close();
    }
    if (jvmPauseMonitor != null) {  // Line 1635 - NULL CHECK EXISTS
        jvmPauseMonitor.serviceStop();
    }    try {
        adminServer.shutdown();
    } catch (AdminServerException e) {
        LOG.warn("Problem stopping AdminServer", e);
    }    if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
        this.interrupt();
        getElectionAlg().shutdown();
    }
    try {
        zkDb.close();               // Line 1650 - NO NULL CHECK! BUG HERE
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
} {code}
h2. *Why zkDb Can Be Null*

1. Default Constructor Does NOT Initialize zkDb:
{code:java}
   // QuorumPeer.java line 1069
   public QuorumPeer() throws SaslException {
       super("QuorumPeer");
       quorumStats = new QuorumStats(this);
       jmxRemotePeerBean = new HashMap<>();
       adminServer = AdminServerFactory.createAdminServer();
       x509Util = createX509Util();
       initialize();
       reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
       // NOTE: zkDb is NOT initialized here!
   } {code}
2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
{code:java}
   // QuorumPeerMain.java lines 177-193
   quorumPeer = getQuorumPeer();  // Creates QuorumPeer with default constructor
   quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(), 
config.getDataDir()));
   // ... many other setter calls ...
   quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); // 
Line 193 - zkDb is set here {code}
h2. The Race Condition Scenario

1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the QuorumPeer 
(*{*}zkDb is null at this point{*}*)
4. *{*}BEFORE{*}* line 193 `setZKDatabase()` is reached, if the node restart 
triggers a shutdown...
5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which calls 
`quorumPeer.shutdown()`
6. Line 1650 `zkDb.close()` throws NullPointerException
h2. Stacktrace
{code:java}
Caused by: java.lang.NullPointerException
    at 
org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
    at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
    at 
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
    at 
org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
    at 
org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
    ... {code}
h2. Proposed Fix

Add a null check for `zkDb` before calling `close()`, similar to how other 
fields are handled in the same method:
{code:java}
// Current buggy code (line 1649-1653):
try {
    zkDb.close();
} catch (IOException ie) {
    LOG.warn("Error closing logs ", ie);
}

// Fixed code:
if (zkDb != null) {
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
} {code}
I'm happy to send a PR for this issue.


> NPE in QuorumPeer shutdown under restart
> ----------------------------------------
>
>                 Key: ZOOKEEPER-5012
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-5012
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.9.4
>            Reporter: ConfX
>            Priority: Critical
>
> The `QuorumPeer.shutdown()` method throws a `NullPointerException` when 
> called on a QuorumPeer instance that hasn't fully initialized its `zkDb` 
> field. This can happen during restart scenarios when a server is being 
> shutdown before it finishes initializing.
>  
> {*}File:{*}`zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java`
> {code:java}
> public void shutdown() {
>     running = false;
>     x509Util.close();
>     if (leader != null) {           // Line 1625 - NULL CHECK EXISTS
>         leader.shutdown("quorum Peer shutdown");
>     }
>     if (follower != null) {         // Line 1628 - NULL CHECK EXISTS
>         follower.shutdown();
>     }
>     shutdownServerCnxnFactory();
>     if (udpSocket != null) {        // Line 1632 - NULL CHECK EXISTS
>         udpSocket.close();
>     }
>     if (jvmPauseMonitor != null) {  // Line 1635 - NULL CHECK EXISTS
>         jvmPauseMonitor.serviceStop();
>     }    try {
>         adminServer.shutdown();
>     } catch (AdminServerException e) {
>         LOG.warn("Problem stopping AdminServer", e);
>     }    if (getElectionAlg() != null) { // Line 1645 - NULL CHECK EXISTS
>         this.interrupt();
>         getElectionAlg().shutdown();
>     }
>     try {
>         zkDb.close();               // Line 1650 - NO NULL CHECK! BUG HERE
>     } catch (IOException ie) {
>         LOG.warn("Error closing logs ", ie);
>     }
> } {code}
> h2. *Why zkDb Can Be Null*
> 1. Default Constructor Does NOT Initialize zkDb:
> {code:java}
>    // QuorumPeer.java line 1069
>    public QuorumPeer() throws SaslException {
>        super("QuorumPeer");
>        quorumStats = new QuorumStats(this);
>        jmxRemotePeerBean = new HashMap<>();
>        adminServer = AdminServerFactory.createAdminServer();
>        x509Util = createX509Util();
>        initialize();
>        reconfigEnabled = QuorumPeerConfig.isReconfigEnabled();
>        // NOTE: zkDb is NOT initialized here!
>    } {code}
> 2. zkDb is Only Set Later in QuorumPeerMain.runFromConfig():
> {code:java}
>    // QuorumPeerMain.java lines 177-193
>    quorumPeer = getQuorumPeer();  // Creates QuorumPeer with default 
> constructor
>    quorumPeer.setTxnFactory(new FileTxnSnapLog(config.getDataLogDir(), 
> config.getDataDir()));
>    // ... many other setter calls ...
>    quorumPeer.setZKDatabase(new ZKDatabase(quorumPeer.getTxnFactory())); // 
> Line 193 - zkDb is set here {code}
> h2. The Race Condition Scenario
> 1. `MainThread.start()` creates a new `TestQPMain` and starts a new thread
> 2. The new thread calls `main.initializeAndRun(args)` -> `runFromConfig()`
> 3. In `runFromConfig()`, `quorumPeer = getQuorumPeer()` creates the 
> QuorumPeer (zkDb is null at this point)
> 4. BEFORE line 193 `setZKDatabase()` is reached, if the node restart triggers 
> a shutdown...
> 5. `TestQPMain.shutdown()` calls `QuorumBase.shutdown(quorumPeer)` which 
> calls `quorumPeer.shutdown()`
> 6. Line 1650 `zkDb.close()` throws NullPointerException
> h2. Stacktrace
> {code:java}
> Caused by: java.lang.NullPointerException
>     at 
> org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:1650)
>     at org.apache.zookeeper.test.QuorumBase.shutdown(QuorumBase.java:509)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$TestQPMain.shutdown(QuorumPeerTestBase.java:83)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPeerTestBase$MainThread.shutdown(QuorumPeerTestBase.java:353)
>     at 
> org.restarttest.adapter.zookeeper.MainThreadAdapter.restartMainThread(MainThreadAdapter.java:129)
>     ... {code}
> h2. Proposed Fix
> Add a null check for `zkDb` before calling `close()`, similar to how other 
> fields are handled in the same method:
> {code:java}
> // Current buggy code (line 1649-1653):
> try {
>     zkDb.close();
> } catch (IOException ie) {
>     LOG.warn("Error closing logs ", ie);
> }
> // Fixed code:
> if (zkDb != null) {
>     try {
>         zkDb.close();
>     } catch (IOException ie) {
>         LOG.warn("Error closing logs ", ie);
>     }
> } {code}
> I'm happy to send a PR for this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to