Juha created FLINK-28947:
----------------------------

             Summary: Curator framework fails with NullPointerException
                 Key: FLINK-28947
                 URL: https://issues.apache.org/jira/browse/FLINK-28947
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.15.1
            Reporter: Juha


I'm getting the following error in the JobManager, and as a result the JobManager exits.
{code:java}
Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,491] ERROR Background exception was not retry-able or retry gave up (org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl:733)
Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,493] ERROR Unhandled error in curator framework, error message: Background exception was not retry-able or retry gave up (org.apache.flink.runtime.util.ZooKeeperUtils:292)
Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]: [2022-08-12 06:37:30,494] ERROR Fatal error occurred while executing the TaskManager. Shutting it down... (org.apache.flink.runtime.taskexecutor.TaskManagerRunner:427)
Aug 12 06:37:30 server_name java[173]: java.lang.NullPointerException: null
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress(Compatibility.java:116) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString(EnsembleTracker.java:185) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData(EnsembleTracker.java:206) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300(EnsembleTracker.java:50) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult(EnsembleTracker.java:150) ~[flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:926) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:683) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:598) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
Aug 12 06:37:30 server_name java[173]:         at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) [flink-shaded-zookeeper-3.5.9.jar:3.5.9-15.0]
{code}
Steps to reproduce:
 * Create three servers.
 * Run a Flink JobManager and TaskManager on each of them (call these A, B and C), using ZooKeeper HA services.
 * Verify that everything works as expected.
 * Add a new server (D).
 * Shut down server C.
 * The error appears on both servers A and D. I didn't check B and C.

This appears to be reproducible on every run.

I'm using Flink 1.15.1; I'm in the middle of migrating from 1.13.x to 1.15.x. I'm not completely sure whether this ever happens on 1.13.x, but it seems to _always_ happen on 1.15.1.

I used a debugger to look at what's going on in the JobManager:
{code:java}
main-EventThread[1] where
  [1] org.apache.flink.shaded.curator5.org.apache.curator.utils.Compatibility.getHostAddress (Compatibility.java:116)
  [2] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.configToConnectionString (EnsembleTracker.java:185)
  [3] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.processConfigData (EnsembleTracker.java:206)
  [4] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker.access$300 (EnsembleTracker.java:50)
  [5] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.EnsembleTracker$2.processResult (EnsembleTracker.java:150)
  [6] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback (CuratorFrameworkImpl.java:926)
  [7] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation (CuratorFrameworkImpl.java:683)
  [8] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation (WatcherRemovalFacade.java:152)
  [9] org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult (GetConfigBuilderImpl.java:222)
  [10] org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.processEvent (ClientCnxn.java:598)
  [11] org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$EventThread.run (ClientCnxn.java:510)
main-EventThread[1] dump address
 address = {
    holder: instance of java.net.InetSocketAddress$InetSocketAddressHolder(id=8302)
    serialVersionUID: 5076001401234631237
    serialPersistentFields: instance of java.io.ObjectStreamField[3] (id=8303)
    UNSAFE: instance of jdk.internal.misc.Unsafe(id=8304)
    FIELDS_OFFSET: 12
    java.net.SocketAddress.serialVersionUID: 5215720748342549866
}
main-EventThread[1] dump address.holder
 address.holder = {
    hostname: "host_name_here"
    addr: null
    port: 2888
}
main-EventThread[1] print address.getAddress()
 address.getAddress() = null
{code}
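The dump shows an {{InetSocketAddress}} whose holder has a hostname and a port but {{addr: null}}, which is what the JDK calls an unresolved address. As a minimal sketch using only the JDK, the same state can be reproduced. It assumes the address was built without a successful DNS lookup; {{InetSocketAddress.createUnresolved}} is used here only to recreate that state, not because Curator necessarily calls it, and the class name is arbitrary.

{code:java}
import java.net.InetSocketAddress;

public class UnresolvedAddressDemo {
    public static void main(String[] args) {
        // Build an address without any DNS lookup, mirroring the debugger dump:
        // hostname "host_name_here", addr null, port 2888.
        InetSocketAddress address = InetSocketAddress.createUnresolved("host_name_here", 2888);

        // The hostname and port are kept, but no InetAddress is attached.
        System.out.println("isUnresolved = " + address.isUnresolved());
        System.out.println("hostString   = " + address.getHostString());
        System.out.println("port         = " + address.getPort());
        // getAddress() returns null for an unresolved address.
        System.out.println("getAddress() = " + address.getAddress());
    }
}
{code}

Any caller that chains {{getAddress().getHostAddress()}} on such an object will throw a {{NullPointerException}}.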

(The hostname has been changed.)

Line 116 of Compatibility.java 
(https://github.com/apache/curator/blob/d65669b64f003326c98843b32b997e3ffab1e442/curator-client/src/main/java/org/apache/curator/utils/Compatibility.java#L116)
contains:

{code:java}
        return (address != null) ? address.getAddress().getHostAddress() : "unknown";
{code}

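Here {{address}} itself is not {{null}}, but {{address.getAddress()}} returns {{null}}: the holder has a hostname but {{addr: null}}, meaning the hostname was never resolved to an IP address. Calling {{getHostAddress()}} on that {{null}} throws the {{NullPointerException}} that crashes the process.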
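For illustration only, a null-safe variant of the expression on line 116 might look like the following. The class and method names are hypothetical and this is not the actual Curator fix; falling back to {{getHostString()}}, which returns the hostname without triggering a DNS lookup, is one possible design choice among several.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;

public class SafeHostAddress {
    // Original expression from Compatibility.java line 116, for comparison:
    //   (address != null) ? address.getAddress().getHostAddress() : "unknown"
    // It throws NullPointerException when the address is unresolved.

    // Hypothetical null-safe variant: only dereference getAddress() when it is non-null.
    static String safeHostAddress(InetSocketAddress address) {
        if (address == null) {
            return "unknown";
        }
        InetAddress resolved = address.getAddress();
        // Unresolved addresses carry no InetAddress; fall back to the raw hostname.
        return (resolved != null) ? resolved.getHostAddress() : address.getHostString();
    }

    public static void main(String[] args) {
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved("host_name_here", 2888);
        // An IP literal is parsed directly, so no DNS lookup is needed.
        InetSocketAddress literal = new InetSocketAddress("127.0.0.1", 2888);

        System.out.println(safeHostAddress(null));
        System.out.println(safeHostAddress(unresolved));
        System.out.println(safeHostAddress(literal));

        try {
            // The original chained call fails on the unresolved address.
            unresolved.getAddress().getHostAddress();
        } catch (NullPointerException e) {
            System.out.println("original expression: NullPointerException");
        }
    }
}
{code}

Returning the hostname keeps the caller running without a DNS lookup; returning {{"unknown"}} or throwing a descriptive exception would be equally valid choices, depending on what the caller does with the result.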



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
