[jira] [Updated] (HDDS-3902) OM HA client failover switcher to a wrong OM server

2020-06-30 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3902:
--
Labels: 0.7.0  (was: )

> OM HA client failover switcher to a wrong OM server
> ---
>
> Key: HDDS-3902
> URL: https://issues.apache.org/jira/browse/HDDS-3902
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: OM HA
> Reporter: Marton Elek
> Priority: Major
> Labels: 0.7.0
>
> Found this problem with the PR/branch HDDS-3878, but it seems to be 
> independent.
> 1. ozone sh volume create /vol1 works well with HA
> 2. ozone freon omkg (rpc client) doesn't work
> {code}
> ozone freon omkg | grep "Failing over"
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
> to index: 1, nodeId: om2
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
> to index: 2, nodeId: om3
> 2020-06-30 14:15:34 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
> to index: 0, nodeId: omNodeIdDummy
> {code}
> om2 seems to be the leader, but for some reason the failover logic switches 
> back to an unknown node (?)
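A minimal sketch of how the suspicious failover target could be picked out of such a log automatically. It only parses the DEBUG lines quoted above; the function names are hypothetical, and the expected node IDs (`om1`, `om2`, `om3`) are an assumption, since only `om2` and `om3` appear in the log.

```python
import re

# Matches the DEBUG message quoted above once the archive's hard line
# wrap has been undone.
FAILOVER = re.compile(r"Failing over OM proxy to index: (\d+), nodeId: (\S+)")


def failover_targets(log_text):
    """Return (index, node_id) pairs in the order they were logged."""
    # Drop the "> " quote prefix and re-join the wrapped lines.
    lines = [re.sub(r"^>\s?", "", line).strip() for line in log_text.splitlines()]
    joined = " ".join(lines)
    return [(int(index), node) for index, node in FAILOVER.findall(joined)]


def unknown_targets(targets, expected_nodes):
    """Keep only the failover targets whose node ID is not expected."""
    return [(index, node) for index, node in targets if node not in expected_nodes]


SAMPLE = """\
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 1, nodeId: om2
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 2, nodeId: om3
> 2020-06-30 14:15:34 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 0, nodeId: omNodeIdDummy
"""

# Assumption: the cluster is configured with these three node IDs.
EXPECTED = {"om1", "om2", "om3"}

if __name__ == "__main__":
    for index, node in unknown_targets(failover_targets(SAMPLE), EXPECTED):
        print(f"unexpected failover target: index={index}, nodeId={node}")
```

Run against the excerpt above, this flags only the `omNodeIdDummy` entry at index 0, which is the switch the report questions.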
> {code}
> 2020-06-30 14:16:35 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy 
> to index: 2, nodeId: om3
> 2020-06-30 14:16:35 DEBUG Client:63 - getting client out of cache: 
> org.apache.hadoop.ipc.Client@f5acb9d
> 2020-06-30 14:16:35 DEBUG Client:497 - The ping interval is 6 ms.
> 2020-06-30 14:16:35 DEBUG Client:795 - Connecting to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862
> 2020-06-30 14:16:35 DEBUG Client:1074 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root: 
> starting, having connections 3
> 2020-06-30 14:16:35 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
> sending #0 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
> value #0
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 
> 439ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
> sending #1 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
> value #1
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 2ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
> sending #2 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
> value #2
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root 
> sending #3 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to 
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got 
> value #3
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
> 2020-06-30 14:16:36 DEBUG Client:63 - getting client out of cache: 
> org.apache.hadoop.ipc.Client@f5acb9d
> 2020-06-30 14:16:36 DEBUG Groups:312 - GroupCacheLoader - load.
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
> sending #5 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
> sending #11 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
> sending #8 
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to 
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root 
> sending #12 

[jira] [Updated] (HDDS-3902) OM HA client failover switcher to a wrong OM server

2020-06-30 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3902:
--
Issue Type: Bug  (was: Improvement)

> OM HA client failover switcher to a wrong OM server
> ---
>
> Key: HDDS-3902
> URL: https://issues.apache.org/jira/browse/HDDS-3902
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: OM HA
> Reporter: Marton Elek
> Priority: Blocker
> Labels: 0.7.0
>

[jira] [Updated] (HDDS-3902) OM HA client failover switcher to a wrong OM server

2020-06-30 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3902:
--
Priority: Blocker  (was: Major)

> OM HA client failover switcher to a wrong OM server
> ---
>
> Key: HDDS-3902
> URL: https://issues.apache.org/jira/browse/HDDS-3902
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: OM HA
> Reporter: Marton Elek
> Priority: Blocker
> Labels: 0.7.0
>

[jira] [Updated] (HDDS-3902) OM HA client failover switcher to a wrong OM server

2020-06-30 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3902:
--
Target Version/s:   (was: 0.6.0)

> OM HA client failover switcher to a wrong OM server
> ---
>
> Key: HDDS-3902
> URL: https://issues.apache.org/jira/browse/HDDS-3902
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: OM HA
> Reporter: Marton Elek
> Priority: Major
>

[jira] [Updated] (HDDS-3902) OM HA client failover switcher to a wrong OM server

2020-06-30 Thread Marton Elek (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Elek updated HDDS-3902:
--
Priority: Major  (was: Blocker)

> OM HA client failover switcher to a wrong OM server
> ---
>
> Key: HDDS-3902
> URL: https://issues.apache.org/jira/browse/HDDS-3902
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: OM HA
> Reporter: Marton Elek
> Priority: Major
>