Hari Sekhon created HIVE-10570:
----------------------------------

             Summary: HiveServer2 shut downs due to temporary ZooKeeper 
unavailability, causes permanent outage instead of temporary
                 Key: HIVE-10570
                 URL: https://issues.apache.org/jira/browse/HIVE-10570
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 0.14.0
         Environment: HDP 2.2
            Reporter: Hari Sekhon
            Priority: Critical


HiveServer2 should not shut down when there is temporary ZooKeeper 
unavailability (eg. temporary network outage). This prevents retry and recovery 
later as HiveServer2 is no longer running and therefore cannot retry - 
HiveServer2 stays offline indefinitely until operator intervention to restart 
it, even for minor temporary problems.

I believe this behaviour is due to recent ZooKeeper dependency addition for 
HiveServer2 HA.
{code}2015-05-01 11:35:05,367 WARN  zookeeper.ClientCnxn 
(ClientCnxn.java:run(1102)) - Session 0x14d004cb02c001e for server null, 
unexpected error, closing socket
connection and attempting reconnect
java.net.SocketException: Network is unreachable
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:465)
        at sun.nio.ch.Net.connect(Net.java:457)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:277)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.connect(ClientCnxnSocketNIO.java:287)
        at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:967)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1003)
2015-05-01 11:35:05,629 INFO  client.ZooKeeperSaslClient 
(ZooKeeperSaslClient.java:run(285)) - Client will use GSSAPI as SASL mechanism.
2015-05-01 11:35:05,630 INFO  zookeeper.ClientCnxn 
(ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server 
<custom_scrubbed>/<ip>:2181. Will attempt to SASL-authenticate using Login 
Context section 'HiveZooKeeperClient'
2015-05-01 11:35:05,630 ERROR zookeeper.ClientCnxnSocketNIO 
(ClientCnxnSocketNIO.java:connect(289)) - Unable to open socket to 
<custom_scrubbed>/<ip>:2181
2015-05-01 11:35:05,630 ERROR zookeeper.ClientCnxnSocketNIO 
(ClientCnxnSocketNIO.java:connect(289)) - Unable to open socket to 
<custom_scrubbed>/<ip>:2181
2015-05-01 11:35:05,630 WARN  zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) 
- Session 0x14d004cb02c001e for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.SocketException: Network is unreachable
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:465)
        at sun.nio.ch.Net.connect(Net.java:457)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:277)
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.connect(ClientCnxnSocketNIO.java:287)
        at 
org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:967)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1003)
2015-05-01 11:35:05,943 INFO  server.HiveServer2 (HiveServer2.java:stop(299)) - 
Shutting down HiveServer2
2015-05-01 11:35:05,944 INFO  thrift.ThriftCLIService 
(ThriftCLIService.java:stop(137)) - Thrift server has stopped
2015-05-01 11:35:05,944 INFO  service.AbstractService 
(AbstractService.java:stop(125)) - Service:ThriftBinaryCLIService is stopped.
2015-05-01 11:35:05,944 INFO  service.AbstractService 
(AbstractService.java:stop(125)) - Service:OperationManager is stopped.
2015-05-01 11:35:05,944 INFO  service.AbstractService 
(AbstractService.java:stop(125)) - Service:SessionManager is stopped.
2015-05-01 11:35:05,946 INFO  server.HiveServer2 
(HiveStringUtils.java:run(679)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HiveServer2 at <fqdn>/<ip>
************************************************************/{code}
Hari Sekhon
http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to