[jira] [Updated] (HIVE-14631) HiveServer2 regularly fails to connect to metastore
[ https://issues.apache.org/jira/browse/HIVE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Linte updated HIVE-14631:
---
Description:
I have a cluster secured with Kerberos, and Hive is configured to work with Tez by default. Everything works well through hive-cli and beeline; however, I'm seeing strange behavior through Hue. There can be many client connections (up to 600), and after a day, client connections start to fail. However, not every client connection attempt fails.

When it fails, I have the following logs on the HiveServer2:
{noformat}
Aug 3 09:28:04 hiveserver2.bigdata.fr Executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112): INSERT INTO TABLE shfs3453.camille_test VALUES ('coucou')
Aug 3 09:28:04 hiveserver2.bigdata.fr Query ID = hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112
Aug 3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Launching Job 1 out of 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Starting task [Stage-1:MAPRED] in parallel
Aug 3 09:28:04 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:04 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:05 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:05 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:05 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:06 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:06 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:06 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:08 hiveserver2.bigdata.fr FAILED: Execution Error, return code -1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Aug 3 09:28:08 hiveserver2.bigdata.fr Completed executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112); Time taken: 4.002 seconds
{noformat}

At the same time, I have the following logs on the Metastore:
{noformat}
Aug 3 09:28:03 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:03 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:04 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:04 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:04 metastore01.bigdata.fr Error occurred during processing of message.
Aug 3 09:28:05 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:05 metastore01.bigdata.fr Error occurred during processing of message.
Aug 3 09:28:06 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:06 metastore01.bigdata.fr Error occurred during processing of message.
{noformat}

Note: I also created a JIRA for Hue: https://issues.cloudera.org/browse/HUE-4748

was:
I have a cluster secured with Kerberos, and Hive is configured to work with Tez by default. Everything works well through hive-cli and beeline; however, I'm seeing strange behavior through Hue. There can be many client connections (up to 600), and after a day, client connections start to fail. However, not every client connection attempt fails.

When it fails, I have the following logs on the HiveServer2:
{noformat}
Aug 3 09:28:04 hiveserver2.bigdata.fr Executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112): INSERT INTO TABLE shfs3453.camille_test VALUES ('coucou')
Aug 3 09:28:04 hiveserver2.bigdata.fr Query ID = hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112
Aug 3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Launching Job 1 out of 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Starting task [Stage-1:MAPRED] in parallel
Aug 3 09:28:04 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:04 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:05 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:05
[jira] [Updated] (HIVE-14631) HiveServer2 regularly fails to connect to metastore
[ https://issues.apache.org/jira/browse/HIVE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Linte updated HIVE-14631:
---
Description:
I have a cluster secured with Kerberos, and Hive is configured to work with Tez by default. Everything works well through hive-cli and beeline; however, I'm seeing strange behavior through Hue. There can be many client connections (up to 600), and after a day, client connections start to fail. However, not every client connection attempt fails.

When it fails, I have the following logs on the HiveServer2:
{noformat}
Aug 3 09:28:04 hiveserver2.bigdata.fr Executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112): INSERT INTO TABLE shfs3453.camille_test VALUES ('coucou')
Aug 3 09:28:04 hiveserver2.bigdata.fr Query ID = hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112
Aug 3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Launching Job 1 out of 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Starting task [Stage-1:MAPRED] in parallel
Aug 3 09:28:04 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:04 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:05 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:05 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:05 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:06 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:06 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:06 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:08 hiveserver2.bigdata.fr FAILED: Execution Error, return code -1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Aug 3 09:28:08 hiveserver2.bigdata.fr Completed executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112); Time taken: 4.002 seconds
{noformat}

At the same time, I have the following logs on the Metastore:
{noformat}
Aug 3 09:28:03 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:03 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:04 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr 180: get_table : db=shfs3453 tbl=camille_test
Aug 3 09:28:04 metastore01.bigdata.fr ugi=shfs3453#011ip=10.77.64.228#011cmd=get_table : db=shfs3453 tbl=camille_test#011
Aug 3 09:28:04 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:04 metastore01.bigdata.fr Error occurred during processing of message.
Aug 3 09:28:05 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:05 metastore01.bigdata.fr Error occurred during processing of message.
Aug 3 09:28:06 metastore01.bigdata.fr SASL negotiation failure
Aug 3 09:28:06 metastore01.bigdata.fr Error occurred during processing of message.
{noformat}

To resolve the connection issue, I have to restart HiveServer2.

Note: I also created a JIRA for Hue: https://issues.cloudera.org/browse/HUE-4748

was:
I have a cluster secured with Kerberos, and Hive is configured to work with Tez by default. Everything works well through hive-cli and beeline; however, I'm seeing strange behavior through Hue. There can be many client connections (up to 600), and after a day, client connections start to fail. However, not every client connection attempt fails.

When it fails, I have the following logs on the HiveServer2:
{noformat}
Aug 3 09:28:04 hiveserver2.bigdata.fr Executing command(queryId=hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112): INSERT INTO TABLE shfs3453.camille_test VALUES ('coucou')
Aug 3 09:28:04 hiveserver2.bigdata.fr Query ID = hiveserver2_20160803092803_a216edf1-bb51-43a7-81a6-f40f1574b112
Aug 3 09:28:04 hiveserver2.bigdata.fr Total jobs = 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Launching Job 1 out of 1
Aug 3 09:28:04 hiveserver2.bigdata.fr Starting task [Stage-1:MAPRED] in parallel
Aug 3 09:28:04 hiveserver2.bigdata.fr Trying to connect to metastore with URI thrift://metastore01.bigdata.fr:9083
Aug 3 09:28:04 hiveserver2.bigdata.fr Failed to connect to the MetaStore Server...
Aug 3 09:28:04 hiveserver2.bigdata.fr Waiting 1 seconds before next connection attempt.
Aug 3 09:28:05 hiveserver2.bigdata.fr Trying to connect to metastore with URI