[ https://issues.apache.org/jira/browse/HIVE-25522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415278#comment-17415278 ]
Szehon Ho edited comment on HIVE-25522 at 9/15/21, 1:18 AM: ------------------------------------------------------------ [~csun] helped me to reason through the code and pointed out the potential error, thanks! It looks like the TxnHandler's initialization block sets a lot of static variables llke connPool, sqlGenerator, etc.. But it skips it if only connPool is set. So it's possible that connPool was successfully set but the thread crashed and did not set the other variables. Ref: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L372 It looks like TxnHandler's are thread local and are constructed on demand, so a fatal error in one thread does not kill the HMS server. was (Author: szehon): [~csun] helped me to reason through the code, thanks! It looks like the TxnHandler's initialization block sets a lot of static variables llke connPool, sqlGenerator, etc.. But it skips it if only connPool is set. So it's possible that connPool was successfully set but the thread crashed and did not set the other variables. Ref: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L372 It looks like TxnHandler's are thread local and are constructed on demand, so a fatal error in one thread does not kill the HMS server. > NullPointerException in TxnHandler > ---------------------------------- > > Key: HIVE-25522 > URL: https://issues.apache.org/jira/browse/HIVE-25522 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore > Affects Versions: 3.1.2 > Reporter: Szehon Ho > Priority: Major > > Environment: Using Iceberg on Hive 3.1.2 standalone metastore. Iceberg > issues a lot of lock() calls for commits. > We hit randomly a strange NPE that fails Iceberg commits. > {noformat} > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] > metastore.RetryingHMSHandler: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy27.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: > Error occurred during processing of message. > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217) > ~[hive-exec-3.1.2.jar:3.1.2] > at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown > Source) ~[?:?] > at > jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:?] > at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > ~[hive-exec-3.1.2.jar:3.1.2] > at com.sun.proxy.$Proxy27.lock(Unknown Source) ~[?:?] > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095) > ~[hive-exec-3.1.2.jar:3.1.2] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) > ~[hive-exec-3.1.2.jar:3.1.2] > at java.security.AccessController.doPrivileged(Native Method) ~[?:?] > at javax.security.auth.Subject.doAs(Subject.java:423) ~[?:?] > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > ~[hadoop-common-3.1.4.jar:?] > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) > ~[hive-exec-3.1.2.jar:3.1.2] > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > [hive-exec-3.1.2.jar:3.1.2] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > {noformat} > It seems it's this line, though root cause is not deterined. > https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L1903 -- This message was sent by Atlassian Jira (v8.3.4#803005)