[ 
https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508162#comment-14508162
 ] 

Richard Williams commented on HIVE-10410:
-----------------------------------------

[~vgumashta] I can confirm that this issue only seems to manifest with a remote 
metastore.

[~ekoifman] That's what I suspect as well. I was taking a look at the code that 
implements asynchronous execution of submitted statements in 
org.apache.hive.service.cli.operation.SqlOperation, and I noticed this 
suspicious-looking bit of code in the runInternal method:

{noformat}
// ThreadLocal Hive object needs to be set in background thread.
// The metastore client in Hive is associated with right user.
final Hive parentHive = getSessionHive();
// Current UGI will get used by metastore when metsatore is in embedded mode
// So this needs to get passed to the new background thread
final UserGroupInformation currentUGI = getCurrentUGI(opConfig);
// Runnable impl to call runInternal asynchronously,
// from a different thread
Runnable backgroundOperation = new Runnable() {
@Override
public void run() {
  PrivilegedExceptionAction<Object> doAsAction = new 
PrivilegedExceptionAction<Object>() {
    @Override
    public Object run() throws HiveSQLException {
      Hive.set(parentHive);
      SessionState.setCurrentSessionState(parentSessionState);
      // Set current OperationLog in this async thread for keeping on saving 
query log.
      registerCurrentOperationLog();
      try {
        runQuery(opConfig);
      } catch (HiveSQLException e) {
        setOperationException(e);
        LOG.error("Error running hive query: ", e);
      } finally {
        unregisterOperationLog();
      }
      return null;
    }
  };

{noformat}

Correct me if I'm wrong, but it seems to me that passing the parent thread's 
ThreadLocal Hive object to Hive.set in the children will effectively thwart the 
usage of ThreadLocal, resulting in the children and the parent all sharing the 
same Hive object. There are a number of paths in which calls to one of the 
Hive.get methods result in the current ThreadLocal Hive object being removed 
from the ThreadLocal map and replaced with a new Hive instance; however, I 
don't see anything that guarantees that that always happens on the first call 
to Hive.get in the child threads.

> Apparent race condition in HiveServer2 causing intermittent query failures
> --------------------------------------------------------------------------
>
>                 Key: HIVE-10410
>                 URL: https://issues.apache.org/jira/browse/HIVE-10410
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.13.1
>         Environment: CDH 5.3.3
> CentOS 6.4
>            Reporter: Richard Williams
>
> On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC 
> occasionally trigger odd Thrift exceptions with messages such as "Read a 
> negative frame size (-2147418110)!" or "out of sequence response" in 
> HiveServer2's connections to the metastore. For certain metastore calls (for 
> example, showDatabases), these Thrift exceptions are converted to 
> MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient 
> from retrying these calls and thus causes the failure to bubble out to the 
> JDBC client.
> Note that as far as we can tell, this issue appears to only affect queries 
> that are submitted with the runAsync flag on TExecuteStatementReq set to true 
> (which, in practice, seems to mean all JDBC queries), and it appears to only 
> manifest when HiveServer2 is using the new HTTP transport mechanism. When 
> both these conditions hold, we are able to fairly reliably reproduce the 
> issue by spawning about 100 simple, concurrent hive queries (we have been 
> using "show databases"), two or three of which typically fail. However, when 
> either of these conditions do not hold, we are no longer able to reproduce 
> the issue.
> Some example stack traces from the HiveServer2 logs:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: 
> org.apache.thrift.transport.TTransportException Read a negative frame size 
> (-2147418110)!
> org.apache.thrift.transport.TTransportException: Read a negative frame size 
> (-2147418110)!
>         at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435)
>         at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
>         at 
> org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at 
> org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
>         at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>         at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>         at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>         at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
>         at 
> org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         at 
> org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The above exception being converted into a MetaException and re-thrown:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.ql.exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got 
> exception: org.apache.thrift.transport.TTransportException Read a negative 
> frame size (-2147418110)!)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1141)
>         at 
> org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: MetaException(message:Got exception: 
> org.apache.thrift.transport.TTransportException Read a negative frame size 
> (-2147418110)!)
>         at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1116)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:839)
>         at 
> org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         ... 22 more
> {noformat}
> The above MetaException causing the query as a whole to fail:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR 
> org.apache.hive.service.cli.operation.Operation: Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> org.apache.thrift.transport.TTransportException Read a negative frame size 
> (-2147418110)!)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:148)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> An "out of sequence response" that occurred shortly after the above exception 
> and may have been triggered by it:
> {noformat}
> 2015-04-16 13:54:55,498 ERROR hive.log: Got exception: 
> org.apache.thrift.TApplicationException get_databases failed: out of sequence 
> response
> org.apache.thrift.TApplicationException: get_databases failed: out of 
> sequence response
>         at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
>         at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
>         at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
>         at 
> org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>         at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
>         at 
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
>         at 
> org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>         at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>         at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to