[
https://issues.apache.org/jira/browse/HIVE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041510#comment-14041510
]
Gilad Wolff commented on HIVE-6893:
-----------------------------------
I encountered the same issue, we get a socket read timeout and then
out-of-sequence error. In one case we got an OOM in our client and I suspect
it's the same underlying issue. Here is the metastore sequence of events. Our
client tried to drop a table starting at 14:02:25. Note that we use a 20
seconds timeout for our client:
{code}
2014-06-23 14:02:25,181 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
11: source:/10.20.93.47 drop_table :
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,181 INFO
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue
ip=/10.20.93.47 cmd=source:/10.20.93.47 drop_table :
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,182 INFO org.apache.hadoop.hive.metastore.HiveMetaStore:
11: source:/10.20.93.47 get_table :
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:25,182 INFO
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: ugi=hue
ip=/10.20.93.47 cmd=source:/10.20.93.47 get_table :
db=cloudera_manager_metastore_canary_test_db tbl=CM_TEST_TABLE
2014-06-23 14:02:46,596 INFO hive.metastore.hivemetastoressimpl: deleting
hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table
2014-06-23 14:02:46,694 INFO hive.metastore.hivemetastoressimpl: Moved to
trash:
hdfs://jenkins-debian60-17.ent.cloudera.com:8020/user/hue/.cloudera_manager_hive_metastore_canary/HIVE_1_HIVEMETASTORE_627a77825bb851bf2db30317a698dded/2014_06_23_14_02_11/cm_test_table
{code}
On our client we get a socket timeout for the drop table call at 14:02:45:
{code}
2:02:45.209 PM WARN
com.cloudera.cmon.firehose.polling.hive.HiveMetastoreCanary Metastore
HIVE-1-HIVEMETASTORE-627a77825bb851bf2db30317a698dded: Failed to drop table
com.cloudera.cmf.cdhclient.common.hive.MetaException:
java.net.SocketTimeoutException: Read timed out
{code}
we then try to drop the database immediately afterwards and the next message in
our logs is:
{code}
2:02:46.697 PM WARN com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl
Could not drop hive database: cloudera_manager_metastore_canary_test_db
com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TApplicationException:
get_database failed: out of sequence response
at
com.cloudera.cdh4client.hive.shaded.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:479)
at
com.cloudera.cmf.cdh4client.hive.MetastoreClientImpl.dropDatabase(MetastoreClientImpl.java:160)
{code}
Note that the moved-to-trash message in the hive metastore is from 14:02:46,694
and the out-of-order exception is from 2:02:46.697. I know that order-in-time
does not imply causation but is it possible that we are getting the drop-table
acknowledgment message instead of the get_database message?
> out of sequence error in HiveMetastore server
> ---------------------------------------------
>
> Key: HIVE-6893
> URL: https://issues.apache.org/jira/browse/HIVE-6893
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 0.12.0
> Reporter: Romain Rigaux
> Assignee: Naveen Gangam
> Fix For: 0.14.0
>
> Attachments: HIVE-6893.1.patch
>
>
> Calls listing databases or tables fail. It seems to be a concurrency problem.
> {code}
> 014-03-06 05:34:00,785 ERROR hive.log:
> org.apache.thrift.TApplicationException: get_databases failed: out of
> sequence response
> at
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:472)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:459)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:648)
> at
> org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:66)
> at
> org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:278)
> at sun.reflect.GeneratedMethodAccessor323.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:582)
> at
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57)
> at com.sun.proxy.$Proxy9.getSchemas(Unknown Source)
> at
> org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:192)
> at
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:263)
> at
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1433)
> at
> org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1418)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at
> org.apache.hive.service.cli.thrift.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:38)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)