Error communicating with the metastore | org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset

2020-02-12 Thread Bernard Quizon
Hi, everyone.

We're getting this error on Hive 3.1.0:

org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the metastore
    at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.getValidWriteIds(DbTxnManager.java:714)
    at org.apache.hadoop.hive.ql.Driver.recordValidWriteIds(Driver.java:1463)
    at org.apache.hadoop.hive.ql.Driver.acquireLocks(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:1832)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2002)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1746)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1740)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226)
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:318)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:331)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
    at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:202)
    at org.apache.hadoop.hive.metastore.api.GetValidWriteIdsRequest$GetValidWriteIdsRequestStandardScheme.write(GetValidWriteIdsRequest.java:489)
    at org.apache.hadoop.hive.metastore.api.GetValidWriteIdsRequest$GetValidWriteIdsRequestStandardScheme.write(GetValidWriteIdsRequest.java:424)
    at org.apache.hadoop.hive.metastore.api.GetValidWriteIdsRequest.write(GetValidWriteIdsRequest.java:362)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_valid_write_ids_args$get_valid_write_ids_argsStandardScheme.write(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_valid_write_ids_args$get_valid_write_ids_argsStandardScheme.write(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_valid_write_ids_args.write(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:71)
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_valid_write_ids(ThriftHiveMetastore.java:5443)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_valid_write_ids(ThriftHiveMetastore.java:5435)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getValidWriteIds(HiveMetaStoreClient.java:2589)
    at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
    at com.sun.proxy.$Proxy37.getValidWriteIds(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2934)
    at com.sun.proxy.$Proxy37.getValidWriteIds(Unknown Source)
    at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.getValidWriteIds(DbTxnManager.java:712)
    ... 21 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
    ... 44 more

It looks like the server resets the connection while the Thrift client is
still writing the request to the OutputStream.
Is there any workaround for this? Or has anyone else experienced it?
It only started showing up recently. Thank you in advance.
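
For what it's worth, the only client-side knobs we've found to experiment
with in the meantime are the metastore timeout/retry settings in
hive-site.xml. This is just a sketch of what we're trying (property names
are the standard HiveConf ones; the values are examples, and the retrying
client only retries some failure modes), not a confirmed fix:

    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <value>600s</value>
    </property>
    <property>
      <name>hive.metastore.connect.retries</name>
      <value>5</value>
    </property>
    <property>
      <name>hive.metastore.client.connect.retry.delay</name>
      <value>5s</value>
    </property>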

Regards,
Bernard


Re: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.

2020-02-11 Thread Bernard Quizon
Hi.

We fixed the issue by patching protobuf-java-2.5.0.jar: we changed
CodedInputStream.DEFAULT_SIZE_LIMIT to 1GB.
We uploaded the patched jar to our servers and added its location to
*tez.cluster.additional.classpath.prefix* (tez-site.xml) as
/path/to/patched/protobuf-java-2.5.0.jar:.
Please note that it must be the first jar on the
*tez.cluster.additional.classpath.prefix*.
Apparently, Tez was using protobuf's default 64MB message size limit.
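
For reference, the patch is essentially a one-line change to protobuf
2.5.0's CodedInputStream (1GB is just the value we picked; anything
comfortably above your largest message should work):

    // com/google/protobuf/CodedInputStream.java (protobuf-java 2.5.0)
    private static final int DEFAULT_SIZE_LIMIT = 64 << 20;  // 64MB (original)
    // patched to:
    private static final int DEFAULT_SIZE_LIMIT = 1 << 30;   // 1GB

And the corresponding tez-site.xml entry, with the patched jar first on
the prefix:

    <property>
      <name>tez.cluster.additional.classpath.prefix</name>
      <value>/path/to/patched/protobuf-java-2.5.0.jar:.</value>
    </property>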

BTW, in the latest protobuf versions the default size limit was raised to
Integer.MAX_VALUE. See
https://github.com/protocolbuffers/protobuf/blob/v3.11.3/java/core/src/main/java/com/google/protobuf/CodedInputStream.java#L62-L65.
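
For completeness: the exception's own suggestion,
CodedInputStream.setSizeLimit(), only helps where you control the parsing
call site, which we don't inside Tez; hence the jar patch. A sketch of
what that would look like (MyProto is a placeholder for a generated
message class):

    // Raise the limit for a single parse instead of patching the jar.
    CodedInputStream cis = CodedInputStream.newInstance(inputStream);
    cis.setSizeLimit(1 << 30);  // 1GB, matching our patched default
    MyProto msg = MyProto.parseFrom(cis);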

Regards,
Bernard

On Mon, Feb 10, 2020 at 8:23 PM Bernard Quizon <
bernard.qui...@cheetahdigital.com> wrote:

> Hi.
>
> We're using Hive 3.0.1 and we're currently experiencing this issue:
>
> Error while processing statement: FAILED: Execution Error, return code 2
> from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
> vertexName=Map 1, vertexId=vertex_1581309524541_0094_14_00,
> diagnostics=[Vertex vertex_1581309524541_0094_14_00 [Map 1] killed/failed
> due to:INIT_FAILURE, Fail to create InputInitializerManager,
> org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class
> with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
>     at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4122)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:207)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2932)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2879)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2861)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1957)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:206)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
>     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
>     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
>     ... 25 more
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol
> message was too large.  May be malicious.  Use
> CodedInputStream.setSizeLimit() to increase the size limit.
>     at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>     at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>     at com.google.pr