Query Failures

2020-02-10 Thread Charles Givre
Hello Everyone!
I recently joined a project that has a Hive/Impala installation, and we are
experiencing a significant number of query failures.  We are using an older
version of Hive and unfortunately there's nothing I can do about that, but
I'm wondering how I can make Hive handle queries better so that our users get
a better experience.

For example, I can execute a basic SELECT * query or a simple SELECT query
without issues.

However, if I attempt to:
1.  Add filters
2.  Do a SELECT DISTINCT
3.  Perform basic aggregation

I get errors like this: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
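
If it helps, my understanding is that a bare SELECT * can be answered by a
simple fetch task (depending on hive.fetch.task.conversion), while filters,
DISTINCT and aggregations compile to MapReduce jobs, so the failures seem to
come from the MR execution path rather than from the queries themselves.
Roughly (table and column names below are placeholders):

  -- typically served by a fetch task, no MapReduce job is launched:
  SELECT * FROM my_table LIMIT 10;

  -- each of these compiles to a MapReduce job, which is where the
  -- "return code 1 from ...MapRedTask" shows up for us:
  SELECT * FROM my_table WHERE some_col = 'x';                 -- filter
  SELECT DISTINCT some_col FROM my_table;                      -- distinct
  SELECT some_col, COUNT(*) FROM my_table GROUP BY some_col;   -- aggregation

I gather that return code 1 only says the MapReduce task failed, and that the
real cause usually has to be dug out of the task/YARN logs for that job.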

Could someone point me to some good guides for querying Hive, or for helping
my engineers prevent these errors?
Thanks,


Question about IMPORT/EXPORT

2020-02-10 Thread Thibault VERBEQUE
Hi all,

I'm currently working on two kerberized clusters and want to replicate some
tables between them, with Hive 3.0.1.
I have created two users for this, one for the EXPORT operation and one for
the IMPORT operation.
But I stumbled on https://issues.apache.org/jira/browse/HIVE-17606. It seems
to me that HiveServer2 is making metastore API calls with the UGI of the user
(doAs=True); am I right?
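
For reference, the flow I'm trying to run is roughly the following (database,
table and staging paths below are placeholders):

  -- on the source cluster, as the export user:
  EXPORT TABLE mydb.mytable TO '/apps/hive/staging/mytable_export';

  -- copy the staging directory to the target cluster (e.g. with distcp),
  -- then on the target cluster, as the import user:
  IMPORT TABLE mydb.mytable FROM '/apps/hive/staging/mytable_export';
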
Another remark: looking at
https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-MetastorenotificationAPIsecurity,
why choose a proxy user from HDFS to perform this authorization check rather
than a setting like "hive.cluster.administrator" or something similar? It
doesn't make sense that a replication user has to be allowed to impersonate
other users.

Regards,

Thibault VERBEQUE.


com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.

2020-02-10 Thread Bernard Quizon
Hi.

We're using Hive 3.0.1 and we're currently experiencing this issue:

Error while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Map 1, vertexId=vertex_1581309524541_0094_14_00,
diagnostics=[Vertex vertex_1581309524541_0094_14_00 [Map 1] killed/failed
due to:INIT_FAILURE, Fail to create InputInitializerManager,
org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class
with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
  at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
  at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:152)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
  at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4122)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3100(VertexImpl.java:207)
  at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2932)
  at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2879)
  at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2861)
  at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
  at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
  at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
  at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1957)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:206)
  at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2317)
  at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2303)
  at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:180)
  at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
  ... 25 more
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message
was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to
increase the size limit.
  at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
  at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
  at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
  at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
  at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19294)
  at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19258)
  at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19360)
  at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19355)
  at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
  at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
  at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
  at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
  at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.parseFrom(DAGProtos.java:19552)
  at org.apache.tez.common.TezUtils.createConfFromByteString(TezUtils.java:116)
  at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:130)
  ... 30 more]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1581309524541_0094_14_01,
diagnostics=[Vertex received Kill in NEW state., Vertex
vertex_1581309524541_0094_14_01 [Reducer 2] killed/failed due

Re: Is there any way to find Hive query to Datanucleus queries mapping

2020-02-10 Thread Zoltan Haindrich

Hey Chinna!

I don't think a mapping like that is easy to get... I would rather try to narrow
it down to the single call which consumes most of the time.
There is a log message which can help you get to the most relevant metastore 
call:
https://github.com/apache/hive/blob/0d9deba3c15038df4c64ea9b8494d554eb8eea2f/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L5405

cheers,
Zoltan

On 2/10/20 1:07 PM, Chinna Rao Lalam wrote:

Hi All,

Is there any way to find the mapping from a Hive query to the DataNucleus
queries it generates?

"select * from table" this hive query will generate multiple Datanucleus
queries and execute on configured DB.
In our DB some of the queries are running slow, So we want to see
hivequery->datanucleus query mapping to find out which hive query of
datanucleus query is running slow.

If we enable the DataNucleus debug log we can see the generated queries, but
not the mapping.

Thanks
Chinna



Is there any way to find Hive query to Datanucleus queries mapping

2020-02-10 Thread Chinna Rao Lalam
Hi All,

Is there any way to find the mapping from a Hive query to the DataNucleus
queries it generates?

"select * from table" this hive query will generate multiple Datanucleus
queries and execute on configured DB.
In our DB some of the queries are running slow, So we want to see
hivequery->datanucleus query mapping to find out which hive query of
datanucleus query is running slow.
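
For illustration, even the single get_table metastore call behind a Hive query
gets expanded by DataNucleus into several SQL statements against the metastore
schema, roughly along these lines (simplified; the real generated SQL is much
more verbose and the IDs below are placeholders):

  SELECT ... FROM DBS        WHERE NAME = 'default';
  SELECT ... FROM TBLS       WHERE DB_ID = 1 AND TBL_NAME = 'table';
  SELECT ... FROM SDS        WHERE SD_ID = 10;
  SELECT ... FROM COLUMNS_V2 WHERE CD_ID = 10;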

If we enable the DataNucleus debug log we can see the generated queries, but
not the mapping.

Thanks
Chinna

