[jira] [Created] (HIVE-18107) CBO Multi Table Insert Query with JOIN operator and GROUPING SETS throws SemanticException Invalid table alias or column reference 'GROUPING__ID'
Sergey Zadoroshnyak created HIVE-18107:
---------------------------------------

Summary: CBO Multi Table Insert Query with JOIN operator and GROUPING SETS throws SemanticException Invalid table alias or column reference 'GROUPING__ID'
Key: HIVE-18107
URL: https://issues.apache.org/jira/browse/HIVE-18107
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 2.3.0
Reporter: Sergey Zadoroshnyak
Assignee: Jesus Camacho Rodriguez
Fix For: 3.0.0

Hive 2.3.0:

    set hive.execution.engine=tez;
    set hive.multigroupby.singlereducer=false;
    set hive.cbo.enable=true;

Multi-table insert query template:

    FROM
      (SELECT * FROM tableA) AS alias_a
      JOIN (SELECT * FROM tableB) AS alias_b
        ON (alias_a.column_1 = alias_b.column_1 AND alias_a.column_2 = alias_b.column_2)
    INSERT OVERWRITE TABLE tableC PARTITION (partition1='first_fragment')
      SELECT GROUPING__ID, alias_a.column4, alias_a.column5, alias_a.column6, alias_a.column7, count(1) AS rownum
      WHERE alias_b.column_3 = 1
      GROUP BY alias_a.column4, alias_a.column5, alias_a.column6, alias_a.column7
      GROUPING SETS (
        (alias_a.column4),
        (alias_a.column4, alias_a.column5),
        (alias_a.column4, alias_a.column5, alias_a.column6, alias_a.column7))
    INSERT OVERWRITE TABLE tableC PARTITION (partition1='second_fragment')
      SELECT GROUPING__ID, alias_a.column4, alias_a.column5, alias_a.column6, alias_a.column7, count(1) AS rownum
      WHERE alias_b.column_3 = 2
      GROUP BY alias_a.column4, alias_a.column5, alias_a.column6, alias_a.column7
      GROUPING SETS (
        (alias_a.column4),
        (alias_a.column4, alias_a.column5),
        (alias_a.column4, alias_a.column5, alias_a.column6, alias_a.column7));

    16:39:17,822 ERROR CalcitePlanner:423 - CBO failed, skipping CBO.
    org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:537 Invalid table alias or column reference 'GROUPING__ID': (possible column names are: ...
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11600)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11548)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:3706)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:3999)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1315)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1261)
        at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
        at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
        at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
        at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1294)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
        ...
[jira] [Created] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
Sergey Zadoroshnyak created HIVE-14483:
---------------------------------------

Summary: java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
Key: HIVE-14483
URL: https://issues.apache.org/jira/browse/HIVE-14483
Project: Hive
Issue Type: Bug
Components: ORC
Affects Versions: 2.1.0
Reporter: Sergey Zadoroshnyak
Assignee: Owen O'Malley
Priority: Critical
Fix For: 2.2.0

Error message:

    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
        at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
        at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
        at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
        at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
        at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
        at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
        at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
        ... 22 more

How to reproduce?
1. Configure a StringTreeReader that contains a StringDirectTreeReader as its TreeReader (DIRECT or DIRECT_V2 column encoding).
2. Set batchSize = 1026.
3. Invoke nextVector(ColumnVector previousVector, boolean[] isNull, final int batchSize).

scratchlcv is a LongColumnVector whose long[] vector has length 1024. The call to BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, scratchlcv, result, batchSize) therefore reaches commonReadByteArrays(stream, lengths, scratchlcv, result, (int) batchSize) and throws an ArrayIndexOutOfBoundsException.

If StringDictionaryTreeReader is used instead, there is no exception, because scratchlcv.ensureSize((int) batchSize, false) is called as a guard before reader.nextVector(scratchlcv, scratchlcv.vector, batchSize).

These changes were made for Hive 2.1.0 in commit https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 for https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.

How to fix?

Add a single line:

    scratchlcv.ensureSize((int) batchSize, false);

in org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream stream, IntegerReader lengths, LongColumnVector scratchlcv, BytesColumnVector result, final int batchSize), before the invocation of lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
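The failure mode above can be sketched in a few lines of self-contained Java. SimpleLongVector is a hypothetical stand-in for ORC's LongColumnVector (same 1024-entry default allocation, same grow-on-demand ensureSize semantics); the real fix is exactly the one-line ensureSize guard before the vector is filled.

```java
// Minimal sketch of HIVE-14483: writing batchSize = 1026 length entries into a
// scratch vector whose backing array defaults to 1024 slots. Names are
// illustrative, not ORC's actual classes.
public class EnsureSizeDemo {
    static class SimpleLongVector {
        long[] vector = new long[1024];           // default allocation, as in LongColumnVector

        // Grow-on-demand, mirroring ColumnVector.ensureSize(int, boolean).
        void ensureSize(int size, boolean preserveData) {
            if (vector.length < size) {
                long[] bigger = new long[size];
                if (preserveData) System.arraycopy(vector, 0, bigger, 0, vector.length);
                vector = bigger;
            }
        }
    }

    // Stands in for the length-reading loop inside commonReadByteArrays:
    // throws ArrayIndexOutOfBoundsException when batchSize > vector.length.
    static void fillLengths(SimpleLongVector scratchlcv, int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            scratchlcv.vector[i] = i;
        }
    }

    public static void main(String[] args) {
        SimpleLongVector scratchlcv = new SimpleLongVector();
        scratchlcv.ensureSize(1026, false);       // the proposed one-line guard
        fillLengths(scratchlcv, 1026);            // safe after the guard
        System.out.println(scratchlcv.vector.length);
    }
}
```

Without the ensureSize call, fillLengths throws at index 1024, which is the same boundary reported in the stack trace above.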
[jira] [Created] (HIVE-13951) GenericUDFArray should constant fold at compile time
Sergey Zadoroshnyak created HIVE-13951:
---------------------------------------

Summary: GenericUDFArray should constant fold at compile time
Key: HIVE-13951
URL: https://issues.apache.org/jira/browse/HIVE-13951
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 1.3.0, 2.1.0
Reporter: Sergey Zadoroshnyak

1. The Hive constant propagation optimizer is enabled: hive.optimize.constant.propagation=true;
2. Hive query: select array('Total','Total') from some_table;

    ERROR: org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory (ConstantPropagateProcFactory.java:evaluateFunction(939)) - Unable to evaluate org.apache.hadoop.hive.ql.udf.generic.GenericUDFArray@3d26c423. Return value unrecoginizable.

Details: During query compilation, Hive checks whether any subexpression of a given expression can be evaluated to a constant, and replaces such subexpressions with the constant. If the expression is a deterministic UDF and all of its subexpressions are constants, the value is computed at compile time rather than at runtime. array is a deterministic UDF and 'Total' is a string constant, so Hive tries to replace the result of evaluating the UDF with a constant. However, it looks like Hive only supports primitive and struct constants, so arrays are not yet folded.
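The folding rule described above can be sketched as a small self-contained Java program, assuming a toy expression tree rather than Hive's ExprNodeDesc/ObjectInspector machinery (all names here are hypothetical): a call is replaced by a constant only when the function is deterministic and every argument is already a constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model of compile-time constant folding, as in Hive's constant
// propagation optimizer. Const/Call/fold are illustrative names only.
public class ConstantFoldDemo {
    interface Expr {
        Object eval();
        boolean isConstant();
    }

    record Const(Object value) implements Expr {
        public Object eval() { return value; }
        public boolean isConstant() { return true; }
    }

    record Call(Function<List<Object>, Object> fn, boolean deterministic,
                List<Expr> args) implements Expr {
        public Object eval() {
            List<Object> vals = new ArrayList<>();
            for (Expr a : args) vals.add(a.eval());
            return fn.apply(vals);
        }
        public boolean isConstant() { return false; }
    }

    // Fold a call into a constant when the function is deterministic
    // and every argument is already a constant; otherwise leave it alone.
    static Expr fold(Expr e) {
        if (e instanceof Call c && c.deterministic()
                && c.args().stream().allMatch(Expr::isConstant)) {
            return new Const(c.eval());
        }
        return e;
    }

    public static void main(String[] args) {
        // Models array('Total','Total'): deterministic, all-constant arguments,
        // so it folds to the constant list [Total, Total] at "compile" time.
        Expr arrayCall = new Call(vals -> new ArrayList<>(vals), true,
                List.of(new Const("Total"), new Const("Total")));
        Expr folded = fold(arrayCall);
        System.out.println(folded instanceof Const);  // true
        System.out.println(folded.eval());            // [Total, Total]
    }
}
```

The bug report is that Hive's real optimizer performs this substitution only when the folded value is a primitive or struct constant; a list value, as produced by GenericUDFArray, is rejected with the "Return value unrecoginizable" error.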
[jira] [Created] (HIVE-13021) GenericUDAFEvaluator.isEstimable(agg) always returns false
Sergey Zadoroshnyak created HIVE-13021:
---------------------------------------

Summary: GenericUDAFEvaluator.isEstimable(agg) always returns false
Key: HIVE-13021
URL: https://issues.apache.org/jira/browse/HIVE-13021
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 1.2.1
Reporter: Sergey Zadoroshnyak
Assignee: Gopal V

GenericUDAFEvaluator.isEstimable(agg) always returns false because the AggregationType annotation has the default RetentionPolicy.CLASS and therefore is not retained by the VM at run time. As a result, the estimate method is never executed.
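The root cause is a general property of Java annotations, which can be demonstrated with a self-contained sketch (the annotation names below are illustrative, not Hive's AggregationType): an annotation with the default RetentionPolicy.CLASS is written to the class file but discarded by the VM, so any reflection-based check like isEstimable never sees it.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Demonstrates why a CLASS-retained annotation is invisible to reflection,
// which is exactly the failure mode of isEstimable(agg) in HIVE-13021.
public class RetentionDemo {
    @Retention(RetentionPolicy.CLASS)     // the default policy; dropped by the VM
    @interface ClassRetained {}

    @Retention(RetentionPolicy.RUNTIME)   // the fix: retained for reflection
    @interface RuntimeRetained {}

    @ClassRetained
    static void classAnnotated() {}

    @RuntimeRetained
    static void runtimeAnnotated() {}

    public static void main(String[] args) throws Exception {
        Method a = RetentionDemo.class.getDeclaredMethod("classAnnotated");
        Method b = RetentionDemo.class.getDeclaredMethod("runtimeAnnotated");
        // The CLASS-retained annotation is not visible at run time...
        System.out.println(a.isAnnotationPresent(ClassRetained.class));   // false
        // ...while the RUNTIME-retained one is.
        System.out.println(b.isAnnotationPresent(RuntimeRetained.class)); // true
    }
}
```

The corresponding fix is to declare Hive's AggregationType annotation with @Retention(RetentionPolicy.RUNTIME) so that the isEstimable check can observe it.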