[jira] [Created] (HIVE-18107) CBO Multi Table Insert Query with JOIN operator and GROUPING SETS throws SemanticException Invalid table alias or column reference 'GROUPING__ID'

2017-11-20 Thread Sergey Zadoroshnyak (JIRA)
Sergey Zadoroshnyak created HIVE-18107:
--

 Summary: CBO Multi Table Insert Query with JOIN operator and 
GROUPING SETS  throws SemanticException  Invalid table alias or column 
reference 'GROUPING__ID'
 Key: HIVE-18107
 URL: https://issues.apache.org/jira/browse/HIVE-18107
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.3.0
Reporter: Sergey Zadoroshnyak
Assignee: Jesus Camacho Rodriguez
 Fix For: 3.0.0


hive 2.3.0

set hive.execution.engine=tez;
set hive.multigroupby.singlereducer=false;
*set hive.cbo.enable=true;*

Multi Table Insert Query. *Template:*

FROM (SELECT * FROM tableA) AS alias_a JOIN (SELECT * FROM tableB) AS  alias_b 
ON (alias_a.column_1 = alias_b.column_1 AND alias_a.column_2 = alias_b.column_2)
  
  INSERT OVERWRITE TABLE tableC PARTITION
(
  partition1='first_fragment'
)
  SELECT 
GROUPING__ID,
alias_a.column4,
alias_a.column5,
alias_a.column6,
alias_a.column7,
count(1) AS rownum
  WHERE alias_b.column_3 = 1
  GROUP BY 
alias_a.column4,
alias_a.column5,
alias_a.column6,
alias_a.column7
  GROUPING SETS 
( 
(alias_a.column4),
(alias_a.column4,alias_a.column5), 
(alias_a.column4,alias_a.column5,alias_a.column6,alias_a.column7)
)
 
  INSERT OVERWRITE TABLE tableC PARTITION
(
   partition1='second_fragment'
)
  SELECT 
GROUPING__ID,
alias_a.column4,
alias_a.column5,
alias_a.column6,
alias_a.column7,
count(1) AS rownum
  WHERE alias_b.column_3 = 2
  GROUP BY 
alias_a.column4,
alias_a.column5,
alias_a.column6,
alias_a.column7
  GROUPING SETS 
( 
(alias_a.column4),
(alias_a.column4,alias_a.column5), 
(alias_a.column4,alias_a.column5,alias_a.column6,alias_a.column7)
)

16:39:17,822 ERROR CalcitePlanner:423 - CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:537 Invalid table 
alias or column reference 'GROUPING__ID': (possible column names are:..
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11600)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11548)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:3706)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:3999)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1315)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1261)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1294)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
at 

[jira] [Created] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-09 Thread Sergey Zadoroshnyak (JIRA)
Sergey Zadoroshnyak created HIVE-14483:
--

 Summary:  java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
 Key: HIVE-14483
 URL: https://issues.apache.org/jira/browse/HIVE-14483
 Project: Hive
  Issue Type: Bug
  Components: ORC
Affects Versions: 2.1.0
Reporter: Sergey Zadoroshnyak
Assignee: Owen O'Malley
Priority: Critical
 Fix For: 2.2.0


Error message:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
at 
org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
at 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
at 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
at 
org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
at 
org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
at 
org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
at 
org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
... 22 more


How to reproduce?
Configure a StringTreeReader that contains a StringDirectTreeReader as its 
TreeReader (DIRECT or DIRECT_V2 column encoding).

batchSize = 1026;

Invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, 
final int batchSize).

scratchlcv is a LongColumnVector whose long[] vector has length 1024.

nextVector executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, 
scratchlcv, result, batchSize);

As a result, commonReadByteArrays(stream, lengths, scratchlcv, result, 
(int) batchSize) throws ArrayIndexOutOfBoundsException, because batchSize 
is larger than the length of scratchlcv.vector.


If we use StringDictionaryTreeReader, there is no exception, because it calls 
scratchlcv.ensureSize((int) batchSize, false) before 
reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);

These changes were made for Hive 2.1.0 in commit 
https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
for https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.

How to fix?
Add only one line:

scratchlcv.ensureSize((int) batchSize, false);

in the method 
org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream 
stream, IntegerReader lengths,
LongColumnVector scratchlcv,
BytesColumnVector result, final int batchSize), before the call to 
lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
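
The failure and the proposed guard can be illustrated with a small 
standalone sketch. It does not use the ORC classes: ScratchVector and 
fillLengths are hypothetical stand-ins for LongColumnVector and 
lengths.nextVector, and the ensureSize body is an assumption about what 
such a method would do, not the ORC implementation.

```java
import java.util.Arrays;

public class ScratchVectorSketch {
    /** Hypothetical stand-in for LongColumnVector: a long[] buffer of 1024 entries. */
    static final class ScratchVector {
        long[] vector = new long[1024];

        /** Assumed behaviour: grow the buffer so it holds at least 'size' entries. */
        void ensureSize(int size, boolean preserveData) {
            if (size > vector.length) {
                vector = preserveData ? Arrays.copyOf(vector, size) : new long[size];
            }
        }
    }

    /** Stand-in for lengths.nextVector(...): writes batchSize entries into 'out'. */
    static void fillLengths(long[] out, int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            out[i] = i; // fails once i reaches out.length
        }
    }

    /** Reads a batch without resizing the scratch buffer; reports whether it succeeded. */
    public static boolean readWithoutGuard(int batchSize) {
        ScratchVector scratch = new ScratchVector();
        try {
            fillLengths(scratch.vector, batchSize);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    /** Same read, but with the one-line guard applied first. */
    public static boolean readWithGuard(int batchSize) {
        ScratchVector scratch = new ScratchVector();
        scratch.ensureSize(batchSize, false); // the proposed fix
        fillLengths(scratch.vector, batchSize);
        return true;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + readWithoutGuard(1026));
        System.out.println("with guard:    " + readWithGuard(1026));
    }
}
```

With batchSize = 1026 the unguarded read overruns the 1024-entry buffer, 
while the guarded read completes.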















--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13951) GenericUDFArray should constant fold at compile time

2016-06-06 Thread Sergey Zadoroshnyak (JIRA)
Sergey Zadoroshnyak created HIVE-13951:
--

 Summary: GenericUDFArray should constant fold at compile time
 Key: HIVE-13951
 URL: https://issues.apache.org/jira/browse/HIVE-13951
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 1.3.0, 2.1.0
Reporter: Sergey Zadoroshnyak


1. Hive constant propagation optimizer is enabled.  
hive.optimize.constant.propagation=true;
2. Hive query: 
select array('Total','Total') from some_table;

ERROR: org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory 
(ConstantPropagateProcFactory.java:evaluateFunction(939)) - Unable to evaluate 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFArray@3d26c423. Return value 
unrecoginizable.

Details:
During query compilation, Hive checks whether any subexpression of an 
expression can be evaluated to a constant and, if so, replaces that 
subexpression with the constant.
If the expression is a deterministic UDF and all of its subexpressions are 
constants, the value is calculated at compile time (not at runtime).

array is a deterministic UDF and 'Total' is a string constant, so Hive tries 
to replace the UDF call with the constant result of evaluating it.

However, it looks like Hive supports only primitives and struct objects here, 
so array is not supported yet.
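
The folding rule described above can be sketched in a few lines of Java. 
This is a toy model, not Hive code: Call, fold, and isSupportedConstant 
are hypothetical names, and the supported types (strings, numbers, 
booleans) are an assumption chosen only to show why a list result is not 
folded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class ConstantFoldSketch {
    /** Hypothetical stand-in for a UDF call whose arguments are all constants. */
    static final class Call {
        final boolean deterministic;
        final List<Object> constantArgs;
        final Function<List<Object>, Object> body;

        Call(boolean deterministic, List<Object> constantArgs,
             Function<List<Object>, Object> body) {
            this.deterministic = deterministic;
            this.constantArgs = constantArgs;
            this.body = body;
        }
    }

    /** Result types this toy folder can turn back into a constant; lists are absent. */
    static boolean isSupportedConstant(Object value) {
        return value instanceof String || value instanceof Number || value instanceof Boolean;
    }

    /** Evaluates the call at "compile time" and returns the constant, if one can be built. */
    public static Optional<Object> fold(Call call) {
        if (!call.deterministic) {
            return Optional.empty(); // must be evaluated at runtime
        }
        Object result = call.body.apply(call.constantArgs);
        return isSupportedConstant(result) ? Optional.of(result) : Optional.empty();
    }

    /** A deterministic call returning a string: foldable in this model. */
    public static Call concatCall() {
        return new Call(true, Arrays.<Object>asList("Total", "Total"),
                args -> args.get(0).toString() + args.get(1));
    }

    /** A deterministic call returning a list, like array('Total','Total'): not foldable here. */
    public static Call arrayCall() {
        return new Call(true, Arrays.<Object>asList("Total", "Total"), args -> args);
    }

    public static void main(String[] args) {
        System.out.println("concat folded: " + fold(concatCall()).isPresent());
        System.out.println("array folded:  " + fold(arrayCall()).isPresent());
    }
}
```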








[jira] [Created] (HIVE-13021) GenericUDAFEvaluator.isEstimable(agg) always returns false

2016-02-08 Thread Sergey Zadoroshnyak (JIRA)
Sergey Zadoroshnyak created HIVE-13021:
--

 Summary: GenericUDAFEvaluator.isEstimable(agg) always returns false
 Key: HIVE-13021
 URL: https://issues.apache.org/jira/browse/HIVE-13021
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 1.2.1
Reporter: Sergey Zadoroshnyak
Assignee: Gopal V


GenericUDAFEvaluator.isEstimable(agg) always returns false because the 
AggregationType annotation has the default RetentionPolicy.CLASS, so it is 
not retained by the VM at run time.
As a result, the estimate method is never executed.
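
The retention behaviour can be reproduced without any Hive classes. In 
the sketch below all names (ClassRetained, RuntimeRetained, BufferA, 
BufferB) are hypothetical; only the reflection lookup and the retention 
policies are standard Java. An annotation declared without @Retention 
defaults to RetentionPolicy.CLASS and is invisible to getAnnotation, 
which is the situation the report describes for AggregationType.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class RetentionSketch {
    // No @Retention here, so the default RetentionPolicy.CLASS applies.
    @interface ClassRetained {
        boolean estimable() default false;
    }

    // Explicit RUNTIME retention makes the annotation visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface RuntimeRetained {
        boolean estimable() default false;
    }

    @ClassRetained(estimable = true)
    static class BufferA {}

    @RuntimeRetained(estimable = true)
    static class BufferB {}

    /** Reflection lookup of the CLASS-retained annotation; it is never found. */
    public static boolean estimableViaClassRetained(Class<?> c) {
        ClassRetained a = c.getAnnotation(ClassRetained.class);
        return a != null && a.estimable();
    }

    /** Same lookup for the RUNTIME-retained annotation. */
    public static boolean estimableViaRuntimeRetained(Class<?> c) {
        RuntimeRetained a = c.getAnnotation(RuntimeRetained.class);
        return a != null && a.estimable();
    }

    public static void main(String[] args) {
        System.out.println(estimableViaClassRetained(BufferA.class));   // false
        System.out.println(estimableViaRuntimeRetained(BufferB.class)); // true
    }
}
```

Under this model, declaring the annotation with 
@Retention(RetentionPolicy.RUNTIME) is what lets the reflective check 
succeed.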


