Furcy Pin created HIVE-11933:
--------------------------------

             Summary: transactional table + vectorization + where = bug
                 Key: HIVE-11933
                 URL: https://issues.apache.org/jira/browse/HIVE-11933
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Furcy Pin


We ran into a bug when using vectorization on a transactional table.
Here is a minimal example:

```sql
DROP TABLE IF EXISTS vectorization_transactional_test ;
CREATE TABLE vectorization_transactional_test (
    id INT
)
CLUSTERED BY (id) into 3 buckets
STORED AS ORC
TBLPROPERTIES('transactional'='true') ;

INSERT INTO TABLE vectorization_transactional_test values 
(1)
;

SET hive.vectorized.execution.enabled=true ;

SELECT
* 
FROM vectorization_transactional_test
WHERE id = 1
;
```

With vectorization enabled, the last query fails with an
ArrayIndexOutOfBoundsException in the mappers.
Here is the full stack trace:

```
FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 1
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:126)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.processOp(VectorFilterOperator.java:111)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluateLong(ConstantVectorExpression.java:102)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.ConstantVectorExpression.evaluate(ConstantVectorExpression.java:150)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:124)
    ... 15 more
```

Of course, disabling vectorization (or making the table non-transactional)
removes the bug.
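
As a sketch of that workaround, assuming the setting can be changed per
session the same way it is enabled in the example above, the query can be
rerun with vectorization turned off:

```sql
-- Workaround sketch: turn vectorized execution off for this session only,
-- then rerun the query that failed in the example above.
SET hive.vectorized.execution.enabled=false ;

SELECT
*
FROM vectorization_transactional_test
WHERE id = 1
;
```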


More annoyingly, when the table is used in a JOIN, the job doesn't fail but
returns a wrong result instead: for instance an empty result set, while the
same query with vectorization disabled returns a non-empty one. This behavior
is harder to reproduce with a minimal example.

We experienced this bug in version 1.1.0-cdh5.4.2.

I was not able to reproduce this bug on a local build of Hive 1.2.0
because I could not get transactions to work correctly there.
I guess they only work in pseudo-distributed mode and not in local mode.
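
For anyone trying to reproduce this on their own build, here is a sketch of
the client-side settings that the Hive transactions documentation lists for
ACID tables on Hive 1.x. Whether these were the missing piece in my local
setup is an assumption; the compactor settings on the metastore side are not
shown here.

```sql
-- Sketch only: session settings documented for Hive 1.x transactional tables.
-- Not verified against the local 1.2.0 build mentioned above.
SET hive.support.concurrency=true ;
SET hive.enforce.bucketing=true ;
SET hive.exec.dynamic.partition.mode=nonstrict ;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager ;
```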





