[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=644403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644403
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:20
Start Date: 31/Aug/21 15:20
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2479:
URL: https://github.com/apache/hive/pull/2479#issuecomment-908894870


   LGTM +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644403)
Time Spent: 4h 50m  (was: 4h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=644230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644230
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:01
Start Date: 31/Aug/21 15:01
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2479:
URL: https://github.com/apache/hive/pull/2479


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644230)
Time Spent: 4h 40m  (was: 4.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=643895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643895
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 07:12
Start Date: 31/Aug/21 07:12
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2479:
URL: https://github.com/apache/hive/pull/2479


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643895)
Time Spent: 4.5h  (was: 4h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=643855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643855
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 04:45
Start Date: 31/Aug/21 04:45
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2479:
URL: https://github.com/apache/hive/pull/2479#issuecomment-908894870


   LGTM +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643855)
Time Spent: 4h 20m  (was: 4h 10m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642753
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 27/Aug/21 09:20
Start Date: 27/Aug/21 09:20
Worklog Time Spent: 10m 
  Work Description: abstractdog closed pull request #1122:
URL: https://github.com/apache/hive/pull/1122


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642753)
Time Spent: 4h 10m  (was: 4h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642304
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:25
Start Date: 26/Aug/21 12:25
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577856



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null

Review comment:
   this part has been reworked a bit, created a nested loop and put it into 
a separate method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642304)
Time Spent: 3h 50m  (was: 3h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642305
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:25
Start Date: 26/Aug/21 12:25
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577856



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null

Review comment:
   this part has been reworked a bit, created a nested loop and put it into 
a separate method (collectDataFromParquetPage) + added comments 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642305)
Time Spent: 4h  (was: 3h 50m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642303
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:24
Start Date: 26/Aug/21 12:24
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577326



##
File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q
##
@@ -0,0 +1,20 @@
+set hive.mapred.mode=nonstrict;
+set hive.vectorized.execution.enabled=true;
+set hive.fetch.task.conversion=none;
+
+DROP TABLE parquet_map_type;
+
+
+CREATE TABLE parquet_map_type (
+id int,
+stringMap map

Review comment:
   finally I decided to not block this because of some testcases, I created 
follow-up tickets: HIVE-25459, HIVE-25484
   this patch is already solves customer's issue




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642303)
Time Spent: 3h 40m  (was: 3.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642301
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:22
Start Date: 26/Aug/21 12:22
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696576391



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector 
cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+if (cv1.vector[i] == null && cv2.vector[i] == null) {
+  continue;

Review comment:
   okay, I got in the meantime, simply removing null check here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642301)
Time Spent: 3.5h  (was: 3h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642300
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 26/Aug/21 12:19
Start Date: 26/Aug/21 12:19
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696573989



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector 
cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+if (cv1.vector[i] == null && cv2.vector[i] == null) {
+  continue;

Review comment:
   I think I got it, it will be much faster for non-null strings:
   ```
   int innerLen1 = cv1.vector[i].length;
   int innerLen2 = cv2.vector[i].length;
   if (innerLen1 == innerLen2) {
 for (int j = 0; j < innerLen1; j++) {
   if (cv1.vector[i][j] != cv2.vector[i][j]) {
 return false;
   }
 }
   } else {
 return false;
   }
   
   if (cv1.isNull[i] && cv2.isNull[i]) {
 continue;
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642300)
Time Spent: 3h 20m  (was: 3h 10m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641392
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 23:45
Start Date: 24/Aug/21 23:45
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r695289418



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {
+lcv.lengths[index] = 0;
+lcv.isNull[index] = true;
+lcv.noNulls = false;
+  }
   elements.add(lastValue);
 } while (fetchNextValue(category) && (repetitionLevel != 0));
 
-lcv.isNull[index] = false;
 lcv.lengths[index] = elements.size() - lcv.offsets[index];

Review comment:
   good catch, I'm removing the assignment in the loop, because this outer 
assignment is valid under all circumstances




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641392)
Time Spent: 3h 10m  (was: 3h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641387
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 24/Aug/21 23:40
Start Date: 24/Aug/21 23:40
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r695287365



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {

Review comment:
   the original intention here was to sign if there is a NULL value instead 
of a list, which is happens in definitionLevel == 0, I'll change this part and 
add some comments




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 641387)
Time Spent: 3h  (was: 2h 50m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=639219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-639219
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 18/Aug/21 16:31
Start Date: 18/Aug/21 16:31
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690492371



##
File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q
##
@@ -0,0 +1,20 @@
+set hive.mapred.mode=nonstrict;
+set hive.vectorized.execution.enabled=true;
+set hive.fetch.task.conversion=none;
+
+DROP TABLE parquet_map_type;
+
+
+CREATE TABLE parquet_map_type (
+id int,
+stringMap map

Review comment:
   this is blocked because my testcase fails with another issue I 
discovered: HIVE-25459




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 639219)
Time Spent: 2h 50m  (was: 2h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638709
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 15:38
Start Date: 17/Aug/21 15:38
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690492371



##
File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q
##
@@ -0,0 +1,20 @@
+set hive.mapred.mode=nonstrict;
+set hive.vectorized.execution.enabled=true;
+set hive.fetch.task.conversion=none;
+
+DROP TABLE parquet_map_type;
+
+
+CREATE TABLE parquet_map_type (
+id int,
+stringMap map

Review comment:
   this is blocked because my testcase fails with another issue I 
discovered: HIVE-25459




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638709)
Time Spent: 2h 40m  (was: 2.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638705
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 15:22
Start Date: 17/Aug/21 15:22
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690475398



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -323,7 +341,11 @@ private void 
fillColumnVector(PrimitiveObjectInspector.PrimitiveCategory categor
   int scale = logicalType.getScale();
   lcv.child = new DecimalColumnVector(total, precision, scale);
   for (int i = 0; i < valueList.size(); i++) {
-((DecimalColumnVector) lcv.child).vector[i].set(((List) 
valueList).get(i), scale);
+if (valueList.get(i) == null) {
+  lcv.child.isNull[i] = true;

Review comment:
   right, also compareDecimalColumnVector lacks some inner parentheses here 
I guess :) I'll fix it
   ```
   if (cv1.vector[i] != null && cv2.vector[i] == null
   || cv1.vector[i] == null && cv2.vector[i] != null
   || cv1.vector[i] != null && cv2.vector[i] != null && 
!cv1.vector[i].equals(cv2.vector[i])) {
   ```
   needs to be ((... && ...) || (... && ...) || (... && ... && ...))
   
   without parentheses it is a bit messy/undefined for me...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638705)
Time Spent: 2.5h  (was: 2h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638702
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 15:20
Start Date: 17/Aug/21 15:20
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690475398



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -323,7 +341,11 @@ private void 
fillColumnVector(PrimitiveObjectInspector.PrimitiveCategory categor
   int scale = logicalType.getScale();
   lcv.child = new DecimalColumnVector(total, precision, scale);
   for (int i = 0; i < valueList.size(); i++) {
-((DecimalColumnVector) lcv.child).vector[i].set(((List) 
valueList).get(i), scale);
+if (valueList.get(i) == null) {
+  lcv.child.isNull[i] = true;

Review comment:
   right, also compareDecimalColumnVector lacks some inner parentheses here 
I guess :) I'll fix it
   ```
   if (cv1.vector[i] != null && cv2.vector[i] == null
   || cv1.vector[i] == null && cv2.vector[i] != null
   || cv1.vector[i] != null && cv2.vector[i] != null && 
!cv1.vector[i].equals(cv2.vector[i])) {
   ```
   needs to be ((... && ...) || (... && ...) || (... && ... && ...))...without 
parentheses it is a bit messy/undefined for me...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638702)
Time Spent: 2h 20m  (was: 2h 10m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638699
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 15:13
Start Date: 17/Aug/21 15:13
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690469208



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {
+lcv.lengths[index] = 0;
+lcv.isNull[index] = true;

Review comment:
   fetch next value only returns a single primitive value into a list (e.g. 
ListColumnVector.child.vector[i])
   in this method, the elements of lists are only collected serially into 
List elements, which will then be filled back into a ListColumnVector 
in convertValueListToListColumnVector




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638699)
Time Spent: 2h 10m  (was: 2h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638673
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 14:24
Start Date: 17/Aug/21 14:24
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690422532



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -193,59 +190,59 @@ private List 
decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate
 case SHORT:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readInteger(intList.get(i)));
+resultList.add(intList.get(i) == null ? null : 
dictionary.readInteger(intList.get(i)));
   }
   break;
 case DATE:
 case INTERVAL_YEAR_MONTH:
 case LONG:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readLong(intList.get(i)));
+resultList.add(intList.get(i) == null ? null : 
dictionary.readLong(intList.get(i)));
   }
   break;
 case BOOLEAN:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0);
+resultList.add(intList.get(i) == null ? null : 
dictionary.readBoolean(intList.get(i)));

Review comment:
   good catch, I'm fixing it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638673)
Time Spent: 2h  (was: 1h 50m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638672
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 14:23
Start Date: 17/Aug/21 14:23
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690422532



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -193,59 +190,59 @@ private List 
decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate
 case SHORT:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readInteger(intList.get(i)));
+resultList.add(intList.get(i) == null ? null : 
dictionary.readInteger(intList.get(i)));
   }
   break;
 case DATE:
 case INTERVAL_YEAR_MONTH:
 case LONG:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readLong(intList.get(i)));
+resultList.add(intList.get(i) == null ? null : 
dictionary.readLong(intList.get(i)));
   }
   break;
 case BOOLEAN:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0);
+resultList.add(intList.get(i) == null ? null : 
dictionary.readBoolean(intList.get(i)));

Review comment:
   good catch, fixed it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638672)
Time Spent: 1h 50m  (was: 1h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638667
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 17/Aug/21 14:08
Start Date: 17/Aug/21 14:08
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690407521



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector 
cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+if (cv1.vector[i] == null && cv2.vector[i] == null) {
+  continue;

Review comment:
   sorry, I don't get this, how do you mean?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 638667)
Time Spent: 1h 40m  (was: 1.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=636829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636829
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 11/Aug/21 09:21
Start Date: 11/Aug/21 09:21
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r686623069



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {
+lcv.lengths[index] = 0;
+lcv.isNull[index] = true;
+lcv.noNulls = false;
+  }
   elements.add(lastValue);
 } while (fetchNextValue(category) && (repetitionLevel != 0));
 
-lcv.isNull[index] = false;
 lcv.lengths[index] = elements.size() - lcv.offsets[index];

Review comment:
   lcv.lengths[index]  is over written ..in the loop for some condition its 
set to 0 

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {
+lcv.lengths[index] = 0;
+lcv.isNull[index] = true;

Review comment:
   why this has to be done in a loop ? 

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -193,59 +190,59 @@ private List 
decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate
 case SHORT:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readInteger(intList.get(i)));
+resultList.add(intList.get(i) == null ? null : 
dictionary.readInteger(intList.get(i)));
   }
   break;
 case DATE:
 case INTERVAL_YEAR_MONTH:
 case LONG:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readLong(intList.get(i)));
+resultList.add(intList.get(i) == null ? null : 
dictionary.readLong(intList.get(i)));
   }
   break;
 case BOOLEAN:
   resultList = new ArrayList(total);
   for (int i = 0; i < total; ++i) {
-resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0);
+resultList.add(intList.get(i) == null ? null : 
dictionary.readBoolean(intList.get(i)));

Review comment:
   instead of 0 or 1 ..value returned by readBoolean is used 

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
##
@@ -129,21 +131,16 @@ private boolean 
fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, 
PrimitiveObjectInspector.PrimitiveCategory category, int index) throws 
IOException {
 lcv.offsets[index] = elements.size();
 
-// Return directly if last value is null
-if (definitionLevel < maxDefLevel) {
-  lcv.isNull[index] = true;
-  lcv.lengths[index] = 0;
-  // fetch the data from parquet data page for next call
-  fetchNextValue(category);
-  return;
-}
-
 do {
   // add all data for an element in ListColumnVector, get out the loop if 
there is no data or the data is for new element
+  if (definitionLevel < maxDefLevel) {

Review comment:
   in fetchNextvalue ..if (definitionLevel != maxDefLevel) { 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=622875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622875
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 06:47
Start Date: 15/Jul/21 06:47
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2479:
URL: https://github.com/apache/hive/pull/2479#issuecomment-880441673


   this is the rebased version of https://github.com/apache/hive/pull/1122


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622875)
Time Spent: 1h 20m  (was: 1h 10m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=622874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622874
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 15/Jul/21 06:47
Start Date: 15/Jul/21 06:47
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #2479:
URL: https://github.com/apache/hive/pull/2479


   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 622874)
Time Spent: 1h 10m  (was: 1h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=618257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618257
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 02/Jul/21 20:40
Start Date: 02/Jul/21 20:40
Worklog Time Spent: 10m 
  Work Description: abstractdog edited a comment on pull request #1122:
URL: https://github.com/apache/hive/pull/1122#issuecomment-873244798


   this change solved an issue that we found on customer side, reopened this PR 
and I'll review later
   @SparksFyz could you please rebase the patch on master?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618257)
Time Spent: 1h  (was: 50m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=618256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618256
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 02/Jul/21 20:36
Start Date: 02/Jul/21 20:36
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #1122:
URL: https://github.com/apache/hive/pull/1122#issuecomment-873244798


   this change solved an issue that we found on customer side, reopened this PR 
and I'll review later


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618256)
Time Spent: 50m  (was: 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=618255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618255
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 02/Jul/21 20:36
Start Date: 02/Jul/21 20:36
Worklog Time Spent: 10m 
  Work Description: SparksFyz opened a new pull request #1122:
URL: https://github.com/apache/hive/pull/1122


   https://issues.apache.org/jira/browse/HIVE-23688


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 618255)
Time Spent: 40m  (was: 0.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=480516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480516
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:46
Start Date: 09/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1122:
URL: https://github.com/apache/hive/pull/1122


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480516)
Time Spent: 0.5h  (was: 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=476931=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476931
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 01/Sep/20 00:46
Start Date: 01/Sep/20 00:46
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1122:
URL: https://github.com/apache/hive/pull/1122#issuecomment-684124163


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476931)
Time Spent: 20m  (was: 10m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2020-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=446387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446387
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 16/Jun/20 09:42
Start Date: 16/Jun/20 09:42
Worklog Time Spent: 10m 
  Work Description: SparksFyz opened a new pull request #1122:
URL: https://github.com/apache/hive/pull/1122


   …d null values in map
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 446387)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: 范宜臻
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int,stringMap map) 
> stored as parquet; 
> insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
>