[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=644403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644403 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Aug/21 15:20
            Start Date: 31/Aug/21 15:20
    Worklog Time Spent: 10m
      Work Description: maheshk114 commented on pull request #2479:
URL: https://github.com/apache/hive/pull/2479#issuecomment-908894870

   LGTM +1

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 644403)
    Time Spent: 4h 50m  (was: 4h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-23688
>                 URL: https://issues.apache.org/jira/browse/HIVE-23688
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet, storage-api, Vectorization
>    Affects Versions: All Versions
>            Reporter: 范宜臻
>            Assignee: László Bodor
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23688.patch
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays
> in MapColumnVector.values(BytesColumnVector) when values in map contain
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int,stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized
>       File Output Operator [FS_12]
>         Group By Operator [GBY_11] (rows=5 width=2)
>           Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
>         <-Map 1 [SIMPLE_EDGE] vectorized
>           SHUFFLE [RS_10]
>             PartitionCols:_col0, _col1
>             Group By Operator [GBY_9] (rows=10 width=2)
>               Output:["_col0","_col1"],keys:_col0, _col1
>               Select Operator [SEL_8] (rows=10 width=2)
>                 Output:["_col0","_col1"]
>                 TableScan [TS_0] (rows=10 width=2)
>                   temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00,
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
> failure ) :
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>       at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at
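
The failure mode reported in the issue (the start and length arrays of the map's value vector not covering every map entry when a value is null) can be illustrated with a small, self-contained Java sketch. This is only an illustration under stated assumptions: ToyBytesVector, buildSkippingNulls and buildKeepingNulls are hypothetical names, not Hive classes or methods, and the assumption that null values receive no slot is a guess at the mechanism, not a description of the actual Hive code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a bytes column vector; this is NOT Hive's BytesColumnVector.
final class ToyBytesVector {
    final byte[][] vector;
    final int[] start;
    final int[] length;
    final boolean[] isNull;

    ToyBytesVector(int size) {
        vector = new byte[size][];
        start = new int[size];
        length = new int[size];
        isNull = new boolean[size];
    }

    void set(int i, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        vector[i] = bytes;
        start[i] = 0;
        length[i] = bytes.length;
    }

    String get(int i) {
        if (isNull[i]) {
            return null;
        }
        return new String(vector[i], start[i], length[i], StandardCharsets.UTF_8);
    }
}

final class MapNullDemo {
    // Assumed faulty behaviour: null values get no slot, so the arrays are too short.
    static ToyBytesVector buildSkippingNulls(String[] values) {
        int nonNull = 0;
        for (String v : values) {
            if (v != null) nonNull++;
        }
        ToyBytesVector out = new ToyBytesVector(nonNull);
        int next = 0;
        for (String v : values) {
            if (v != null) out.set(next++, v);
        }
        return out;
    }

    // One slot per map entry; null values are flagged instead of skipped.
    static ToyBytesVector buildKeepingNulls(String[] values) {
        ToyBytesVector out = new ToyBytesVector(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                out.isNull[i] = true;
            } else {
                out.set(i, values[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {null, "bar"}; // values of MAP('k1', null, 'k2', 'bar')
        ToyBytesVector ok = buildKeepingNulls(values);
        System.out.println(ok.get(0) + ", " + ok.get(1));
        try {
            buildSkippingNulls(values).get(1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("entry 1 is out of bounds");
        }
    }
}
```

In the sketch, a consumer that indexes the value vector by map entry position only works when every entry, null or not, owns a slot; the variant that drops nulls fails as soon as an entry after the null is read.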
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=644230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644230 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Aug/21 15:01
            Start Date: 31/Aug/21 15:01
    Worklog Time Spent: 10m
      Work Description: abstractdog merged pull request #2479:
URL: https://github.com/apache/hive/pull/2479

Issue Time Tracking
-------------------

    Worklog Id:     (was: 644230)
    Time Spent: 4h 40m  (was: 4.5h)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=643895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643895 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Aug/21 07:12
            Start Date: 31/Aug/21 07:12
    Worklog Time Spent: 10m
      Work Description: abstractdog merged pull request #2479:
URL: https://github.com/apache/hive/pull/2479

Issue Time Tracking
-------------------

    Worklog Id:     (was: 643895)
    Time Spent: 4.5h  (was: 4h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-23688
>                 URL: https://issues.apache.org/jira/browse/HIVE-23688
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet, storage-api, Vectorization
>    Affects Versions: All Versions
>            Reporter: 范宜臻
>            Assignee: László Bodor
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.0.0, 4.0.0
>
>         Attachments: HIVE-23688.patch
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=643855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643855 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Aug/21 04:45
            Start Date: 31/Aug/21 04:45
    Worklog Time Spent: 10m
      Work Description: maheshk114 commented on pull request #2479:
URL: https://github.com/apache/hive/pull/2479#issuecomment-908894870

   LGTM +1

Issue Time Tracking
-------------------

    Worklog Id:     (was: 643855)
    Time Spent: 4h 20m  (was: 4h 10m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642753 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Aug/21 09:20
            Start Date: 27/Aug/21 09:20
    Worklog Time Spent: 10m
      Work Description: abstractdog closed pull request #1122:
URL: https://github.com/apache/hive/pull/1122

Issue Time Tracking
-------------------

    Worklog Id:     (was: 642753)
    Time Spent: 4h 10m  (was: 4h)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642304 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Aug/21 12:25
            Start Date: 26/Aug/21 12:25
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577856

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index)
       throws IOException {
     lcv.offsets[index] = elements.size();
-    // Return directly if last value is null

Review comment:
   this part has been reworked a bit, created a nested loop and put it into a separate method

Issue Time Tracking
-------------------

    Worklog Id:     (was: 642304)
    Time Spent: 3h 50m  (was: 3h 40m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642305 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Aug/21 12:25
            Start Date: 26/Aug/21 12:25
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696577856

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index)
       throws IOException {
     lcv.offsets[index] = elements.size();
-    // Return directly if last value is null

Review comment:
   this part has been reworked a bit, created a nested loop and put it into a separate method (collectDataFromParquetPage) + added comments

Issue Time Tracking
-------------------

    Worklog Id:     (was: 642305)
    Time Spent: 4h  (was: 3h 50m)
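
The review comment above mentions a nested loop that collects list data and replaces an early return on null values. The following Java sketch shows one way such a loop can record per-row offsets and lengths while keeping a slot for every element, including nulls. It is a sketch under assumptions, not the code from pull request #2479: ToyListVector and flatten are hypothetical names, and the idea that null elements must keep a slot so that offsets stay consistent is an inference from the removed comment in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified list vector; this is NOT Hive's ListColumnVector.
final class ToyListVector {
    final long[] offsets;                          // index of each row's first element in child
    final long[] lengths;                          // number of elements in each row
    final List<String> child = new ArrayList<>();  // flattened elements, nulls included

    ToyListVector(int rows) {
        offsets = new long[rows];
        lengths = new long[rows];
    }
}

final class ListFlattenDemo {
    // Flattens rows of lists into one child list plus per-row offsets and lengths.
    static ToyListVector flatten(List<List<String>> rows) {
        ToyListVector lcv = new ToyListVector(rows.size());
        for (int row = 0; row < rows.size(); row++) {   // outer loop: one list per row
            lcv.offsets[row] = lcv.child.size();
            for (String element : rows.get(row)) {      // inner loop: every element, null or not
                lcv.child.add(element);                 // a null keeps its slot
            }
            lcv.lengths[row] = lcv.child.size() - lcv.offsets[row];
        }
        return lcv;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList(null, "bar"),
            Arrays.asList("a"));
        ToyListVector lcv = flatten(rows);
        System.out.println(Arrays.toString(lcv.offsets) + " " + Arrays.toString(lcv.lengths));
    }
}
```

Because nulls are appended like any other element, offsets[row] + lengths[row] always stays within the bounds of the flattened child list.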
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642303 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 26/Aug/21 12:24 Start Date: 26/Aug/21 12:24 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #2479: URL: https://github.com/apache/hive/pull/2479#discussion_r696577326 ## File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q ## @@ -0,0 +1,20 @@ +set hive.mapred.mode=nonstrict; +set hive.vectorized.execution.enabled=true; +set hive.fetch.task.conversion=none; + +DROP TABLE parquet_map_type; + + +CREATE TABLE parquet_map_type ( +id int, +stringMap map Review comment: finally I decided to not block this because of some testcases, I created follow-up tickets: HIVE-25459, HIVE-25484 this patch is already solves customer's issue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 642303)
Time Spent: 3h 40m (was: 3.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
> -------------------------------------------------------------------------------------------
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
> Issue Type: Bug
> Components: Parquet, storage-api, Vectorization
> Affects Versions: All Versions
> Reporter: 范宜臻
> Assignee: László Bodor
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays in MapColumnVector.values(BytesColumnVector) when values in map contain {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int,stringMap map)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 2 vectorized
> File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
> Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
> SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
> Output:["_col0","_col1"],keys:_col0, _col1
> Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642301 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Aug/21 12:22
Start Date: 26/Aug/21 12:22
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696576391

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+    if (cv1.vector[i] == null && cv2.vector[i] == null) {
+      continue;

Review comment:
Okay, I got it in the meantime: I'm simply removing the null check here.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 642301)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=642300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642300 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Aug/21 12:19
Start Date: 26/Aug/21 12:19
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r696573989

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVecto
 int length2 = cv2.vector.length;
 if (length1 == length2) {
   for (int i = 0; i < length1; i++) {
+    if (cv1.vector[i] == null && cv2.vector[i] == null) {
+      continue;

Review comment:
I think I got it, it will be much faster for non-null strings:
```
int innerLen1 = cv1.vector[i].length;
int innerLen2 = cv2.vector[i].length;
if (innerLen1 == innerLen2) {
  for (int j = 0; j < innerLen1; j++) {
    if (cv1.vector[i][j] != cv2.vector[i][j]) {
      return false;
    }
  }
} else {
  return false;
}
if (cv1.isNull[i] && cv2.isNull[i]) {
  continue;
}
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 642300)
Time Spent: 3h 20m (was: 3h 10m)
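A note on the compareBytesColumnVector discussion in the message above: when rows can be null, the order of the checks matters, because a null row has no byte array to take the length of. The following is only a minimal sketch of a null-aware row comparison, not the code from the pull request. It uses plain `byte[][]` and `boolean[]` arrays as stand-ins for the `vector` and `isNull` fields, ignores the `start`/`length` offsets a real BytesColumnVector carries, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for illustration; not part of Hive. */
final class NullAwareBytesCompare {

  /**
   * Compares two columns row by row. Rows match when both are flagged null,
   * or when both are non-null and hold the same bytes.
   */
  static boolean sameRows(byte[][] v1, boolean[] null1, byte[][] v2, boolean[] null2) {
    if (v1.length != v2.length) {
      return false;
    }
    for (int i = 0; i < v1.length; i++) {
      // Consult the null flags first so a null row is never dereferenced.
      if (null1[i] || null2[i]) {
        if (null1[i] != null2[i]) {
          return false; // null on one side only
        }
        continue; // null on both sides
      }
      if (!Arrays.equals(v1[i], v2[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[][] left = { null, "bar".getBytes(StandardCharsets.UTF_8) };
    byte[][] right = { null, "bar".getBytes(StandardCharsets.UTF_8) };
    boolean[] nulls = { true, false };
    System.out.println(sameRows(left, nulls, right, nulls)); // prints "true"
  }
}
```

`Arrays.equals` already checks the lengths before comparing bytes, so the explicit inner-length test from the quoted snippet is not needed in this form.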
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641392=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641392 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Aug/21 23:45
Start Date: 24/Aug/21 23:45
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r695289418

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
 private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
   lcv.offsets[index] = elements.size();
-  // Return directly if last value is null
-  if (definitionLevel < maxDefLevel) {
-    lcv.isNull[index] = true;
-    lcv.lengths[index] = 0;
-    // fetch the data from parquet data page for next call
-    fetchNextValue(category);
-    return;
-  }
-
   do {
     // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element
+    if (definitionLevel < maxDefLevel) {
+      lcv.lengths[index] = 0;
+      lcv.isNull[index] = true;
+      lcv.noNulls = false;
+    }
     elements.add(lastValue);
   } while (fetchNextValue(category) && (repetitionLevel != 0));
-  lcv.isNull[index] = false;
   lcv.lengths[index] = elements.size() - lcv.offsets[index];

Review comment:
Good catch. I'm removing the assignment in the loop, because this outer assignment is valid under all circumstances.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 641392)
Time Spent: 3h 10m (was: 3h)
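To illustrate why the outer `lengths[index] = elements.size() - offsets[index]` assignment from the addElement hunk above holds in every case, here is a small sketch of the offsets/lengths bookkeeping for a flattened list column. It is a simplification under stated assumptions, not the reader's actual logic: it works on in-memory lists rather than Parquet pages, it only tracks null elements (not null lists), and the class and its `childNoNulls` field are hypothetical. The field names `offsets` and `lengths` follow the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical flattening of rows of lists into offsets, lengths and one child list. */
final class ListLayoutSketch {
  final long[] offsets;
  final long[] lengths;
  final List<String> child = new ArrayList<>();
  boolean childNoNulls = true;

  ListLayoutSketch(List<List<String>> rows) {
    offsets = new long[rows.size()];
    lengths = new long[rows.size()];
    for (int row = 0; row < rows.size(); row++) {
      // The row starts wherever the shared child list currently ends.
      offsets[row] = child.size();
      for (String element : rows.get(row)) {
        if (element == null) {
          childNoNulls = false; // a null still occupies a slot in the child list
        }
        child.add(element);
      }
      // Valid for empty rows, rows with nulls and ordinary rows alike.
      lengths[row] = child.size() - offsets[row];
    }
  }

  public static void main(String[] args) {
    List<List<String>> rows = Arrays.asList(
        Arrays.asList("x", null),
        new ArrayList<String>(),
        Arrays.asList("bar"));
    ListLayoutSketch layout = new ListLayoutSketch(rows);
    System.out.println(Arrays.toString(layout.offsets)); // prints "[0, 2, 2]"
    System.out.println(Arrays.toString(layout.lengths)); // prints "[2, 0, 1]"
    System.out.println(layout.child);                    // prints "[x, null, bar]"
  }
}
```

Because the length is derived from how many slots were appended, no separate length assignment inside the loop is required.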
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=641387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641387 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Aug/21 23:40
Start Date: 24/Aug/21 23:40
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r695287365

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
 private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
   lcv.offsets[index] = elements.size();
-  // Return directly if last value is null
-  if (definitionLevel < maxDefLevel) {
-    lcv.isNull[index] = true;
-    lcv.lengths[index] = 0;
-    // fetch the data from parquet data page for next call
-    fetchNextValue(category);
-    return;
-  }
-
   do {
     // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element
+    if (definitionLevel < maxDefLevel) {

Review comment:
The original intention here was to signal a NULL value in place of a list, which happens when definitionLevel == 0. I'll change this part and add some comments.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 641387)
Time Spent: 3h (was: 2h 50m)
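The distinction drawn in the comment above (a NULL in place of the whole list versus a null inside a list) can be made concrete with Parquet definition levels. The sketch below assumes the standard three-level LIST layout with every level optional, i.e. an optional list group, a repeated inner group and an optional element, which gives a maximum definition level of 3. The actual schema used in the patch's tests may differ, and the class and enum names are hypothetical.

```java
/** Hypothetical mapping of definition levels for an optional list of optional elements. */
final class DefinitionLevelSketch {

  enum Slot { NULL_LIST, EMPTY_LIST, NULL_ELEMENT, VALUE }

  /** Assumes maxDefLevel == 3 (optional list, repeated group, optional element). */
  static Slot classify(int definitionLevel) {
    switch (definitionLevel) {
      case 0:
        return Slot.NULL_LIST;    // the list itself is NULL
      case 1:
        return Slot.EMPTY_LIST;   // the list exists but has no entries
      case 2:
        return Slot.NULL_ELEMENT; // an entry exists but its value is NULL
      case 3:
        return Slot.VALUE;        // a fully defined value
      default:
        throw new IllegalArgumentException("unexpected definition level: " + definitionLevel);
    }
  }

  public static void main(String[] args) {
    for (int level = 0; level <= 3; level++) {
      System.out.println(level + " -> " + classify(level));
    }
  }
}
```

Under this assumption, a test such as `definitionLevel < maxDefLevel` is true for three different situations, which is why checking for `definitionLevel == 0` specifically is the narrower way to detect a NULL list.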
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=639219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-639219 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Aug/21 16:31
Start Date: 18/Aug/21 16:31
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690492371

## File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q
@@ -0,0 +1,20 @@
+set hive.mapred.mode=nonstrict;
+set hive.vectorized.execution.enabled=true;
+set hive.fetch.task.conversion=none;
+
+DROP TABLE parquet_map_type;
+
+
+CREATE TABLE parquet_map_type (
+id int,
+stringMap map

Review comment:
this is blocked because my testcase fails with another issue I discovered: HIVE-25459

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 639219)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638709 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Aug/21 15:38
Start Date: 17/Aug/21 15:38
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690492371

## File path: ql/src/test/queries/clientpositive/parquet_map_null_vectorization.q
@@ -0,0 +1,20 @@
+set hive.mapred.mode=nonstrict;
+set hive.vectorized.execution.enabled=true;
+set hive.fetch.task.conversion=none;
+
+DROP TABLE parquet_map_type;
+
+
+CREATE TABLE parquet_map_type (
+id int,
+stringMap map

Review comment:
this is blocked because my testcase fails with another issue I discovered: HIVE-25459

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 638709)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638705 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Aug/21 15:22
Start Date: 17/Aug/21 15:22
Worklog Time Spent: 10m
Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690475398

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
@@ -323,7 +341,11 @@ private void fillColumnVector(PrimitiveObjectInspector.PrimitiveCategory categor
 int scale = logicalType.getScale();
 lcv.child = new DecimalColumnVector(total, precision, scale);
 for (int i = 0; i < valueList.size(); i++) {
-  ((DecimalColumnVector) lcv.child).vector[i].set(((List) valueList).get(i), scale);
+  if (valueList.get(i) == null) {
+    lcv.child.isNull[i] = true;

Review comment:
Right. Also, compareDecimalColumnVector lacks some inner parentheses here, I guess :) I'll fix it:
```
if (cv1.vector[i] != null && cv2.vector[i] == null
    || cv1.vector[i] == null && cv2.vector[i] != null
    || cv1.vector[i] != null && cv2.vector[i] != null && !cv1.vector[i].equals(cv2.vector[i])) {
```
It needs to be ((... && ...) || (... && ...) || (... && ... && ...)); without parentheses it is a bit messy/undefined for me...

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id:     (was: 638705)
    Time Spent: 2.5h  (was: 2h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
> ---
>
>                 Key: HIVE-23688
>                 URL: https://issues.apache.org/jira/browse/HIVE-23688
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet, storage-api, Vectorization
>    Affects Versions: All Versions
>            Reporter: 范宜臻
>            Assignee: László Bodor
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.0.0, 4.0.0
>
>         Attachments: HIVE-23688.patch
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays
> in MapColumnVector.values(BytesColumnVector) when values in map contain
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int,stringMap map)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized
>       File Output Operator [FS_12]
>         Group By Operator [GBY_11] (rows=5 width=2)
>           Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
>         <-Map 1 [SIMPLE_EDGE] vectorized
>           SHUFFLE [RS_10]
>             PartitionCols:_col0, _col1
>             Group By Operator [GBY_9] (rows=10 width=2)
>               Output:["_col0","_col1"],keys:_col0, _col1
>               Select Operator [SEL_8] (rows=10 width=2)
>                 Output:["_col0","_col1"]
>                 TableScan [TS_0] (rows=10 width=2)
>                   temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at
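A note on the parentheses question in the review comment above: in Java, `&&` binds more tightly than `||`, so the unparenthesized condition is already well-defined, and the extra parentheses only help readability. The sketch below illustrates this under stated assumptions: the class and method names are hypothetical, and plain `Object` stands in for the vector element type. It is not the code from the pull request.

```java
import java.util.Objects;

// Hypothetical helper class; names are illustrative only.
class NullAwareCompareSketch {

  // The condition as quoted in the review comment, without inner parentheses.
  static boolean differsUnparenthesized(Object a, Object b) {
    return a != null && b == null
        || a == null && b != null
        || a != null && b != null && !a.equals(b);
  }

  // The same condition with the grouping spelled out.
  static boolean differsParenthesized(Object a, Object b) {
    return ((a != null && b == null)
        || (a == null && b != null)
        || (a != null && b != null && !a.equals(b)));
  }

  // A shorter null-safe form with the same truth table.
  static boolean differsViaObjects(Object a, Object b) {
    return !Objects.equals(a, b);
  }

  public static void main(String[] args) {
    Object[] samples = {null, "1.50", "2.75"};
    for (Object a : samples) {
      for (Object b : samples) {
        boolean u = differsUnparenthesized(a, b);
        boolean p = differsParenthesized(a, b);
        boolean o = differsViaObjects(a, b);
        if (u != p || u != o) {
          throw new IllegalStateException("forms disagree for " + a + ", " + b);
        }
      }
    }
    System.out.println("all three forms agree");
  }
}
```

Because the three forms agree on every null/non-null combination, choosing between them is a readability decision rather than a correctness one.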
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638702 ]

ASF GitHub Bot logged work on HIVE-23688:
-
                Author: ASF GitHub Bot
            Created on: 17/Aug/21 15:20
            Start Date: 17/Aug/21 15:20
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690475398

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -323,7 +341,11 @@ private void fillColumnVector(PrimitiveObjectInspector.PrimitiveCategory categor
       int scale = logicalType.getScale();
       lcv.child = new DecimalColumnVector(total, precision, scale);
       for (int i = 0; i < valueList.size(); i++) {
-        ((DecimalColumnVector) lcv.child).vector[i].set(((List) valueList).get(i), scale);
+        if (valueList.get(i) == null) {
+          lcv.child.isNull[i] = true;

Review comment:
right, also compareDecimalColumnVector lacks some inner parentheses here I guess :) I'll fix it
```
if (cv1.vector[i] != null && cv2.vector[i] == null
    || cv1.vector[i] == null && cv2.vector[i] != null
    || cv1.vector[i] != null && cv2.vector[i] != null && !cv1.vector[i].equals(cv2.vector[i])) {
```
needs to be ((... && ...) || (... && ...) || (... && ... && ...))...without parentheses it is a bit messy/undefined for me...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id:     (was: 638702)
    Time Spent: 2h 20m  (was: 2h 10m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638699 ]

ASF GitHub Bot logged work on HIVE-23688:
-
                Author: ASF GitHub Bot
            Created on: 17/Aug/21 15:13
            Start Date: 17/Aug/21 15:13
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690469208

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
     lcv.offsets[index] = elements.size();

-    // Return directly if last value is null
-    if (definitionLevel < maxDefLevel) {
-      lcv.isNull[index] = true;
-      lcv.lengths[index] = 0;
-      // fetch the data from parquet data page for next call
-      fetchNextValue(category);
-      return;
-    }
-
     do {
       // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element
+      if (definitionLevel < maxDefLevel) {
+        lcv.lengths[index] = 0;
+        lcv.isNull[index] = true;

Review comment:
fetchNextValue only reads a single primitive value of a list (e.g. ListColumnVector.child.vector[i]). In this method, the elements of the lists are only collected serially into List elements, which will then be filled back into a ListColumnVector in convertValueListToListColumnVector.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id:     (was: 638699)
    Time Spent: 2h 10m  (was: 2h)
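The layout described in the review comment above (list elements collected serially into one flat list, with per-row offsets and lengths pointing into it) can be sketched as follows. This is a simplified model, not Hive's `ListColumnVector` and not the code from the pull request: the class name, the `String` element type and the constructor are assumptions made for illustration, and a null row is modelled as contributing no elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a list column: one flat element list plus
// offsets/lengths/isNull arrays with one entry per row.
class FlatListLayoutSketch {
  final long[] offsets;
  final long[] lengths;
  final boolean[] isNull;
  final List<String> elements = new ArrayList<>();

  FlatListLayoutSketch(List<List<String>> rows) {
    offsets = new long[rows.size()];
    lengths = new long[rows.size()];
    isNull = new boolean[rows.size()];
    for (int i = 0; i < rows.size(); i++) {
      List<String> row = rows.get(i);
      // Each row starts where the flat list currently ends.
      offsets[i] = elements.size();
      if (row == null) {
        // A null list occupies no elements in this simplified model.
        isNull[i] = true;
        lengths[i] = 0;
        continue;
      }
      // Null elements inside a non-null list are stored like any other value.
      elements.addAll(row);
      lengths[i] = elements.size() - offsets[i];
    }
  }

  // Rebuilds row i from the flat list; returns null for a null row.
  List<String> row(int i) {
    if (isNull[i]) {
      return null;
    }
    int start = (int) offsets[i];
    return elements.subList(start, start + (int) lengths[i]);
  }

  public static void main(String[] args) {
    FlatListLayoutSketch layout = new FlatListLayoutSketch(
        Arrays.asList(Arrays.asList("a", null), null, Arrays.asList("c")));
    System.out.println(Arrays.toString(layout.offsets));
    System.out.println(Arrays.toString(layout.lengths));
    System.out.println(layout.row(0));
  }
}
```

The sketch keeps a null row distinct from a null element inside a non-null row, which is the distinction the discussion around `isNull` and `lengths` turns on.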
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638673 ]

ASF GitHub Bot logged work on HIVE-23688:
-
                Author: ASF GitHub Bot
            Created on: 17/Aug/21 14:24
            Start Date: 17/Aug/21 14:24
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690422532

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -193,59 +190,59 @@ private List decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate
     case SHORT:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readInteger(intList.get(i)));
+        resultList.add(intList.get(i) == null ? null : dictionary.readInteger(intList.get(i)));
       }
       break;
     case DATE:
     case INTERVAL_YEAR_MONTH:
     case LONG:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readLong(intList.get(i)));
+        resultList.add(intList.get(i) == null ? null : dictionary.readLong(intList.get(i)));
       }
       break;
     case BOOLEAN:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0);
+        resultList.add(intList.get(i) == null ? null : dictionary.readBoolean(intList.get(i)));

Review comment:
good catch, I'm fixing it

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id:     (was: 638673)
    Time Spent: 2h  (was: 1h 50m)
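The hunk above guards every dictionary lookup against a null id, and the BOOLEAN case is where the review noticed that the `? 1 : 0` conversion had been dropped. A minimal sketch of a null-safe decode that keeps the conversion is shown below. It assumes a hypothetical helper class, and a plain `boolean[]` stands in for the Parquet dictionary, so none of these names come from the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper; the real reader uses its own dictionary type.
class NullSafeDictionaryDecodeSketch {

  // Maps each dictionary id to a value, passing null ids through as null values.
  static <T> List<T> decode(List<Integer> ids, IntFunction<T> lookup) {
    List<T> result = new ArrayList<>(ids.size());
    for (Integer id : ids) {
      result.add(id == null ? null : lookup.apply(id));
    }
    return result;
  }

  // Booleans are decoded to 1/0, as in the line removed by the diff,
  // while a null id still yields a null entry.
  static List<Integer> decodeBooleans(List<Integer> ids, boolean[] dictionary) {
    return decode(ids, id -> dictionary[id] ? 1 : 0);
  }

  public static void main(String[] args) {
    boolean[] dictionary = {false, true};
    System.out.println(decodeBooleans(Arrays.asList(0, null, 1), dictionary));
  }
}
```

Keeping the null check in one generic helper means each type-specific case only has to supply its own lookup.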
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638672 ]

ASF GitHub Bot logged work on HIVE-23688:
-
                Author: ASF GitHub Bot
            Created on: 17/Aug/21 14:23
            Start Date: 17/Aug/21 14:23
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690422532

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -193,59 +190,59 @@ private List decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate
     case SHORT:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readInteger(intList.get(i)));
+        resultList.add(intList.get(i) == null ? null : dictionary.readInteger(intList.get(i)));
       }
       break;
     case DATE:
     case INTERVAL_YEAR_MONTH:
     case LONG:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readLong(intList.get(i)));
+        resultList.add(intList.get(i) == null ? null : dictionary.readLong(intList.get(i)));
       }
       break;
     case BOOLEAN:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0);
+        resultList.add(intList.get(i) == null ? null : dictionary.readBoolean(intList.get(i)));

Review comment:
good catch, fixed it

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id:     (was: 638672)
    Time Spent: 1h 50m  (was: 1h 40m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=638667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638667 ]

ASF GitHub Bot logged work on HIVE-23688:
-
                Author: ASF GitHub Bot
            Created on: 17/Aug/21 14:08
            Start Date: 17/Aug/21 14:08
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r690407521

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -479,6 +501,9 @@ private boolean compareBytesColumnVector(BytesColumnVector cv1, BytesColumnVecto
     int length2 = cv2.vector.length;
     if (length1 == length2) {
       for (int i = 0; i < length1; i++) {
+        if (cv1.vector[i] == null && cv2.vector[i] == null) {
+          continue;

Review comment:
sorry, I don't get this, how do you mean?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
    Worklog Id:     (was: 638667)
    Time Spent: 1h 40m  (was: 1.5h)
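The hunk in the message above adds a both-null check to `compareBytesColumnVector` before the element comparison. A self-contained sketch of such a null-aware comparison follows. It works on plain `byte[][]` arrays rather than Hive's `BytesColumnVector` (which also tracks `start` and `length` per entry), and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper modelling only the `vector` array of a bytes column.
class NullAwareBytesCompareSketch {

  static boolean sameBytes(byte[][] v1, byte[][] v2) {
    if (v1.length != v2.length) {
      return false;
    }
    for (int i = 0; i < v1.length; i++) {
      if (v1[i] == null && v2[i] == null) {
        // Both entries are null: treat as equal and move on.
        continue;
      }
      if (v1[i] == null || v2[i] == null) {
        // Exactly one side is null.
        return false;
      }
      if (!Arrays.equals(v1[i], v2[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[][] a = {"bar".getBytes(), null};
    byte[][] b = {"bar".getBytes(), null};
    System.out.println(sameBytes(a, b));
  }
}
```

`Arrays.equals(byte[], byte[])` already treats two null arrays as equal and a null against a non-null array as unequal, so the explicit checks are redundant in this plain-array model; they are spelled out here to mirror the shape of the diff.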
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=636829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636829 ]

ASF GitHub Bot logged work on HIVE-23688:
-
                Author: ASF GitHub Bot
            Created on: 11/Aug/21 09:21
            Start Date: 11/Aug/21 09:21
    Worklog Time Spent: 10m
      Work Description: maheshk114 commented on a change in pull request #2479:
URL: https://github.com/apache/hive/pull/2479#discussion_r686623069

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
     lcv.offsets[index] = elements.size();

-    // Return directly if last value is null
-    if (definitionLevel < maxDefLevel) {
-      lcv.isNull[index] = true;
-      lcv.lengths[index] = 0;
-      // fetch the data from parquet data page for next call
-      fetchNextValue(category);
-      return;
-    }
-
     do {
       // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element
+      if (definitionLevel < maxDefLevel) {
+        lcv.lengths[index] = 0;
+        lcv.isNull[index] = true;
+        lcv.noNulls = false;
+      }
       elements.add(lastValue);
     } while (fetchNextValue(category) && (repetitionLevel != 0));

-    lcv.isNull[index] = false;
     lcv.lengths[index] = elements.size() - lcv.offsets[index];

Review comment:
lcv.lengths[index] is overwritten here: inside the loop it is set to 0 under some condition.

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
     lcv.offsets[index] = elements.size();

-    // Return directly if last value is null
-    if (definitionLevel < maxDefLevel) {
-      lcv.isNull[index] = true;
-      lcv.lengths[index] = 0;
-      // fetch the data from parquet data page for next call
-      fetchNextValue(category);
-      return;
-    }
-
     do {
       // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element
+      if (definitionLevel < maxDefLevel) {
+        lcv.lengths[index] = 0;
+        lcv.isNull[index] = true;

Review comment:
why does this have to be done in a loop?

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -193,59 +190,59 @@ private List decodeDictionaryIds(PrimitiveObjectInspector.PrimitiveCategory cate
     case SHORT:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readInteger(intList.get(i)));
+        resultList.add(intList.get(i) == null ? null : dictionary.readInteger(intList.get(i)));
       }
       break;
     case DATE:
     case INTERVAL_YEAR_MONTH:
     case LONG:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readLong(intList.get(i)));
+        resultList.add(intList.get(i) == null ? null : dictionary.readLong(intList.get(i)));
       }
       break;
     case BOOLEAN:
       resultList = new ArrayList(total);
       for (int i = 0; i < total; ++i) {
-        resultList.add(dictionary.readBoolean(intList.get(i)) ? 1 : 0);
+        resultList.add(intList.get(i) == null ? null : dictionary.readBoolean(intList.get(i)));

Review comment:
instead of 0 or 1, the value returned by readBoolean is used

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java ##
@@ -129,21 +131,16 @@ private boolean fetchNextValue(PrimitiveObjectInspector.PrimitiveCategory catego
   private void addElement(ListColumnVector lcv, List elements, PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
     lcv.offsets[index] = elements.size();

-    // Return directly if last value is null
-    if (definitionLevel < maxDefLevel) {
-      lcv.isNull[index] = true;
-      lcv.lengths[index] = 0;
-      // fetch the data from parquet data page for next call
-      fetchNextValue(category);
-      return;
-    }
-
     do {
       // add all data for an element in ListColumnVector, get out the loop if there is no data or the data is for new element
+      if (definitionLevel < maxDefLevel) {

Review comment:
in fetchNextvalue ..if (definitionLevel != maxDefLevel) {
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
[ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=622875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622875 ] ASF GitHub Bot logged work on HIVE-23688: - Author: ASF GitHub Bot Created on: 15/Jul/21 06:47 Start Date: 15/Jul/21 06:47 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #2479: URL: https://github.com/apache/hive/pull/2479#issuecomment-880441673 this is the rebased version of https://github.com/apache/hive/pull/1122 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 622875) Time Spent: 1h 20m (was: 1h 10m) > Vectorization: IndexArrayOutOfBoundsException For map type column which > includes null value > --- > > Key: HIVE-23688 > URL: https://issues.apache.org/jira/browse/HIVE-23688 > Project: Hive > Issue Type: Bug > Components: Parquet, storage-api, Vectorization >Affects Versions: All Versions >Reporter: 范宜臻 >Assignee: 范宜臻 >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0, 4.0.0 > > Attachments: HIVE-23688.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays > in MapColumnVector.values(BytesColumnVector) when values in map contain > {color:#de350b}null{color} > reproduce in master branch: > {code:java} > set hive.vectorized.execution.enabled=true; > CREATE TABLE parquet_map_type (id int,stringMap map) > stored as parquet; > insert overwrite table parquet_map_typeSELECT 1, MAP('k1', null, 'k2', > 'bar'); > select id, stringMap['k1'] from parquet_map_type group by 1,2; > {code} > query explain: > {code:java} > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 vectorized > 
File Output Operator [FS_12] > Group By Operator [GBY_11] (rows=5 width=2) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] vectorized > SHUFFLE [RS_10] > PartitionCols:_col0, _col1 > Group By Operator [GBY_9] (rows=10 width=2) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_8] (rows=10 width=2) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=10 width=2) > > temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"] > {code} > runtime error: > {code:java} > Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, > diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at >
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=622874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622874 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jul/21 06:47
            Start Date: 15/Jul/21 06:47
    Worklog Time Spent: 10m
      Work Description: abstractdog opened a new pull request #2479:
URL: https://github.com/apache/hive/pull/2479

   ### What changes were proposed in this pull request?
   ### Why are the changes needed?
   ### Does this PR introduce _any_ user-facing change?
   ### How was this patch tested?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 622874)
    Time Spent: 1h 10m  (was: 1h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
> ---
>
>                 Key: HIVE-23688
>                 URL: https://issues.apache.org/jira/browse/HIVE-23688
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet, storage-api, Vectorization
>    Affects Versions: All Versions
>            Reporter: 范宜臻
>            Assignee: 范宜臻
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.0.0, 4.0.0
>
>         Attachments: HIVE-23688.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays
> in MapColumnVector.values(BytesColumnVector) when the values in the map contain
> {color:#de350b}null{color}.
> To reproduce on the master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized
>       File Output Operator [FS_12]
>         Group By Operator [GBY_11] (rows=5 width=2)
>           Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
>         <-Map 1 [SIMPLE_EDGE] vectorized
>           SHUFFLE [RS_10]
>             PartitionCols:_col0, _col1
>             Group By Operator [GBY_9] (rows=10 width=2)
>               Output:["_col0","_col1"],keys:_col0, _col1
>               Select Operator [SEL_8] (rows=10 width=2)
>                 Output:["_col0","_col1"]
>                 TableScan [TS_0] (rows=10 width=2)
>                   temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00,
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
> failure ) :
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
> Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>     at
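
The failure mode described in the report (indexing into empty start and length arrays for a map value that is null) can be illustrated outside Hive with a minimal sketch. Everything below is an assumption made for illustration: SketchBytesVector is a hypothetical stand-in, not Hive's BytesColumnVector, and readGuarded is not the change made in HIVE-23688.patch or in either pull request. It only shows why an unguarded read fails and how checking the null flag first avoids touching the offset arrays.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a bytes column vector; NOT Hive's BytesColumnVector. */
final class SketchBytesVector {
    boolean noNulls = true;   // true while no entry has been marked null
    boolean[] isNull;         // per-entry null flags
    byte[][] vector;          // per-entry backing buffers
    int[] start;              // per-entry offset into the buffer
    int[] length;             // per-entry byte length

    SketchBytesVector(int size) {
        isNull = new boolean[size];
        vector = new byte[size][];
        start = new int[size];
        length = new int[size];
    }

    void set(int i, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        vector[i] = b;
        start[i] = 0;
        length[i] = b.length;
    }

    void setNull(int i) {
        isNull[i] = true;
        noNulls = false;
    }
}

final class NullSafeMapValueRead {
    /** Reads entry i without consulting the null flags. */
    static String readUnguarded(SketchBytesVector v, int i) {
        return new String(v.vector[i], v.start[i], v.length[i], StandardCharsets.UTF_8);
    }

    /** Checks the null flag first, so start/length are never indexed for null entries. */
    static String readGuarded(SketchBytesVector v, int i) {
        if (!v.noNulls && v.isNull[i]) {
            return null;
        }
        return readUnguarded(v, i);
    }

    public static void main(String[] args) {
        // Values of MAP('k1', null, 'k2', 'bar') from the reproduction.
        SketchBytesVector values = new SketchBytesVector(2);
        values.setNull(0);
        values.set(1, "bar");

        // Assumption: simulate the reported state, where start and length are empty.
        values.start = new int[0];
        values.length = new int[0];

        try {
            readUnguarded(values, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded read failed: " + e);
        }
        System.out.println("guarded read of null entry: " + readGuarded(values, 0));
    }
}
```

The guard only covers entries flagged as null; a non-null entry read against empty offset arrays would still fail in this sketch, which is why the arrays themselves also need to be populated correctly.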
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=618257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618257 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jul/21 20:40
            Start Date: 02/Jul/21 20:40
    Worklog Time Spent: 10m
      Work Description: abstractdog edited a comment on pull request #1122:
URL: https://github.com/apache/hive/pull/1122#issuecomment-873244798

   this change solved an issue that we found on customer side, reopened this PR and I'll review later
   @SparksFyz could you please rebase the patch on master?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 618257)
    Time Spent: 1h  (was: 50m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=618256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618256 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jul/21 20:36
            Start Date: 02/Jul/21 20:36
    Worklog Time Spent: 10m
      Work Description: abstractdog commented on pull request #1122:
URL: https://github.com/apache/hive/pull/1122#issuecomment-873244798

   this change solved an issue that we found on customer side, reopened this PR and I'll review later

Issue Time Tracking
-------------------

    Worklog Id:     (was: 618256)
    Time Spent: 50m  (was: 40m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=618255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618255 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jul/21 20:36
            Start Date: 02/Jul/21 20:36
    Worklog Time Spent: 10m
      Work Description: SparksFyz opened a new pull request #1122:
URL: https://github.com/apache/hive/pull/1122

   https://issues.apache.org/jira/browse/HIVE-23688

Issue Time Tracking
-------------------

    Worklog Id:     (was: 618255)
    Time Spent: 40m  (was: 0.5h)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=480516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480516 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Sep/20 00:46
            Start Date: 09/Sep/20 00:46
    Worklog Time Spent: 10m
      Work Description: github-actions[bot] closed pull request #1122:
URL: https://github.com/apache/hive/pull/1122

Issue Time Tracking
-------------------

    Worklog Id:     (was: 480516)
    Time Spent: 0.5h  (was: 20m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=476931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476931 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Sep/20 00:46
            Start Date: 01/Sep/20 00:46
    Worklog Time Spent: 10m
      Work Description: github-actions[bot] commented on pull request #1122:
URL: https://github.com/apache/hive/pull/1122#issuecomment-684124163

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 476931)
    Time Spent: 20m  (was: 10m)
[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value
     [ https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=446387&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446387 ]

ASF GitHub Bot logged work on HIVE-23688:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jun/20 09:42
            Start Date: 16/Jun/20 09:42
    Worklog Time Spent: 10m
      Work Description: SparksFyz opened a new pull request #1122:
URL: https://github.com/apache/hive/pull/1122

   …d null values in map

   ## NOTICE

   Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute

Issue Time Tracking
-------------------

            Worklog Id:     (was: 446387)
    Remaining Estimate: 0h
            Time Spent: 10m