Re: JOIN on map value results to HiveException: Unexpected column vector type MAP
Hi,

After tweaking the configs, I found that "hive.vectorized.execution.enabled" and "hive.auto.convert.join" are the culprits. I think vectorization of the map column data type is not supported in my current Hive version, and Map Join also has problems with the map data type. After setting at least one of these configurations to "false", the cross product query runs successfully. Of course, there might be some performance loss, since this turns off vectorization or Map Join.

Regards,
Jan Charles

On Thu, Jan 17, 2019 at 1:57 PM Jan Adona wrote:
> Hi,
>
> Just a follow-up: I think the JOIN is not the problem here, since this
> error also occurs when I query the 2 tables without a join and include
> the map column in the SELECT statement.
>
> I'm going to rewrite the schema and queries that I sent before, because
> I mistakenly formatted the body, which is why it has random asterisks.
>
> Schema:
>
> CREATE TABLE test_table0(userid BIGINT, mapCol map<STRING, BIGINT>)
> COMMENT 'Test table 0'
> STORED AS SEQUENCEFILE;
>
> CREATE TABLE test_table1(userid BIGINT, col1 STRING, col2 STRING)
> COMMENT 'Test table 1'
> STORED AS SEQUENCEFILE;
>
> Rows:
>
> INSERT INTO TABLE test_table0 VALUES (1, map('a', 1, 'b', 2));
> INSERT INTO TABLE test_table0 VALUES (2, map('c', 3, 'd', 4));
> INSERT INTO TABLE test_table1 VALUES (1, 'mycol1', 'mycol2');
>
> Query with a JOIN (fail):
>
> SELECT a.*, b.* FROM test_table0 a INNER JOIN test_table1 b
> ON a.mapCol['a'] = b.userid;
>
> Query without a JOIN (fail):
>
> SELECT a.*, b.* FROM test_table0 a, test_table1 b;
>
> Query without a JOIN, not including the column with the map data type
> (success):
>
> SELECT a.userid, b.* FROM test_table0 a, test_table1 b;
>
> Error message of the failed queries:
>
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1,
> vertexId=vertex_1546408189013_0179_7_01, diagnostics=[Task failed,
> taskId=task_1546408189013_0179_7_01_00, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1546408189013_0179_7_01_00_0:java.lang.RuntimeException:
> java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>     ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected
> column vector type MAP
>     at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)
>     ... 17 more
>
> Also, I'm running this query on an HDP 3.0.1 cluster with Apache Hive
> 3.1.0.
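
For readers who land on this thread, the workaround reported in the reply above could look roughly like the following in a Beeline or Hive CLI session. This is only a sketch: it assumes the test tables from this thread exist and that a session-level SET is acceptable. The reply reports that changing either property alone was enough, so the second one is left commented out. The EXPLAIN VECTORIZATION statement is an optional extra, not something used in the thread, for checking which operators the planner tries to vectorize.

```sql
-- Sketch of the session-level workaround reported in this thread
-- (Hive 3.1.0 on Tez, HDP 3.0.1). Table names come from the thread.

-- Option 1: turn off vectorized execution for this session only.
SET hive.vectorized.execution.enabled=false;

-- Option 2 (alternative): keep vectorization, but stop Hive from
-- automatically converting the join into a map join.
-- SET hive.auto.convert.join=false;

-- Optional check (an addition, not from the thread): show the plan with
-- vectorization details before running the query.
EXPLAIN VECTORIZATION DETAIL
SELECT a.*, b.*
FROM test_table0 a
INNER JOIN test_table1 b
  ON a.mapCol['a'] = b.userid;

-- The query that failed with "Unexpected column vector type MAP".
SELECT a.*, b.*
FROM test_table0 a
INNER JOIN test_table1 b
  ON a.mapCol['a'] = b.userid;

-- Re-enable vectorization afterwards so other queries keep the speedup.
SET hive.vectorized.execution.enabled=true;
```

Scoping the change to one session, rather than editing hive-site.xml, limits the performance loss mentioned above to the queries that actually touch the map column.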
Re: JOIN on map value results to HiveException: Unexpected column vector type MAP
Hi,

Just a follow-up: I think the JOIN is not the problem here, since this error also occurs when I query the 2 tables without a join and include the map column in the SELECT statement.

I'm going to rewrite the schema and queries that I sent before, because I mistakenly formatted the body, which is why it has random asterisks.

Schema:

CREATE TABLE test_table0(userid BIGINT, mapCol map<STRING, BIGINT>)
COMMENT 'Test table 0'
STORED AS SEQUENCEFILE;

CREATE TABLE test_table1(userid BIGINT, col1 STRING, col2 STRING)
COMMENT 'Test table 1'
STORED AS SEQUENCEFILE;

Rows:

INSERT INTO TABLE test_table0 VALUES (1, map('a', 1, 'b', 2));
INSERT INTO TABLE test_table0 VALUES (2, map('c', 3, 'd', 4));
INSERT INTO TABLE test_table1 VALUES (1, 'mycol1', 'mycol2');

Query with a JOIN (fail):

SELECT a.*, b.* FROM test_table0 a INNER JOIN test_table1 b ON a.mapCol['a'] = b.userid;

Query without a JOIN (fail):

SELECT a.*, b.* FROM test_table0 a, test_table1 b;

Query without a JOIN, not including the column with the map data type (success):

SELECT a.userid, b.* FROM test_table0 a, test_table1 b;

Error message of the failed queries:

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1546408189013_0179_7_01, diagnostics=[Task failed, taskId=task_1546408189013_0179_7_01_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1546408189013_0179_7_01_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
    ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected column vector type MAP
    at org.apache.hadoop.hive.ql.exec.vector.VectorCopyRow.init(VectorCopyRow.java:302)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:419)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.initializeOp(VectorMapJoinGenerateResultOperator.java:115)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:335)
    ... 17 more

Also, I'm running this query on an HDP 3.0.1 cluster with Apache Hive 3.1.0.

Thanks,
Jan Charles

On Wed, Jan 16, 2019 at 7:46 PM Jan Adona wrote:
> Hi,
> I'm trying out a JOIN query on 2 tables using a map value (which is a
> BIGINT) on the 1st table and a BIGINT column on the 2nd table. So here is
> the schema:
>
> *CREATE TABLE test_table0(userid BIGINT, mapCol map<STRING, BIGINT>)*
> *COMMENT 'Test table 0'*
> *STORED AS SEQUENCEFILE;*
>
> *CREATE TABLE test_table1(userid BIGINT, col1 STRING, col2 STRING)*
> *COMMENT 'Test table 1'*
> *STORED AS SEQUENCEFILE;*
>
> with these rows:
>
> *INSERT INTO TABLE test_table0 VALUES (1, map('a', 1, 'b', 2));*
> *INSERT INTO TABLE test_table0 VALUES (2, map('c', 3, 'd', 4));*
> *INSERT INTO TABLE test_table1 VALUES (1, 'mycol1', 'mycol2');*
>
> and here is the query:
>
> *SELECT a.*, b.* FROM test_table0 a INNER