Mustafa İman created HIVE-24531:
-----------------------------------

             Summary: Vectorized table scan ignores binary column
                 Key: HIVE-24531
                 URL: https://issues.apache.org/jira/browse/HIVE-24531
             Project: Hive
          Issue Type: Bug
            Reporter: Mustafa İman

The over1k dataset in the Hive codebase contains a binary field. Vectorized table scan ignores this field and passes it as NULL in every row. The issue also affects insert queries, for both external and managed tables, when "hive.stats.autogather=false".

To reproduce:

Add "set hive.stats.autogather=false;" at the top of "vector_data_types.q".
Run: mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q
Observe that the "bin" column is all NULL when querying any of the tables.

Below is a simplified version of the same test:

{code:java}
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
set hive.fetch.task.conversion=none;
set hive.stats.autogather=false;

DROP TABLE over1k_n8;
DROP TABLE over1korc_n1;

-- data setup
CREATE TABLE over1k_n8(t tinyint,
           si smallint,
           i int,
           b bigint,
           f float,
           d double,
           bo boolean,
           s string,
           ts timestamp,
           `dec` decimal(4,2),
           bin binary)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE over1k_n8;

analyze table over1k_n8 compute statistics;
analyze table over1k_n8 compute statistics for columns;

select * from over1k_n8 limit 10;
select count(1) from over1k_n8 where bin is null;

CREATE TABLE over1korc_n1(t tinyint,
           si smallint,
           i int,
           b bigint,
           f float,
           d double,
           bo boolean,
           s string,
           ts timestamp,
           `dec` decimal(4,2),
           bin binary)
STORED AS ORC;

explain vectorization detail
INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;

INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;

select count(1) from over1korc_n1 where bin is null;
select * from over1korc_n1 limit 10;
{code}
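
A possible cross-check, offered as a sketch rather than as part of the original report: assuming over1k_n8 has been created and loaded as in the simplified test, the NULL count can be compared with vectorized execution switched on and off. hive.vectorized.execution.enabled is the standard Hive setting for this. That the two counts will differ is an assumption drawn from the issue summary and has not been verified here.

{code:sql}
set hive.fetch.task.conversion=none;
set hive.stats.autogather=false;

-- Assumption: over1k_n8 exists and is loaded as in the simplified test.
-- Vectorized run: the report says bin comes back NULL in every row.
set hive.vectorized.execution.enabled=true;
select count(1) from over1k_n8 where bin is null;

-- Non-vectorized run of the same query, for comparison.
set hive.vectorized.execution.enabled=false;
select count(1) from over1k_n8 where bin is null;
{code}

If the second count is lower than the first, that would point at the vectorized read path rather than at the data file or the table definition.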