Suresh Subbiah created TRAFODION-2138:
-----------------------------------------
             Summary: Hive scan on wide tables can result in lost rows or error
                 Key: TRAFODION-2138
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
    Affects Versions: 2.0-incubating
            Reporter: Suresh Subbiah
            Assignee: Suresh Subbiah
             Fix For: 2.0-incubating


When a Hive table has wide rows (say, greater than 16 KB) and the scan is parallelized, one of these two problem situations may arise:

a) An error with sqlcode 8446 is raised:
*** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No record delimiter found in buffer from hdfsRead.

b) Some rows are lost.

Both situations occur because the scan is broken up into ranges, and the lookahead into the next range is determined by HDFS_IO_RANGE_TAIL, which defaults to 16 KB. This setting is also hardcoded in a crucial location, so changing the default does not help.

Test case to see this problem:

cqd traf_max_character_col_length '1000000';
drop table if exists t031t1;
create table t031t1 (z int not null primary key, a varchar(1000000), b char(1000000));
insert into t031t1 values (1, repeat('a', 1000000, 1000000), 'def');
insert into t031t1 values (2, repeat('a', 1000, 1000), 'zzz');
insert into t031t1 values (3, repeat('a', 10000, 10000), 'zzz');
insert into t031t1 values (4, repeat('a', 100000, 100000), 'zzz');
insert into t031t1 values (5, repeat('a', 100, 100), 'zzz');
insert into t031t1 values (6, repeat('a', 10, 10), 'zzz');
insert into t031t1 values (7, repeat('a', 1, 1), 'zzz');
insert into t031t1 values (8, repeat('a', 1000000, 1000000), null);
insert into t031t1 values (9, repeat('a', 500000, 500000), null);
insert into t031t1 values (10, repeat('a', 100, 100), null);

-- in hive
create table t031hive(z int, a string, b string);

-- in sqlci
cqd hive_max_string_length '1000000';
insert into hive.hive.t031hive select * from t031t1;
create external table t031hive (z int, a varchar(1000000), b varchar(1000000)) for
hive.hive.t031hive;

control query shape sort_groupby(exchange(cut));

-- 2 esp plan
cqd hive_min_bytes_per_esp_partition '2000000';
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2;
execute s2;

-- 4 esp plan
cqd hive_min_bytes_per_esp_partition '1000000';
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2;
execute s2;

The row boundaries and esp scan range boundaries are shown below:

hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"

-- 4 esp plan
1000006:\n
esp0 end (652820)
1001013:\n
1011020:\n
1111027:\n
1111134:\n
1111151:\n
1111159:\n
2111165:\n
esp1 end (1305640)
esp2 end (1958460)
2611171:\n
2611278:\n

-- 2 esp plan
1000006:\n
1001013:\n
1011020:\n
1111027:\n
1111134:\n
1111151:\n
1111159:\n
2111165:\n
esp0 end (1305640)
2611171:\n
2611278:\n

Number of esps to scan: 4
  Esp# Range#  StartOffset    BytesRead  FileName
====== ====== ============ ============ ==============================
     0      0            0       652820  t031hive/T031HIVE-0-20160716231120-1 (split_e)
     1      1       652820       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
     2      2      1305640       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
     3      3      1958460       652819  t031hive/T031HIVE-0-20160716231120-1 (split_b)
Summary of bytes read per ESP (2611279 = 100 percent):
ESP 0 reads 652820 bytes ( 100 percent of avg)
ESP 1 reads 652820 bytes ( 100 percent of avg)
ESP 2 reads 652820 bytes ( 100 percent of avg)
ESP 3 reads 652819 bytes (  99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0

Number of ranges to scan: 2
Number of esps to scan: 2
  Esp# Range#  StartOffset    BytesRead  FileName
====== ====== ============ ============ ==============================
     0      0            0      1305640  t031hive/T031HIVE-0-20160716231120-1 (split_e)
     1      1      1305640      1305639  t031hive/T031HIVE-0-20160716231120-1 (split_b)
Summary of bytes read per ESP (2611279 = 100 percent):
ESP 0 reads 1305640 bytes ( 100 percent of avg)
ESP 1 reads
1305639 bytes (  99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0

The fix has three parts:

a) RangeTailIOSize now defaults to the maximum length of a row, if that is greater than 16 KB. Each esp range will look ahead by this size. Previously this was hardcoded to 16 KB.

b) If a whole buffer does not contain the start of a row, that is no longer treated as an error.

c) We raise an error at compile time if the maximum row size is greater than the size of the IO buffer. We need the whole row (from start to finish) in a contiguous buffer to do the rest of our logic. Currently the IO buffer defaults to 64 MB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
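The lost-rows mechanism and fix part a) can be sketched with a simplified simulation. This is illustrative only, not Trafodion source: the `scan` function and its parameters are hypothetical stand-ins for the executor's range-split and lookahead-tail logic, scaled down so a 16-byte tail plays the role of the 16 KB HDFS_IO_RANGE_TAIL default.

```python
# Simplified simulation (not Trafodion source) of parallel scan ranges
# with a lookahead tail. A row belongs to the range that contains its
# first byte; the ESP may read up to `tail` bytes past its range end
# to find the row's '\n' delimiter.

def scan(data, num_esps, tail):
    size = len(data)
    step = -(-size // num_esps)  # ceiling division: bytes per range
    # A row starts at offset 0 or immediately after each '\n'.
    starts = [0] + [i + 1 for i, b in enumerate(data) if b == ord("\n")]
    rows = []
    for esp in range(num_esps):
        lo, hi = esp * step, min(size, (esp + 1) * step)
        for s in starts:
            if lo <= s < hi:
                # Look for the delimiter within the range plus the tail.
                nl = data.find(b"\n", s, min(size, hi + tail))
                if nl != -1:
                    rows.append(data[s:nl])
                # else: no delimiter found within the tail -> the row is
                # silently lost (or ERROR[8446] when a whole buffer holds
                # no delimiter at all).
    return rows

# One "wide" row far longer than a 16-byte tail stands in for the >16KB case.
data = b"short1\n" + b"x" * 100 + b"\n" + b"short2\n"
assert len(scan(data, 4, tail=16)) == 2    # the wide row is lost
assert len(scan(data, 4, tail=101)) == 3   # tail >= max row length: all rows seen
```

With the tail fixed at 16 bytes, the ESP owning the wide row's start never sees its delimiter and the row disappears; sizing the tail to the maximum row length (as in fix part a) guarantees the delimiter falls inside the lookahead window.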