Suresh Subbiah created TRAFODION-2138:
-----------------------------------------

             Summary: Hive scan on wide tables can result in lost rows or error
                 Key: TRAFODION-2138
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
    Affects Versions: 2.0-incubating
            Reporter: Suresh Subbiah
            Assignee: Suresh Subbiah
             Fix For: 2.0-incubating


When a Hive table has wide rows (say, greater than 16KB) and the scan is 
parallelized, one of these two problems may arise:

a) An error with sqlcode 8446 is raised:
*** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No record delimiter found in buffer from hdfsRead.

b) Some rows are lost.

Both situations occur because the scan is broken up into ranges and the 
lookahead into the next range is determined by HDFS_IO_RANGE_TAIL, which 
defaults to 16KB. This value is also hardcoded in a crucial location, so 
changing the default does not help.
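
The lookahead one would want here is at least the widest row. A sketch, assuming HDFS_IO_RANGE_TAIL can be set as a CQD like the others used below; as noted above, the hardcoded copy of the 16KB value means this alone does not avoid the problem:

cqd hdfs_io_range_tail '1048576'; -- 1 MB lookahead, >= the widest row; ineffective while 16KB stays hardcoded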

A test case that reproduces the problem:

cqd traf_max_character_col_length '1000000';
drop table if exists t031t1;
create table t031t1 (z int not null primary key, a varchar(1000000), b char(1000000));
insert into t031t1 values (1, repeat('a', 1000000, 1000000) , 'def');
insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
insert into t031t1 values (3, repeat('a', 10000, 10000) , 'zzz');
insert into t031t1 values (4, repeat('a', 100000, 100000) , 'zzz');
insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
insert into t031t1 values (8, repeat('a', 1000000, 1000000) , null);
insert into t031t1 values (9, repeat('a', 500000, 500000) , null);
insert into t031t1 values (10, repeat('a', 100, 100) , null);


-- in hive
create table t031hive(z int, a string, b string);

-- in sqlci
cqd hive_max_string_length '1000000';
insert into hive.hive.t031hive select * from t031t1;
create external table t031hive (z int, a varchar(1000000), b varchar(1000000)) for hive.hive.t031hive;
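
-- sanity check (a sketch; with the default partitioning CQDs the Hive scan
-- should be serial and not hit the bug): both tables should report 10 rows
select count(*) from t031t1;
select count(*) from hive.hive.t031hive;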

control query shape sort_groupby(exchange(cut));
-- 2 esp plan
cqd hive_min_bytes_per_esp_partition '2000000' ;
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2 ;
execute s2 ;

cqd hive_min_bytes_per_esp_partition '1000000' ;
-- 4 esp plan
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2 ;
execute s2 ;
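
For reference, a correct run of either plan should count all 10 rows inserted above; the expected sqlci output is roughly:

(EXPR)
--------------------
                  10

--- 1 row(s) selected.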


The row boundaries and ESP scan range boundaries are shown below.

# show the byte offset of every record delimiter (\n) in the table's data file
hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep -F '\n'

-- 4 esp plan
1000006:\n      esp0 end (652820)
1001013:\n   
1011020:\n
1111027:\n   
1111134:\n   
1111151:\n   
1111159:\n
2111165:\n      esp1 end (1305640)  esp2 end (1958460)
2611171:\n   
2611278:\n   

-- 2 esp plan
1000006:\n      
1001013:\n   
1011020:\n
1111027:\n   
1111134:\n   
1111151:\n   
1111159:\n
2111165:\n      esp0 end (1305640)  
2611171:\n   
2611278:\n      

Number of ranges to scan: 4
Number of esps to scan:    4

  Esp# Range#  StartOffset    BytesRead      FileName
====== ====== ============ ============  ==============================

     0      0            0       652820  t031hive/T031HIVE-0-20160716231120-1 (split_e)
     1      1       652820       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
     2      2      1305640       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
     3      3      1958460       652819  t031hive/T031HIVE-0-20160716231120-1 (split_b)

Summary of bytes read per ESP (2611279 = 100 percent):

ESP    0 reads             652820 bytes ( 100 percent of avg)
ESP    1 reads             652820 bytes ( 100 percent of avg)
ESP    2 reads             652820 bytes ( 100 percent of avg)
ESP    3 reads             652819 bytes (  99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0

Number of ranges to scan: 2
Number of esps to scan:    2

  Esp# Range#  StartOffset    BytesRead      FileName
====== ====== ============ ============  ==============================

     0      0            0      1305640  t031hive/T031HIVE-0-20160716231120-1 (split_e)
     1      1      1305640      1305639  t031hive/T031HIVE-0-20160716231120-1 (split_b)

Summary of bytes read per ESP (2611279 = 100 percent):

ESP    0 reads            1305640 bytes ( 100 percent of avg)
ESP    1 reads            1305639 bytes (  99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0
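
Reading the offsets off the dumps above, every ESP range boundary lands inside one of the wide rows, farther than 16KB from that row's delimiter, so the 16KB lookahead cannot reach the end of the row that straddles the boundary. For the 4 esp plan:

  esp0 end  652820 falls inside row 1, which ends at 1000006: 1000006 -  652820 = 347186 > 16384
  esp1 end 1305640 falls inside row 8, which ends at 2111165: 2111165 - 1305640 = 805525 > 16384
  esp2 end 1958460 falls inside row 8 as well:                2111165 - 1958460 = 152705 > 16384

The ESP that owns a straddling row cannot find its delimiter within the tail (hence error 8446), or the row is dropped while the next ESP skips ahead to the same delimiter, so the row is returned by neither range. The 2 esp plan fails the same way: its boundary at 1305640 is 805525 bytes short of the delimiter at 2111165.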


The fix has three parts:

a) RangeTailIOSize will now default to the maximum length of a row, if that is 
greater than 16KB. Each ESP range will look ahead by this amount. Previously 
this was hardcoded to 16KB.
b) If a whole buffer does not contain the start of a row, that is no longer 
treated as an error.
c) We raise an error at compile time if the maximum row size is greater than 
the size of the IO buffer. The whole row (from start to finish) must be in a 
contiguous buffer for the rest of the scan logic, as sketched below. The IO 
buffer currently defaults to 64 MB.
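
A sketch of what part (c) means in practice, assuming the default 64 MB IO buffer: if the declared Hive string width pushes the maximum row size past the buffer size, the prepare should now fail cleanly instead of the scan misbehaving at runtime.

cqd hive_max_string_length '70000000'; -- max row size now exceeds the 64 MB IO buffer
prepare s3 from select count(*) from hive.hive.t031hive; -- expected: compile-time error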



