[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406615#comment-15406615 ]

ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73421305
  
    --- Diff: core/sql/generator/GenRelScan.cpp ---
    @@ -1182,6 +1181,15 @@ if (hTabStats->isOrcFile())
     
       UInt32 rangeTailIOSize = (UInt32)
           CmpCommon::getDefaultNumeric(HDFS_IO_RANGE_TAIL);
    +  if (rangeTailIOSize == 0) 
    +    {
    +      rangeTailIOSize = getTableDesc()->getNATable()->getRecordLength() +
     --- End diff ---
    
    This record length is what the compiler thinks; the real length may be
    more or less. Hive probably makes it more complex: some fields like
    date/time/timestamp are strings in Hive but encoded binary values in
    Trafodion, and character sets may differ in some cases as well (e.g.
    GBK). I don't have a better solution than what the code does, but
    suppose we have this: create table t(a char(20000) not null, b
    timestamp(6) not null, c timestamp(6) not null). Trafodion will make
    the record length 20022. In reality, a row's text may look like
    "20k chars..., 2015-01-01 00:00:00.000, 2016-01-01 00:00:00.000",
    which is about 20049 characters, more than the compiler's estimate.
    Maybe the solution is to add the 16K to the record length, not take
    the max?
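
    A minimal sketch of that alternative, reusing the identifiers from the
    diff above (the 16384 constant stands in for the 16K default and is
    only illustrative):

        UInt32 rangeTailIOSize = (UInt32)
            CmpCommon::getDefaultNumeric(HDFS_IO_RANGE_TAIL);
        if (rangeTailIOSize == 0)
          {
            // add a fixed 16 KB cushion on top of the compiler's record
            // length instead of taking the larger of the two, so rows
            // whose text form is longer than the compiled record length
            // are still covered by the range tail
            rangeTailIOSize =
              getTableDesc()->getNATable()->getRecordLength() + 16384;
          }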


> Hive scan on wide tables can result in lost rows or error
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2138
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Suresh Subbiah
>            Assignee: Suresh Subbiah
>             Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16 KB) and the scan is
> parallelized, one of these two problems may arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur because the scan is broken up into ranges and
> the lookahead into the next range is determined by HDFS_IO_RANGE_TAIL,
> which defaults to 16 KB. This setting is also hardcoded in a crucial
> location, so changing the default does not help.
> The test case to reproduce this problem is:
> cqd traf_max_character_col_length '1000000';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(1000000), b 
> char(1000000));
> insert into t031t1 values (1, repeat('a', 1000000, 1000000) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 10000, 10000) , 'zzz');
> insert into t031t1 values (4, repeat('a', 100000, 100000) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 1000000, 1000000) , null);
> insert into t031t1 values (9, repeat('a', 500000, 500000) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> -- in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '1000000';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(1000000), b 
> varchar(1000000)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '2000000' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '1000000' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and ESP scan range boundaries are shown below:
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 1000006:\n      esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 1111027:\n   
> 1111134:\n   
> 1111151:\n   
> 1111159:\n
> 2111165:\n      esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 1000006:\n      
> 1001013:\n   
> 1011020:\n
> 1111027:\n   
> 1111134:\n   
> 1111151:\n   
> 1111159:\n
> 2111165:\n      esp0 end (1305640)  
> 2611171:\n   
> 2611278:\n      
> Number of ranges to scan: 4
> Number of esps to scan:    4
>   Esp# Range#  StartOffset    BytesRead      FileName
> ====== ====== ============ ============  ==============================
>      0      0            0       652820  t031hive/T031HIVE-0-20160716231120-1 (split_e)
>      1      1       652820       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
>      2      2      1305640       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
>      3      3      1958460       652819  t031hive/T031HIVE-0-20160716231120-1 (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP    0 reads             652820 bytes ( 100 percent of avg)
> ESP    1 reads             652820 bytes ( 100 percent of avg)
> ESP    2 reads             652820 bytes ( 100 percent of avg)
> ESP    3 reads             652819 bytes (  99 percent of avg)
> hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
> modTSforDir_ = 1468710680, numOfPartCols_ = 0
> Number of ranges to scan: 2
> Number of esps to scan:    2
>   Esp# Range#  StartOffset    BytesRead      FileName
> ====== ====== ============ ============  ==============================
>      0      0            0      1305640  t031hive/T031HIVE-0-20160716231120-1 (split_e)
>      1      1      1305640      1305639  t031hive/T031HIVE-0-20160716231120-1 (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP    0 reads            1305640 bytes ( 100 percent of avg)
> ESP    1 reads            1305639 bytes (  99 percent of avg)
> hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
> modTSforDir_ = 1468710680, numOfPartCols_ = 0
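> As an illustration with the 4-ESP layout above: the first row's record
> delimiter sits at offset 1000006, while ESP 0's range ends at offset
> 652820. With the default 16 KB tail, ESP 0 only looks ahead to about
> offset 669204 (652820 + 16384), never finds a delimiter, and reports
> error 8446; depending on how the neighbouring range handles the partial
> row, the row can instead be lost silently.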
> The fix has three parts:
> a)    RangeTailIOSize will now default to the maximum length of a row, if
> that length is greater than 16 KB. Each ESP range will look ahead by this
> size. Previously the lookahead was hardcoded to 16 KB.
> b)    If a whole buffer does not contain the start of a row, that is no
> longer treated as an error.
> c)    We raise an error during compile if the maximum row size is greater
> than the size of the IO buffer. We need the whole row (from start to
> finish) in a contiguous buffer to do the rest of our logic. Currently the
> IO buffer defaults to 64 MB.
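> A minimal sketch of the part (c) check, assuming the buffer size comes
> from a CQD named HDFS_IO_BUFFERSIZE whose value is in bytes (the CQD
> name, its units, and the error code are assumptions of this sketch):
>
>   UInt32 hdfsIoBufSize = (UInt32)
>       CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE);
>   UInt32 maxRowLen = getTableDesc()->getNATable()->getRecordLength();
>   if (maxRowLen > hdfsIoBufSize)
>     {
>       // the whole row, start to finish, must fit in one contiguous IO
>       // buffer; otherwise raise a compile-time error and give up
>       *CmpCommon::diags() << DgSqlCode(-1 /* placeholder error code */);
>       return -1;
>     }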



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
