[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406653#comment-15406653 ]
ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73425266

    --- Diff: core/sql/optimizer/BindRelExpr.cpp ---
    @@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
             }
         }
     
    +  if (naTable->isHiveTable() &&
    +      !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
    +        naTable->getClusteringIndex()->getHHDFSTableStats()
    +          ->isSequenceFile()) &&
    +      (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) &&
    +      (naTable->getRecordLength() >
    +       CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
    +    {
    +      // do not raise error if buffersize is set through buffersize_bytes.
    --- End diff --

    HDFS_IO_BUFFERSIZE_BYTES was added initially for cases where we wanted to specify the buffer size at a granularity finer than a KB. The comment indicates that this is for testing. I was tempted to remove the cqd, but thought we may still need to test with precisely defined buffer sizes. I view BUFFERSIZE_BYTES as a kind of internal switch, to be used by expert users to override the BUFFERSIZE setting. If users have a wide table but for some reason want to avoid the new error, they can use BUFFERSIZE_BYTES to skip the new error check. Wide tables can be handled if the actual rows are indeed small enough to fit in a buffer; the user may have other reasons for not wanting to change defaults such as HIVE_MAX_STRING_LENGTH. If we prefer, I can also remove this BUFFERSIZE_BYTES backdoor.

> Hive scan on wide tables can result in lost rows or error
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2138
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Suresh Subbiah
>            Assignee: Suresh Subbiah
>             Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16 KB) and the scan is parallelized, one of these two problem situations may arise:
> a) An error with sqlcode 8446 is raised:
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No record delimiter found in buffer from hdfsRead.
> b) Some rows are lost.
> Both situations occur because the scan is broken up into ranges and the lookahead into the next range is determined by HDFS_IO_RANGE_TAIL, which defaults to 16 KB. This setting is also hardcoded in a crucial location, so changing the default does not help.
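>
> To make the failure mode concrete, here is a minimal C++ sketch (hypothetical code, not Trafodion internals; ScanRange, RANGE_TAIL and rowFitsInLookahead are illustrative names) of the range/lookahead scheme, assuming each range only emits rows that start inside it and may read at most a fixed tail past its end to finish its last row:
>
>   #include <cstdint>
>   #include <cstdio>
>
>   // Old behavior: the lookahead past a range end was hardcoded to 16 KB.
>   constexpr int64_t RANGE_TAIL = 16 * 1024;
>
>   struct ScanRange {
>     int64_t start;   // first byte this range owns
>     int64_t span;    // number of bytes this range owns
>   };
>
>   // A row starting inside the range is recoverable only if its
>   // terminating '\n' lies within start + span + RANGE_TAIL. A wider
>   // row is silently dropped, or -- when a whole IO buffer contains no
>   // delimiter at all -- surfaces as error 8446.
>   bool rowFitsInLookahead(const ScanRange& r, int64_t rowEnd) {
>     return rowEnd <= r.start + r.span + RANGE_TAIL;
>   }
>
>   int main() {
>     // Row 1 of the test case below starts at offset 0 and ends at the
>     // newline at offset 1000006, far past esp0's 16 KB lookahead.
>     ScanRange esp0{0, 652820};
>     std::printf("%s\n", rowFitsInLookahead(esp0, 1000006) ? "ok" : "lost");
>     return 0;
>   }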
> Test case to see this problem:
> cqd traf_max_character_col_length '1000000';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(1000000), b char(1000000));
> insert into t031t1 values (1, repeat('a', 1000000, 1000000), 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000), 'zzz');
> insert into t031t1 values (3, repeat('a', 10000, 10000), 'zzz');
> insert into t031t1 values (4, repeat('a', 100000, 100000), 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100), 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10), 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1), 'zzz');
> insert into t031t1 values (8, repeat('a', 1000000, 1000000), null);
> insert into t031t1 values (9, repeat('a', 500000, 500000), null);
> insert into t031t1 values (10, repeat('a', 100, 100), null);
> -- in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '1000000';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(1000000), b varchar(1000000)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '2000000';
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2;
> execute s2;
> -- 4 esp plan
> cqd hive_min_bytes_per_esp_partition '1000000';
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2;
> execute s2;
> The row boundaries and esp scan range boundaries are shown below:
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 1000006:\n   esp0 end (652820)
> 1001013:\n
> 1011020:\n
> 1111027:\n
> 1111134:\n
> 1111151:\n
> 1111159:\n
> 2111165:\n   esp1 end (1305640)   esp2 end (1958460)
> 2611171:\n
> 2611278:\n
> -- 2 esp plan
> 1000006:\n
> 1001013:\n
> 1011020:\n
> 1111027:\n
> 1111134:\n
> 1111151:\n
> 1111159:\n
> 2111165:\n   esp0 end (1305640)
> 2611171:\n
> 2611278:\n
> Number of esps to scan: 4
> Esp#   Range#  StartOffset    BytesRead  FileName
> ====== ====== ============ ============ ==============================
>      0      0            0       652820  t031hive/T031HIVE-0-20160716231120-1 (split_e)
>      1      1       652820       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
>      2      2      1305640       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
>      3      3      1958460       652819  t031hive/T031HIVE-0-20160716231120-1 (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP 0 reads 652820 bytes ( 100 percent of avg)
> ESP 1 reads 652820 bytes ( 100 percent of avg)
> ESP 2 reads 652820 bytes ( 100 percent of avg)
> ESP 3 reads 652819 bytes ( 99 percent of avg)
> hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
> modTSforDir_ = 1468710680, numOfPartCols_ = 0
> Number of ranges to scan: 2
> Number of esps to scan: 2
> Esp#   Range#  StartOffset    BytesRead  FileName
> ====== ====== ============ ============ ==============================
>      0      0            0      1305640  t031hive/T031HIVE-0-20160716231120-1 (split_e)
>      1      1      1305640      1305639  t031hive/T031HIVE-0-20160716231120-1 (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP 0 reads 1305640 bytes ( 100 percent of avg)
> ESP 1 reads 1305639 bytes ( 99 percent of avg)
> hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
> modTSforDir_ = 1468710680, numOfPartCols_ = 0
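>
> The StartOffset and BytesRead columns above can be reproduced with a quick sketch (hypothetical code, assuming an even ceil(total/n) split of the file, which matches the numbers in both plans):
>
>   #include <cstdint>
>   #include <cstdio>
>
>   int main() {
>     const int64_t total = 2611279;                // file size from the summary lines
>     const int plans[] = {4, 2};                   // number of esps in each plan
>     for (int n : plans) {
>       const int64_t span = (total + n - 1) / n;   // ceil(total / n) bytes per range
>       std::printf("-- %d esp plan\n", n);
>       for (int i = 0; i < n; ++i) {
>         const int64_t start = i * span;
>         const int64_t bytes = (i == n - 1) ? total - start : span;
>         std::printf("esp %d: start %lld, bytes %lld\n",
>                     i, (long long)start, (long long)bytes);
>       }
>     }
>     return 0;
>   }
>
> Note that every range boundary (652820, 1305640, 1958460) falls inside one of the ~1 MB rows, which is exactly the condition that loses rows or triggers error 8446.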
> The fix has three parts:
> a) RangeTailIOSize now defaults to the maximum length of a row when that exceeds 16 KB. Each esp range will look ahead by this size; previously the lookahead was hardcoded to 16 KB.
> b) If a whole buffer does not contain the start of a row, that is no longer treated as an error.
> c) We raise an error during compile if the maximum row size is greater than the size of the IO buffer, since the whole row (from start to finish) must sit in one contiguous buffer for the rest of our logic. Currently the IO buffer defaults to 64 MB.
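>
> To tie the pieces together, here is a condensed sketch of parts (a) and (c) plus the BUFFERSIZE_BYTES escape hatch discussed in the review comment above (hypothetical code: cqd() merely stands in for CmpCommon::getDefaultNumeric(), though the CQD names are the real ones). Part (b) is a runtime-only change with no compile-time counterpart:
>
>   #include <algorithm>
>   #include <cstdint>
>   #include <cstdio>
>   #include <cstring>
>
>   // Stub returning the shipped defaults: HDFS_IO_BUFFERSIZE is in KB
>   // (65536 KB = 64 MB) and HDFS_IO_BUFFERSIZE_BYTES is 0, i.e. unset.
>   static int64_t cqd(const char* name) {
>     return std::strcmp(name, "HDFS_IO_BUFFERSIZE") == 0 ? 65536 : 0;
>   }
>
>   // Part (a): the per-range lookahead is no longer hardcoded to 16 KB;
>   // for wider rows it grows to the maximum row length.
>   int64_t rangeTailIOSize(int64_t maxRowLen) {
>     return std::max<int64_t>(maxRowLen, 16 * 1024);
>   }
>
>   // Part (c): at compile time, reject a delimited Hive scan whose widest
>   // possible row cannot fit in one contiguous IO buffer -- unless the
>   // expert-level HDFS_IO_BUFFERSIZE_BYTES override is set, which skips
>   // the check entirely (the "backdoor" from the review comment).
>   bool mustRaiseWideRowError(int64_t maxRowLen, bool isOrcOrSequenceFile) {
>     if (isOrcOrSequenceFile) return false;                   // text files only
>     if (cqd("HDFS_IO_BUFFERSIZE_BYTES") != 0) return false;  // backdoor
>     return maxRowLen > cqd("HDFS_IO_BUFFERSIZE") * 1024;     // KB -> bytes
>   }
>
>   int main() {
>     // The ~2 MB rows from the test case fit comfortably in a 64 MB
>     // buffer, so the repro itself passes the new compile-time check.
>     std::printf("raise error: %d\n", mustRaiseWideRowError(2000007, false));
>     return 0;
>   }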