[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406653#comment-15406653 ]
ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73425266

    --- Diff: core/sql/optimizer/BindRelExpr.cpp ---
    @@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
             }
         }
     
    +  if (naTable->isHiveTable() &&
    +      !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
    +        naTable->getClusteringIndex()->getHHDFSTableStats()
    +          ->isSequenceFile()) &&
    +      (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) &&
    +      (naTable->getRecordLength() >
    +       CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
    +    {
    +      // do not raise error if buffersize is set through buffersize_bytes.
    --- End diff --

    HDFS_IO_BUFFERSIZE_BYTES was added initially for cases where we wanted to specify the buffer size at a granularity finer than a KB. The comment indicates that this is for testing. I was tempted to remove the cqd, but thought we may still need to test with precisely defined buffer sizes. I view BUFFERSIZE_BYTES as a kind of internal switch, to be used by expert users to override the BUFFERSIZE setting. If users have a wide table but for some reason want to avoid the new error, they can use BUFFERSIZE_BYTES to skip the new error check. Wide tables can be handled if the actual rows are indeed small enough to fit in a buffer; the user may have other reasons for not wanting to change defaults such as HIVE_MAX_STRING_LENGTH. If we prefer, I can also remove this BUFFERSIZE_BYTES backdoor.

> Hive scan on wide tables can result in lost rows or error
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2138
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Suresh Subbiah
>            Assignee: Suresh Subbiah
>             Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16 KB) and the scan is parallelized, one of these two problem situations may arise:
> a) An error with sqlcode 8446 is raised:
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No record delimiter found in buffer from hdfsRead.
> b) Some rows are lost.
> Both situations occur because the scan is broken up into ranges and the lookahead into the next range is determined by HDFS_IO_RANGE_TAIL, which defaults to 16 KB. This setting is also hardcoded in a crucial location, so changing the default does not help.
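>
> To make the failure mode concrete, here is a minimal C++ sketch (hypothetical code, not Trafodion internals; ScanRange, RANGE_TAIL and rowFitsInLookahead are illustrative names) of the range/lookahead scheme, assuming each range only emits rows that start inside it and may read at most a fixed tail past its end to finish its last row:
>
>   #include <cstdint>
>   #include <cstdio>
>
>   // Old behavior: the lookahead past a range end was hardcoded to 16 KB.
>   constexpr int64_t RANGE_TAIL = 16 * 1024;
>
>   struct ScanRange {
>     int64_t start;   // first byte this range owns
>     int64_t span;    // number of bytes this range owns
>   };
>
>   // A row starting inside the range is recoverable only if its
>   // terminating '\n' lies within start + span + RANGE_TAIL. A wider
>   // row is silently dropped, or -- when a whole IO buffer contains no
>   // delimiter at all -- surfaces as error 8446.
>   bool rowFitsInLookahead(const ScanRange& r, int64_t rowEnd) {
>     return rowEnd <= r.start + r.span + RANGE_TAIL;
>   }
>
>   int main() {
>     // Row 1 of the test case below starts at offset 0 and ends at the
>     // newline at offset 1000006, far past esp0's 16 KB lookahead.
>     ScanRange esp0{0, 652820};
>     std::printf("%s\n", rowFitsInLookahead(esp0, 1000006) ? "ok" : "lost");
>     return 0;
>   }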
> Test case to see this problem:
> cqd traf_max_character_col_length '1000000';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(1000000), b char(1000000));
> insert into t031t1 values (1, repeat('a', 1000000, 1000000), 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000), 'zzz');
> insert into t031t1 values (3, repeat('a', 10000, 10000), 'zzz');
> insert into t031t1 values (4, repeat('a', 100000, 100000), 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100), 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10), 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1), 'zzz');
> insert into t031t1 values (8, repeat('a', 1000000, 1000000), null);
> insert into t031t1 values (9, repeat('a', 500000, 500000), null);
> insert into t031t1 values (10, repeat('a', 100, 100), null);
> -- in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '1000000';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(1000000), b varchar(1000000)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '2000000';
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2;
> execute s2;
> -- 4 esp plan
> cqd hive_min_bytes_per_esp_partition '1000000';
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2;
> execute s2;
> The row boundaries and esp scan range boundaries are shown below:
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 1000006:\n   esp0 end (652820)
> 1001013:\n
> 1011020:\n
> 1111027:\n
> 1111134:\n
> 1111151:\n
> 1111159:\n
> 2111165:\n   esp1 end (1305640)   esp2 end (1958460)
> 2611171:\n
> 2611278:\n
> -- 2 esp plan
> 1000006:\n
> 1001013:\n
> 1011020:\n
> 1111027:\n
> 1111134:\n
> 1111151:\n
> 1111159:\n
> 2111165:\n   esp0 end (1305640)
> 2611171:\n
> 2611278:\n
> Number of esps to scan: 4
> Esp#   Range#  StartOffset    BytesRead  FileName
> ====== ====== ============ ============ ==============================
>      0      0            0       652820  t031hive/T031HIVE-0-20160716231120-1 (split_e)
>      1      1       652820       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
>      2      2      1305640       652820  t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
>      3      3      1958460       652819  t031hive/T031HIVE-0-20160716231120-1 (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP 0 reads 652820 bytes ( 100 percent of avg)
> ESP 1 reads 652820 bytes ( 100 percent of avg)
> ESP 2 reads 652820 bytes ( 100 percent of avg)
> ESP 3 reads 652819 bytes ( 99 percent of avg)
> hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
> modTSforDir_ = 1468710680, numOfPartCols_ = 0
> Number of ranges to scan: 2
> Number of esps to scan: 2
> Esp#   Range#  StartOffset    BytesRead  FileName
> ====== ====== ============ ============ ==============================
>      0      0            0      1305640  t031hive/T031HIVE-0-20160716231120-1 (split_e)
>      1      1      1305640      1305639  t031hive/T031HIVE-0-20160716231120-1 (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP 0 reads 1305640 bytes ( 100 percent of avg)
> ESP 1 reads 1305639 bytes ( 99 percent of avg)
> hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
> modTSforDir_ = 1468710680, numOfPartCols_ = 0
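>
> The StartOffset and BytesRead columns above can be reproduced with a quick sketch (hypothetical code, assuming an even ceil(total/n) split of the file, which matches the numbers in both plans):
>
>   #include <cstdint>
>   #include <cstdio>
>
>   int main() {
>     const int64_t total = 2611279;                // file size from the summary lines
>     const int plans[] = {4, 2};                   // number of esps in each plan
>     for (int n : plans) {
>       const int64_t span = (total + n - 1) / n;   // ceil(total / n) bytes per range
>       std::printf("-- %d esp plan\n", n);
>       for (int i = 0; i < n; ++i) {
>         const int64_t start = i * span;
>         const int64_t bytes = (i == n - 1) ? total - start : span;
>         std::printf("esp %d: start %lld, bytes %lld\n",
>                     i, (long long)start, (long long)bytes);
>       }
>     }
>     return 0;
>   }
>
> Note that every range boundary (652820, 1305640, 1958460) falls inside one of the ~1 MB rows, which is exactly the condition that loses rows or triggers error 8446.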
> The fix has three parts:
> a) RangeTailIOSize now defaults to the maximum length of a row when that exceeds 16 KB. Each esp range will look ahead by this size; previously the lookahead was hardcoded to 16 KB.
> b) If a whole buffer does not contain the start of a row, that is no longer treated as an error.
> c) We raise an error during compile if the maximum row size is greater than the size of the IO buffer, since the whole row (from start to finish) must sit in one contiguous buffer for the rest of our logic. Currently the IO buffer defaults to 64 MB.
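>
> To tie the pieces together, here is a condensed sketch of parts (a) and (c) plus the BUFFERSIZE_BYTES escape hatch discussed in the review comment above (hypothetical code: cqd() merely stands in for CmpCommon::getDefaultNumeric(), though the CQD names are the real ones). Part (b) is a runtime-only change with no compile-time counterpart:
>
>   #include <algorithm>
>   #include <cstdint>
>   #include <cstdio>
>   #include <cstring>
>
>   // Stub returning the shipped defaults: HDFS_IO_BUFFERSIZE is in KB
>   // (65536 KB = 64 MB) and HDFS_IO_BUFFERSIZE_BYTES is 0, i.e. unset.
>   static int64_t cqd(const char* name) {
>     return std::strcmp(name, "HDFS_IO_BUFFERSIZE") == 0 ? 65536 : 0;
>   }
>
>   // Part (a): the per-range lookahead is no longer hardcoded to 16 KB;
>   // for wider rows it grows to the maximum row length.
>   int64_t rangeTailIOSize(int64_t maxRowLen) {
>     return std::max<int64_t>(maxRowLen, 16 * 1024);
>   }
>
>   // Part (c): at compile time, reject a delimited Hive scan whose widest
>   // possible row cannot fit in one contiguous IO buffer -- unless the
>   // expert-level HDFS_IO_BUFFERSIZE_BYTES override is set, which skips
>   // the check entirely (the "backdoor" from the review comment).
>   bool mustRaiseWideRowError(int64_t maxRowLen, bool isOrcOrSequenceFile) {
>     if (isOrcOrSequenceFile) return false;                   // text files only
>     if (cqd("HDFS_IO_BUFFERSIZE_BYTES") != 0) return false;  // backdoor
>     return maxRowLen > cqd("HDFS_IO_BUFFERSIZE") * 1024;     // KB -> bytes
>   }
>
>   int main() {
>     // The ~2 MB rows from the test case fit comfortably in a 64 MB
>     // buffer, so the repro itself passes the new compile-time check.
>     std::printf("raise error: %d\n", mustRaiseWideRowError(2000007, false));
>     return 0;
>   }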