[jira] [Resolved] (TRAFODION-2090) improve the very first time of query compiling
[ https://issues.apache.org/jira/browse/TRAFODION-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liu ming resolved TRAFODION-2090.
---------------------------------
    Resolution: Duplicate
    Fix Version/s: 2.1-incubating

Duplicates TRAFODION-2137.

> improve the very first time of query compiling
> ----------------------------------------------
>
>                 Key: TRAFODION-2090
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2090
>             Project: Apache Trafodion
>          Issue Type: Improvement
>            Reporter: liu ming
>            Assignee: liu ming
>             Fix For: 2.1-incubating
>
> Currently, the very first query run in a master executor (sqlci or mxosrvr) takes seconds to compile.
> Initial analysis shows that most of the time is spent retrieving metadata.
> We need to improve the compile time of the very first query, at least from the end user's perspective.
> Speeding up metadata retrieval, redesigning the metadata schema, and prefetching metadata are all possible solutions; the aim is to shorten the time spent on SQL compilation for the first query. The initial goal is under 1 second; the faster the better.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Work stopped] (TRAFODION-2090) improve the very first time of query compiling
[ https://issues.apache.org/jira/browse/TRAFODION-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on TRAFODION-2090 stopped by liu ming.
[jira] [Commented] (TRAFODION-2139) Possible invalid UTF-8 string in TEXT metadata table
[ https://issues.apache.org/jira/browse/TRAFODION-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407058#comment-15407058 ]

Hans Zeller commented on TRAFODION-2139:
----------------------------------------

Here is a test case (it shows the issue in the code before Anoop's recent change for TRAFODION-2137, and probably reproduces it with that change as well; if not, try adding one character somewhere near the beginning):

drop view t10k_utf8;
create view t10k_utf8 as
select * from (values
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), 
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'), ('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
[jira] [Created] (TRAFODION-2140) Port IUS from EsgynDB to Trafodion open source
Barry Fritchman created TRAFODION-2140:
---------------------------------------

             Summary: Port IUS from EsgynDB to Trafodion open source
                 Key: TRAFODION-2140
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2140
             Project: Apache Trafodion
          Issue Type: New Feature
            Reporter: Barry Fritchman
            Assignee: Barry Fritchman

It has been decided that the Incremental Update Statistics feature should be open source; this JIRA covers porting it from EsgynDB to the incubator-trafodion code base.
[jira] [Commented] (TRAFODION-2079) [3rd party tool-Squirrel] indexes won't show up on objects panel, sql command line works fine
[ https://issues.apache.org/jira/browse/TRAFODION-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406799#comment-15406799 ]

ASF GitHub Bot commented on TRAFODION-2079:
-------------------------------------------

Github user xwq commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/594#discussion_r73438864

    --- Diff: dcs/src/test/jdbc_test/src/test/java/org/trafodion/jdbc_test/TestGetIndexInfo.java ---
    @@ -0,0 +1,180 @@
    +// @@@ START COPYRIGHT @@@
    +//
    +// Licensed to the Apache Software Foundation (ASF) under one
    +// or more contributor license agreements. See the NOTICE file
    +// distributed with this work for additional information
    +// regarding copyright ownership. The ASF licenses this file
    +// to you under the Apache License, Version 2.0 (the
    +// "License"); you may not use this file except in compliance
    +// with the License. You may obtain a copy of the License at
    +//
    +//   http://www.apache.org/licenses/LICENSE-2.0
    +//
    +// Unless required by applicable law or agreed to in writing,
    +// software distributed under the License is distributed on an
    +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +// KIND, either express or implied. See the License for the
    +// specific language governing permissions and limitations
    +// under the License.
    +//
    +// @@@ END COPYRIGHT @@@
    +
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.DriverManager;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.Statement;
    +
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +import static org.junit.Assert.*;
    +
    +public class TestGetIndexInfo {
    +
    +    private static final String INDEX_INFO_TEST_TABLE = "INDEX_INFO_TEST_TABLE";
    +
    +    private static final String strCreateTableQuery = "CREATE TABLE " + INDEX_INFO_TEST_TABLE + "(C1 INT, C2 INT)";
    +    private static final String strInsertQuery = "INSERT INTO " + INDEX_INFO_TEST_TABLE + " (C1, C2) VALUES (?, ?)";
    +    private static final String strUpdateStatisticsQuery = "UPDATE STATISTICS FOR TABLE " + INDEX_INFO_TEST_TABLE + " ON (C1, C2)";
    +    private static final String strDropTableQuery = "DROP TABLE " + INDEX_INFO_TEST_TABLE;
    +    private static final String INDEX_C1_NAME = INDEX_INFO_TEST_TABLE + "_INDEX";
    +
    +    private static final String strCreateIndexQuery = "CREATE INDEX " + INDEX_C1_NAME + " on " + INDEX_INFO_TEST_TABLE + "(C1)";
    +
    +    private static Connection _conn = null;
    +
    +    @BeforeClass
    +    public static void doTestSuiteSetup() throws Exception {
    +        try {
    +            _conn = DriverManager.getConnection(Utils.url, Utils.usr, Utils.pwd);
    +            Statement stmt = _conn.createStatement();
    +            stmt.execute(strCreateTableQuery);
    +
    +            PreparedStatement pstmt = _conn.prepareStatement(strInsertQuery);
    +            int[][] testValues = {
    +                {1, 2},
    +                {10, 3},
    +                {2, 2}
    +            };
    +
    +            for (int i = 0; i < testValues.length; i++) {
    +                pstmt.setInt(1, testValues[i][0]);
    +                pstmt.setInt(2, testValues[i][1]);
    +                pstmt.addBatch();
    +            }
    +            pstmt.executeBatch();
    +            pstmt.close();
    +
    +            // create index
    +            stmt.execute(strCreateIndexQuery);
    +
    +            // update statistics on the table
    +            stmt.execute(strUpdateStatisticsQuery);
    +
    +            stmt.close();
    +        }
    +        catch (Exception e) {
    +            System.out.println(e.getMessage());
    +            e.printStackTrace();
    +        } finally {
    +        }
    +    }
    +
    +    @Test
    +    public void testGetNoneUniqueIndexInfo() {
    +        IndexInfo[] expIndexInfo = {
    +            new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C1", 0, 3, (short)0, (String)null),
    +            new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C2", 0, 3, (short)0, (String)null),
    +            new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, INDEX_C1_NAME, (short)3, (short)0, "", 0, 0,
[jira] [Assigned] (TRAFODION-2139) Possible invalid UTF-8 string in TEXT metadata table
[ https://issues.apache.org/jira/browse/TRAFODION-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Zeller reassigned TRAFODION-2139:
--------------------------------------
    Assignee: Hans Zeller

> Possible invalid UTF-8 string in TEXT metadata table
> ----------------------------------------------------
>
>                 Key: TRAFODION-2139
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2139
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmu
>    Affects Versions: 1.3-incubating
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>
> Method CmpSeabaseDDL::updateTextTable() takes a string, splits it into chunks of a maximum size, and inserts the chunks into the TEXT table. I have not yet verified that this problem can actually occur, but there is a possibility that the method splits a multi-byte UTF-8 character in the middle and gets an error. To fix this, we should use method findStartOfChar() in file sql/common/csconvert.h to find a good point to split the text.
[jira] [Created] (TRAFODION-2139) Possible invalid UTF-8 string in TEXT metadata table
Hans Zeller created TRAFODION-2139:
-----------------------------------

             Summary: Possible invalid UTF-8 string in TEXT metadata table
                 Key: TRAFODION-2139
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2139
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmu
    Affects Versions: 1.3-incubating
            Reporter: Hans Zeller

Method CmpSeabaseDDL::updateTextTable() takes a string, splits it into chunks of a maximum size, and inserts the chunks into the TEXT table. I have not yet verified that this problem can actually occur, but there is a possibility that the method splits a multi-byte UTF-8 character in the middle and gets an error. To fix this, we should use method findStartOfChar() in file sql/common/csconvert.h to find a good point to split the text.
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406696#comment-15406696 ]

ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73429919

    --- Diff: core/sql/optimizer/BindRelExpr.cpp ---
    @@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
           }
         }
    +  if (naTable->isHiveTable() &&
    +      !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
    +        naTable->getClusteringIndex()->getHHDFSTableStats()
    +          ->isSequenceFile()) &&
    +      (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) &&
    +      (naTable->getRecordLength() >
    +       CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
    +    {
    +      // do not raise error if buffersize is set though buffersize_bytes.

    --- End diff --

    Another thing you can do is to make HDFS_IO_BUFFERSIZE a FLOAT; then you can specify sizes of less than one KB as well. That would be less confusing. I'm not sure I understand how you wanted to use this as a backdoor and don't have a strong opinion; you will know better how you want to use it.

> Hive scan on wide tables can result in lost rows or error
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2138
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.0-incubating
>            Reporter: Suresh Subbiah
>            Assignee: Suresh Subbiah
>             Fix For: 2.0-incubating
>
> When a Hive table has wide rows (say, greater than 16 KB) and the scan is parallelized, one of these two problems may arise:
> a) An error with sqlcode 8446 is raised:
>    *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No record delimiter found in buffer from hdfsRead.
> b) Some rows are lost.
> Both situations occur because the scan is broken up into ranges and the lookahead to the next range is determined by HDFS_IO_RANGE_TAIL, which defaults to 16 KB. This setting is also hardcoded in a crucial location, so changing the default does not help.
> Test case to see this problem:
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b char(100));
> insert into t031t1 values (1, repeat('a', 100, 100), 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000), 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1), 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10), 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100), 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10), 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1), 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100), null);
> insert into t031t1 values (9, repeat('a', 50, 50), null);
> insert into t031t1 values (10, repeat('a', 100, 100), null);
> -- in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below:
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n esp0 end (652820)
> 1001013:\n
> 1011020:\n
> 027:\n
> 134:\n
> 151:\n
> 159:\n
> 265:\n esp1 end (1305640) esp2 end (1958460)
> 2611171:\n
> 2611278:\n
> -- 2 esp plan
> 106:\n
> 1001013:\n
> 1011020:\n
> 027:\n
> 134:\n
> 151:\n
> 159:\n
> 265:\n esp0 end (1305640)
> 2611171:\n
> 2611278:\n
> Number of esps to scan: 4
> Esp#  Range#  StartOffset  BytesRead  FileName
> ====  ======  ===========  =========  ========
> 0     0       0            652820     t031hive/T031HIVE-0-20160716231120-1 (split_e)
> 1     1       652820       652820     t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
> 2     2       1305640      652820
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406654#comment-15406654 ]

ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73425322

    --- Diff: core/sql/bin/SqlciErrors.txt ---
    @@ -1282,6 +1282,7 @@ $1~String1
     4223 42000 9 BEGINNER MAJOR DBADMIN $0~String0 not supported in this software version.
     4224 42000 9 BEGINNER MAJOR DBADMIN The directory $0~String0 is not a valid Hive location.
     4225 42000 9 BEGINNER MAJOR DBADMIN Number of column families cannot exceed 32.
    +4226 42000 9 BEGINNER MAJOR DBADMIN Table $0~TableName has a maximum record length of $0~Int0 which is greater than HDFS_IO_BUFFERSIZE. Increase buffer size setting, or reduce HIVE_MAX_STRING_LENGTH.

    --- End diff --

    Done. Thanks for the suggestion.
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406653#comment-15406653 ]

ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73425266

    --- Diff: core/sql/optimizer/BindRelExpr.cpp ---
    @@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
           }
         }
    +  if (naTable->isHiveTable() &&
    +      !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
    +        naTable->getClusteringIndex()->getHHDFSTableStats()
    +          ->isSequenceFile()) &&
    +      (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) &&
    +      (naTable->getRecordLength() >
    +       CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
    +    {
    +      // do not raise error if buffersize is set though buffersize_bytes.

    --- End diff --

    HDFS_IO_BUFFERSIZE_BYTES was added initially for cases where we wanted to specify the buffer size at a granularity of less than a KB. The comment indicates that this is for testing. I was tempted to remove the cqd, but thought we may still need to test with precisely defined buffer sizes. I am viewing BUFFERSIZE_BYTES as a kind of internal switch, to be used by expert users to override the BUFFERSIZE setting. In case users have a wide table but for some reason want to avoid the new error, they can use BUFFERSIZE_BYTES to skip the new error code. Wide tables can be handled if the actual rows are indeed small enough to fit in a buffer; the user may have other reasons for not wanting to change defaults such as HIVE_MAX_STRING_LENGTH. If we prefer, I can also remove this BUFFERSIZE_BYTES backdoor.
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406626#comment-15406626 ]

ASF GitHub Bot commented on TRAFODION-2138:
-------------------------------------------

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/637#discussion_r73422384

    --- Diff: core/sql/bin/SqlciErrors.txt ---
    @@ -1282,6 +1282,7 @@ $1~String1
     4223 42000 9 BEGINNER MAJOR DBADMIN $0~String0 not supported in this software version.
     4224 42000 9 BEGINNER MAJOR DBADMIN The directory $0~String0 is not a valid Hive location.
     4225 42000 9 BEGINNER MAJOR DBADMIN Number of column families cannot exceed 32.
    +4226 42000 9 BEGINNER MAJOR DBADMIN Table $0~TableName has a maximum record length of $0~Int0 which is greater than HDFS_IO_BUFFERSIZE. Increase buffer size setting, or reduce HIVE_MAX_STRING_LENGTH.

    --- End diff --

    Maybe change HDFS_IO_BUFFERSIZE to "the HDFS_IO_BUFFERSIZE default"? People might not know otherwise how to change it.
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406619#comment-15406619 ] ASF GitHub Bot commented on TRAFODION-2138: --- Github user zellerh commented on a diff in the pull request: https://github.com/apache/incubator-trafodion/pull/637#discussion_r73421938 --- Diff: core/sql/optimizer/BindRelExpr.cpp --- @@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA) } } + if (naTable->isHiveTable() && + !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() || +naTable->getClusteringIndex()->getHHDFSTableStats() +->isSequenceFile()) && + (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) && + (naTable->getRecordLength() > + CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024)) + { + // do not raise error if buffersize is set though buffersize_bytes. --- End diff -- What if the user users HDFS_IO_BUFFERSIZE_BYTES and that size is too small? Sorry, I don't understand the meaning of the 2 CQDs, are they for the same thing and just use different units? > Hive scan on wide tables can result in lost rows or error > - > > Key: TRAFODION-2138 > URL: https://issues.apache.org/jira/browse/TRAFODION-2138 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-exe >Affects Versions: 2.0-incubating >Reporter: Suresh Subbiah >Assignee: Suresh Subbiah > Fix For: 2.0-incubating > > > When a Hive table has wide rows (say greater than 16KB), and the scan is > parallelized. It is possible that one one these two problem situations may > arise > a) An error with sqlcode 8446 is raised > *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No > record delimiter found in buffer from hdfsRead. > b) some rows are lost. > Both these situations occur since the scan is broken up into ranges and the > lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which > defaults to 16KB. 
This setting is also hardcoded in a crucial location so > changing the default does not help. > Test case to see this problem is > cqd traf_max_character_col_length '100'; > drop table if exists t031t1; > create table t031t1 (z int not null primary key, a varchar(100), b > char(100)); > insert into t031t1 values (1, repeat('a', 100, 100) , 'def'); > insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz'); > insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz'); > insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz'); > insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz'); > insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz'); > insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz'); > insert into t031t1 values (8, repeat('a', 100, 100) , null); > insert into t031t1 values (9, repeat('a', 50, 50) , null); > insert into t031t1 values (10, repeat('a', 100, 100) , null); > --in hive > create table t031hive(z int, a string, b string); > -- in sqlci > cqd hive_max_string_length '100'; > insert into hive.hive.t031hive select * from t031t1; > create external table t031hive (z int, a varchar(100), b > varchar(100)) for hive.hive.t031hive; > control query shape sort_groupby(exchange(cut)); > -- 2 esp plan > cqd hive_min_bytes_per_esp_partition '200' ; > prepare s2 from select count(*) from hive.hive.t031hive; > explain options 'f' s2 ; > execute s2 ; > cqd hive_min_bytes_per_esp_partition '100' ; > -- 4 esp plan > prepare s2 from select count(*) from hive.hive.t031hive; > explain options 'f' s2 ; > execute s2 ; > The row boundaries and esp scan range boundaries are shown below > hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n" > -- 4 esp plan > 106:\n esp0 end (652820) > 1001013:\n > 1011020:\n > 027:\n > 134:\n > 151:\n > 159:\n > 265:\n esp1 end (1305640) esp2 end (1958460) > 2611171:\n > 2611278:\n > -- 2 esp plan > 106:\n > 1001013:\n > 1011020:\n > 027:\n > 134:\n > 151:\n > 159:\n > 265:\n esp0 end (1305640) > 2611171:\n > 
2611278:\n > Number of esps to scan: 4 > Esp# Range# StartOffset BytesRead FileName > == == == > 0 0 0 652820 t031hive/T031HIVE-0-20160716231120-1 > (split_e) > 1 1 652820 652820 t031hive/T031HIVE-0-20160716231120-1 > (split_b/e) > 2 2 1305640 652820 t031hive/T031HIVE-0-20160716231120-1 > (split_b/e) > 3 3 1958460 652819 t031hive/T031HIVE-0-20160716231120-1 >
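The failure mode described in this issue (a scan range whose lookahead tail is shorter than a row) can be illustrated with a small model. This is a simplified sketch, not the Trafodion executor code; countLostRows is a hypothetical helper that assumes each range reads [start, end + tail) and owns the rows that begin inside [start, end).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of the range scan: each range reads [start, end + tail)
// and is responsible for rows that *begin* in [start, end). A row can only
// be emitted if its terminating '\n' is inside the owning reader's window;
// otherwise that row is silently lost (or triggers error 8446).
static size_t countLostRows(const std::vector<size_t> &rowEnds, // offsets of '\n'
                            size_t rangeSize, size_t tail)
{
    size_t lost = 0;
    size_t rowStart = 0;
    for (size_t end : rowEnds)
    {
        size_t range = rowStart / rangeSize;            // range owning this row
        size_t windowEnd = (range + 1) * rangeSize + tail;
        if (end >= windowEnd)                           // delimiter out of reach
            ++lost;
        rowStart = end + 1;                             // next row starts after '\n'
    }
    return lost;
}
```

With 100-byte rows, a 50-byte range, and a 10-byte tail, every row runs past the lookahead and is lost; widening the tail to at least the row length recovers all rows, which is the essence of the fix discussed in this thread.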
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406615#comment-15406615 ] ASF GitHub Bot commented on TRAFODION-2138: --- Github user zellerh commented on a diff in the pull request: https://github.com/apache/incubator-trafodion/pull/637#discussion_r73421305 --- Diff: core/sql/generator/GenRelScan.cpp --- @@ -1182,6 +1181,15 @@ if (hTabStats->isOrcFile()) UInt32 rangeTailIOSize = (UInt32) CmpCommon::getDefaultNumeric(HDFS_IO_RANGE_TAIL);
+ if (rangeTailIOSize == 0)
+   {
+     rangeTailIOSize = getTableDesc()->getNATable()->getRecordLength() +
--- End diff -- This record length is what the compiler thinks; the real length may be more or less. Hive probably makes it more complex: some fields like date/time/timestamp are strings in Hive and encoded binaries in Trafodion. Character sets may be different as well in some cases (e.g. GBK). I don't have a better solution than what the code does, but let's say we have this: create table t(a char(2) not null, b timestamp(6) not null, c timestamp(6) not null). Trafodion will make the length 20022. In reality, a string may look like "20k chars..., 2015-01-01 00:00:00.000, 2016-01-01 00:00:00.000". That's 2049 characters. Maybe the solution is to add the 16K to the record length, not take the max?
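The review comment above weighs two policies for sizing the range tail: take the maximum of the compiler's record length and the 16KB default, or add the 16KB to the record length. A hedged sketch (tailAsMax and tailAsSum are illustrative names, not Trafodion functions) makes the difference concrete: when the real text row is longer than the compiler's binary-length estimate, the sum leaves slack while the max does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// kDefaultTail mirrors the 16KB default of HDFS_IO_RANGE_TAIL.
// Both helpers are illustrative, not the actual generator code.
constexpr uint32_t kDefaultTail = 16 * 1024;

// Policy 1: tail = max(recordLength, 16KB)
inline uint32_t tailAsMax(uint32_t recordLength)
{ return std::max(recordLength, kDefaultTail); }

// Policy 2: tail = recordLength + 16KB (extra slack for Hive-vs-Trafodion
// encoding differences such as text timestamps or multi-byte charsets)
inline uint32_t tailAsSum(uint32_t recordLength)
{ return recordLength + kDefaultTail; }
```

For a table whose compiled record length is 20022 bytes, policy 1 looks ahead exactly 20022 bytes, while policy 2 looks ahead 36406 bytes, tolerating rows whose Hive text form is somewhat wider than the compiled estimate.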
[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
[ https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406581#comment-15406581 ] ASF GitHub Bot commented on TRAFODION-2138: --- GitHub user sureshsubbiah opened a pull request: https://github.com/apache/incubator-trafodion/pull/637 [TRAFODION-2138] Hive scan on wide tables can result in lost rows or error The fix has three parts a) RangeTailIOSize will now default to maximum length of a row, if > 16KB. Each esp range will look ahead by this size. Previously this was hard coded to 16KB. b) If a whole buffer does not have the start of a row, that is not an error. c) We raise an error during compile if maximum row size is greater than size of IO buffer. We need to have the whole row (from start to finish) in a contiguous buffer to do the rest of our logic. Currently IO buffer defaults to 64 MB. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sureshsubbiah/incubator-trafodion traf-2094 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-trafodion/pull/637.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #637 commit 6d266221b959c14319a760716939f261a2460276 Author: Suresh Subbiah Date: 2016-08-03T20:35:43Z [TRAFODION-2138] Hive scan on wide tables can result in lost rows or error The fix has three parts a) RangeTailIOSize will now default to maximum length of a row, if > 16KB. Each esp range will look ahead by this size. Previously this was hard coded to 16KB. b) If a whole buffer does not have the start of a row, that is not an error. c) We raise an error during compile if maximum row size is greater than size of IO buffer. We need to have the whole row (from start to finish) in a contiguous buffer to do the rest of our logic. Currently IO buffer defaults to 64 MB. 
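Part (c) of the fix in the pull request above is a compile-time guard: a whole row must fit in one contiguous I/O buffer. A minimal sketch of that check, assuming the 64 MB default buffer size stated in the PR description (the function name and defaulting are illustrative, not the real CQD accessor API):

```cpp
#include <cassert>
#include <cstdint>

// A row must fit from start to finish in a single contiguous I/O buffer,
// so the compiler should reject plans where the maximum row length exceeds
// the buffer size. 64 MB is the default cited in the PR description.
constexpr uint64_t kDefaultHdfsIoBufferBytes = 64ULL * 1024 * 1024;

inline bool rowFitsInIoBuffer(uint64_t maxRowLength,
                              uint64_t bufferBytes = kDefaultHdfsIoBufferBytes)
{
    return maxRowLength <= bufferBytes;
}
```

When this predicate is false, the compile-time error described in the PR would fire rather than letting the scan fail (or drop rows) at run time.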
[jira] [Created] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error
Suresh Subbiah created TRAFODION-2138: - Summary: Hive scan on wide tables can result in lost rows or error Key: TRAFODION-2138 URL: https://issues.apache.org/jira/browse/TRAFODION-2138 Project: Apache Trafodion Issue Type: Bug Components: sql-exe Affects Versions: 2.0-incubating Reporter: Suresh Subbiah Assignee: Suresh Subbiah Fix For: 2.0-incubating When a Hive table has wide rows (say greater than 16KB) and the scan is parallelized, it is possible that one of these two problem situations may arise a) An error with sqlcode 8446 is raised *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No record delimiter found in buffer from hdfsRead. b) some rows are lost. Both these situations occur since the scan is broken up into ranges and the lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which defaults to 16KB. This setting is also hardcoded in a crucial location so changing the default does not help. Test case to see this problem is cqd traf_max_character_col_length '100'; drop table if exists t031t1; create table t031t1 (z int not null primary key, a varchar(100), b char(100)); insert into t031t1 values (1, repeat('a', 100, 100) , 'def'); insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz'); insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz'); insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz'); insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz'); insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz'); insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz'); insert into t031t1 values (8, repeat('a', 100, 100) , null); insert into t031t1 values (9, repeat('a', 50, 50) , null); insert into t031t1 values (10, repeat('a', 100, 100) , null); --in hive create table t031hive(z int, a string, b string); -- in sqlci cqd hive_max_string_length '100'; insert into hive.hive.t031hive select * from t031t1; create external table t031hive (z int, a varchar(100), b varchar(100)) for hive.hive.t031hive; 
control query shape sort_groupby(exchange(cut)); -- 2 esp plan cqd hive_min_bytes_per_esp_partition '200' ; prepare s2 from select count(*) from hive.hive.t031hive; explain options 'f' s2 ; execute s2 ; cqd hive_min_bytes_per_esp_partition '100' ; -- 4 esp plan prepare s2 from select count(*) from hive.hive.t031hive; explain options 'f' s2 ; execute s2 ; The row boundaries and esp scan range boundaries are shown below hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n" -- 4 esp plan 106:\n esp0 end (652820) 1001013:\n 1011020:\n 027:\n 134:\n 151:\n 159:\n 265:\n esp1 end (1305640) esp2 end (1958460) 2611171:\n 2611278:\n -- 2 esp plan 106:\n 1001013:\n 1011020:\n 027:\n 134:\n 151:\n 159:\n 265:\n esp0 end (1305640) 2611171:\n 2611278:\n Number of esps to scan: 4 Esp# Range# StartOffset BytesRead FileName == == == 0 0 0 652820 t031hive/T031HIVE-0-20160716231120-1 (split_e) 1 1 652820 652820 t031hive/T031HIVE-0-20160716231120-1 (split_b/e) 2 2 1305640 652820 t031hive/T031HIVE-0-20160716231120-1 (split_b/e) 3 3 1958460 652819 t031hive/T031HIVE-0-20160716231120-1 (split_b) Summary of bytes read per ESP (2611279 = 100 percent): ESP0 reads 652820 bytes ( 100 percent of avg) ESP1 reads 652820 bytes ( 100 percent of avg) ESP2 reads 652820 bytes ( 100 percent of avg) ESP3 reads 652819 bytes ( 99 percent of avg) hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive modTSforDir_ = 1468710680, numOfPartCols_ = 0 Number of ranges to scan: 2 Number of esps to scan: 2 Esp# Range# StartOffset BytesRead FileName == == == 0 0 0 1305640 t031hive/T031HIVE-0-20160716231120-1 (split_e) 1 1 1305640 1305639 t031hive/T031HIVE-0-20160716231120-1 (split_b) Summary of bytes read per ESP (2611279 = 100 percent): ESP0 reads 1305640 bytes ( 100 percent of avg) ESP1 reads 1305639 bytes ( 99 percent of avg) hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive modTSforDir_ = 1468710680, numOfPartCols_ = 0 The fix has three parts a) RangeTailIOSize will now default to maximum 
length of a row, if > 16KB. Each esp range will look ahead by this size. Previously this was hard coded to 16KB. b) If a whole buffer does not have the start of a row, that is not an error. c) We raise an error during compile if maximum row size is greater than size of IO buffer. We need to have the whole row (from start to finish) in a contiguous buffer to do the rest of our logic. Currently IO buffer defaults to 64 MB.
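The range boundaries in the plan output above follow directly from splitting the 2611279-byte file evenly across the ESPs (ceiling division, with the last range taking the remainder). A sketch of that arithmetic, assuming this simple even-split model rather than the actual planner code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Even split of a file into scan ranges: each range is ceil(fileBytes/n)
// bytes except the last, which takes whatever remains. This reproduces the
// StartOffset/BytesRead columns shown in the plan output.
static std::vector<std::pair<uint64_t, uint64_t>> // {startOffset, bytesRead}
splitFile(uint64_t fileBytes, uint64_t numRanges)
{
    std::vector<std::pair<uint64_t, uint64_t>> ranges;
    uint64_t chunk = (fileBytes + numRanges - 1) / numRanges; // ceiling division
    for (uint64_t i = 0; i < numRanges; ++i)
    {
        uint64_t start = i * chunk;
        ranges.emplace_back(start, std::min(chunk, fileBytes - start));
    }
    return ranges;
}
```

For 2611279 bytes and 4 ranges this yields starts 0, 652820, 1305640, and 1958460, with the last range reading 652819 bytes, matching the 4-esp plan; with 2 ranges it yields starts 0 and 1305640, matching the 2-esp plan.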
[jira] [Commented] (TRAFODION-2119) create table report strange WARNING when using store by and allow_nullable_unique_key_constraint CQD
[ https://issues.apache.org/jira/browse/TRAFODION-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406200#comment-15406200 ] ASF GitHub Bot commented on TRAFODION-2119: --- Github user traflm closed the pull request at: https://github.com/apache/incubator-trafodion/pull/633 > create table report strange WARNING when using store by and > allow_nullable_unique_key_constraint CQD > > > Key: TRAFODION-2119 > URL: https://issues.apache.org/jira/browse/TRAFODION-2119 > Project: Apache Trafodion > Issue Type: Bug >Reporter: liu ming >Assignee: liu ming > > cqd allow_nullable_unique_key_constraint 'on' ; > create table T_RH_DZSP > ( > s_orgcode CHAR(12) character set utf8, > s_applydate CHAR(8) character set utf8, > s_packno varchar(8) character set utf8, > s_trecode CHAR(10) character set utf8, > s_trename varchar(60) character set utf8, > s_taxorgcode varchar(12) character set utf8, > s_paybnkno varchar(12) character set utf8, > s_trano varchar(8) character set utf8, > s_orimsgno CHAR(4) character set utf8, > f_traamt numeric(18,2), > s_payeropnbnkno varchar(12) character set utf8, > payeropbkname varchar(60) character set utf8, > s_handorgname varchar(200) character set utf8, > s_payacct varchar(32) character set utf8, > s_taxvouno varchar(20) character set utf8, > s_billdate CHAR(8) character set utf8, > s_taxpaycode varchar(20) character set utf8, > s_taxpayname varchar(200) character set utf8, > c_budgettype CHAR(1) character set utf8, > c_trimflag CHAR(1) character set utf8, > s_etpcode varchar(20) character set utf8, > s_etpname varchar(200) character set utf8, > s_etptype varchar(20) character set utf8, > s_bdgsbtcode varchar(30) character set utf8, > s_bdgsbtname varchar(60) character set utf8, > s_limit CHAR(8) character set utf8, > s_taxtypecode CHAR(10) character set utf8, > s_taxkindname varchar(60) character set utf8, > c_bdglevel CHAR(1) character set utf8, > c_bdglevelname varchar(60) character set utf8, > s_taxstartdate CHAR(8) 
character set utf8, > s_taxenddate CHAR(8) character set utf8, > s_astflag varchar(35) character set utf8, > c_taxtype CHAR(1) character set utf8, > s_acct CHAR(8) character set utf8, > s_trastate CHAR(2) character set utf8, > s_remark varchar(60) character set utf8, > s_remark1 varchar(60) character set utf8, > s_remark2 varchar(60) character set utf8, > s_inputerid varchar(18) character set utf8, > ts_sysupdate varchar(100) character set utf8, > s_seq varchar(100) character set utf8 > ) > store by (S_APPLYDATE) > salt using 48 partitions on (S_APPLYDATE) > ATTRIBUTES ALIGNED FORMAT > HBASE_OPTIONS > ( > DATA_BLOCK_ENCODING = 'FAST_DIFF', > -- COMPRESSION = 'SNAPPY', > MEMSTORE_FLUSH_SIZE = '1073741824' > ); > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. 
> *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of a > character expression. > *** WARNING[8402] A string overflow occurred during the evaluation of
[jira] [Commented] (TRAFODION-2079) [3rd party tool-Squirrel] indexes won't show up on objects panel, sql command line works fine
[ https://issues.apache.org/jira/browse/TRAFODION-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406073#comment-15406073 ] ASF GitHub Bot commented on TRAFODION-2079: --- Github user selvaganesang commented on a diff in the pull request: https://github.com/apache/incubator-trafodion/pull/594#discussion_r73360698 --- Diff: dcs/src/test/jdbc_test/src/test/java/org/trafodion/jdbc_test/TestGetIndexInfo.java --- @@ -0,0 +1,180 @@ +// @@@ START COPYRIGHT @@@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+//
+// @@@ END COPYRIGHT @@@
+
+import java.sql.Connection;
+import java.sql.DatabaseMetaData;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import static org.junit.Assert.*;
+
+public class TestGetIndexInfo {
+
+    private static final String INDEX_INFO_TEST_TABLE = "INDEX_INFO_TEST_TABLE";
+
+    private static final String strCreateTableQuery = "CREATE TABLE " + INDEX_INFO_TEST_TABLE + "(C1 INT, C2 INT)";
+    private static final String strInsertQuery = "INSERT INTO " + INDEX_INFO_TEST_TABLE + " (C1, C2) VALUES (?, ?)";
+    private static final String strUpdateStatisticsQuery = "UPDATE STATISTICS FOR TABLE " + INDEX_INFO_TEST_TABLE + " ON (C1, C2)";
+    private static final String strDropTableQuery = "DROP TABLE " + INDEX_INFO_TEST_TABLE;
+    private static final String INDEX_C1_NAME = INDEX_INFO_TEST_TABLE + "_INDEX";
+
+    private static final String strCreateIndexQuery = "CREATE INDEX " + INDEX_C1_NAME + " on " + INDEX_INFO_TEST_TABLE + "(C1)";
+
+    private static Connection _conn = null;
+
+    @BeforeClass
+    public static void doTestSuiteSetup() throws Exception {
+        try {
+            _conn = DriverManager.getConnection(Utils.url, Utils.usr, Utils.pwd);
+            Statement stmt = _conn.createStatement();
+            stmt.execute(strCreateTableQuery);
+
+            PreparedStatement pstmt = _conn.prepareStatement(strInsertQuery);
+            int[][] testValues = {
+                {1, 2},
+                {10, 3},
+                {2, 2}
+            };
+
+            for (int i = 0; i < testValues.length; i++) {
+                pstmt.setInt(1, testValues[i][0]);
+                pstmt.setInt(2, testValues[i][1]);
+                pstmt.addBatch();
+            }
+            pstmt.executeBatch();
+            pstmt.close();
+
+            // create index
+            stmt.execute(strCreateIndexQuery);
+
+            // update statistics on the table
+            stmt.execute(strUpdateStatisticsQuery);
+
+            stmt.close();
+        }
+        catch (Exception e) {
+            System.out.println(e.getMessage());
+            e.printStackTrace();
+        } finally {
+        }
+    }
+
+    @Test
+    public 
void testGetNoneUniqueIndexInfo() {
+        IndexInfo[] expIndexInfo = {
+            new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C1", 0, 3, (short)0, (String)null),
+            new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C2", 0, 3, (short)0, (String)null),
+            new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, INDEX_C1_NAME, (short)3, (short)0, "",
[jira] [Work started] (TRAFODION-2117) add support of SQL extension 'EXCEPT'
[ https://issues.apache.org/jira/browse/TRAFODION-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on TRAFODION-2117 started by liu ming. --- > add support of SQL extension 'EXCEPT' > -- > > Key: TRAFODION-2117 > URL: https://issues.apache.org/jira/browse/TRAFODION-2117 > Project: Apache Trafodion > Issue Type: New Feature >Reporter: liu ming >Assignee: liu ming > > add syntax and basic functional support for 'EXCEPT', meeting the requirement to > run TPCDS as an initial goal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TRAFODION-2047) add SQL syntax support to INTERSECT
[ https://issues.apache.org/jira/browse/TRAFODION-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liu ming resolved TRAFODION-2047. - Resolution: Fixed > add SQL syntax support to INTERSECT > --- > > Key: TRAFODION-2047 > URL: https://issues.apache.org/jira/browse/TRAFODION-2047 > Project: Apache Trafodion > Issue Type: New Feature >Reporter: liu ming >Assignee: liu ming > Fix For: 2.1-incubating > > > INTERSECT is a common SQL syntax. > Trafodion should support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-2067) Support CDH 5.7
[ https://issues.apache.org/jira/browse/TRAFODION-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405402#comment-15405402 ] ASF GitHub Bot commented on TRAFODION-2067: --- Github user mashengchen commented on a diff in the pull request: https://github.com/apache/incubator-trafodion/pull/610#discussion_r73286115 --- Diff: core/sqf/src/seatrans/hbase-trx/src/main/java/org/apache/hadoop/hbase/coprocessor/transactional/SsccRegionEndpoint.java.tmpl --- @@ -236,7 +236,11 @@ CoprocessorService, Coprocessor { private HRegionInfo regionInfo = null; private HRegion m_Region = null;
+#ifdef CDH5.7 APACHE1.2
+ private HRegion t_Region = null;
--- End diff -- The qualifier of HRegion.getScanner(final Scan scan, final List scanners) has changed to public in hbase1.2. What TransactionalRegion did before hbase1.2 was to make getScanner public, so in hbase1.2 it can be changed back to HRegion. It may also help to remove "hbase.hregion.impl" from hbase-site.xml in the future. > Support CDH 5.7 > --- > > Key: TRAFODION-2067 > URL: https://issues.apache.org/jira/browse/TRAFODION-2067 > Project: Apache Trafodion > Issue Type: Improvement > Environment: CDH 5.7 >Reporter: yuhong zhou >Assignee: mashengchen > > Our bigdata platform, CDH, has been upgraded to 5.7. While trying to apply Apache > Trafodion as a data warehouse based on bigdata distributions, we found the current > release couldn't support CDH 5.7; we wish the next release can support it. > Additionally, Apache Trafodion should stay close to mainstream open source > bigdata platform releases, so Apache Trafodion can be applied by more > companies as soon as possible -- This message was sent by Atlassian JIRA (v6.3.4#6332)