[jira] [Resolved] (TRAFODION-2090) improve the very first time of query compiling

2016-08-03 Thread liu ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu ming resolved TRAFODION-2090.
-
   Resolution: Duplicate
Fix Version/s: 2.1-incubating

Duplicate of TRAFODION-2137

> improve the very first time of query compiling
> --
>
> Key: TRAFODION-2090
> URL: https://issues.apache.org/jira/browse/TRAFODION-2090
> Project: Apache Trafodion
>  Issue Type: Improvement
>Reporter: liu ming
>Assignee: liu ming
> Fix For: 2.1-incubating
>
>
> Currently, the very first query run in a master executor (sqlci or mxosrvr) will 
> take seconds to compile. 
> Initial analysis shows that most of the time is spent on metadata retrieval.
> Need to improve the compile time for the very first query, at least from the end 
> user's perspective.
> Speeding up metadata retrieval,
> redesigning the metadata scheme, and
> prefetching metadata
> are all among the possible solutions; the aim is to shorten the time spent on the 
> SQL compile for the first time. The initial goal is within 1 second, the faster the 
> better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work stopped] (TRAFODION-2090) improve the very first time of query compiling

2016-08-03 Thread liu ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on TRAFODION-2090 stopped by liu ming.
---
> improve the very first time of query compiling
> --
>
> Key: TRAFODION-2090
> URL: https://issues.apache.org/jira/browse/TRAFODION-2090
> Project: Apache Trafodion
>  Issue Type: Improvement
>Reporter: liu ming
>Assignee: liu ming
> Fix For: 2.1-incubating
>
>
> Currently, the very first query run in a master executor (sqlci or mxosrvr) will 
> take seconds to compile. 
> Initial analysis shows that most of the time is spent on metadata retrieval.
> Need to improve the compile time for the very first query, at least from the end 
> user's perspective.
> Speeding up metadata retrieval,
> redesigning the metadata scheme, and
> prefetching metadata
> are all among the possible solutions; the aim is to shorten the time spent on the 
> SQL compile for the first time. The initial goal is within 1 second, the faster the 
> better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-2139) Possible invalid UTF-8 string in TEXT metadata table

2016-08-03 Thread Hans Zeller (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407058#comment-15407058
 ] 

Hans Zeller commented on TRAFODION-2139:


Here is a test case (it shows the issue in the code before Anoop's recent change 
for TRAFODION-2137, and probably reproduces it with that change as well; if not, try 
adding one character somewhere near the beginning).

drop view t10k_utf8;
create view t10k_utf8 as
select *
from (values
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),
('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'),

[jira] [Created] (TRAFODION-2140) Port IUS from EsgynDB to Trafodion open source

2016-08-03 Thread Barry Fritchman (JIRA)
Barry Fritchman created TRAFODION-2140:
--

 Summary: Port IUS from EsgynDB to Trafodion open source
 Key: TRAFODION-2140
 URL: https://issues.apache.org/jira/browse/TRAFODION-2140
 Project: Apache Trafodion
  Issue Type: New Feature
Reporter: Barry Fritchman
Assignee: Barry Fritchman


It has been decided that the Incremental Update Statistics feature should be 
open source; this JIRA is for porting it to the incubator-trafodion code base 
from EsgynDB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-2079) [3rd party tool-Squirrel] indexes won't show up on objects panel, sql command line works fine

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406799#comment-15406799
 ] 

ASF GitHub Bot commented on TRAFODION-2079:
---

Github user xwq commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/594#discussion_r73438864
  
--- Diff: 
dcs/src/test/jdbc_test/src/test/java/org/trafodion/jdbc_test/TestGetIndexInfo.java
 ---
@@ -0,0 +1,180 @@
+// @@@ START COPYRIGHT @@@
+//
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+//
+// @@@ END COPYRIGHT @@@
+
+import java.sql.Connection;
+import java.sql.DatabaseMetaData;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import static org.junit.Assert.*;
+
+public class TestGetIndexInfo {
+   
+   private static final String INDEX_INFO_TEST_TABLE = "INDEX_INFO_TEST_TABLE";
+
+   private static final String strCreateTableQuery = "CREATE TABLE " + INDEX_INFO_TEST_TABLE + "(C1 INT, C2 INT)";
+   private static final String strInsertQuery = "INSERT INTO " + INDEX_INFO_TEST_TABLE + " (C1, C2) VALUES (?, ?)";
+   private static final String strUpdateStatisticsQuery = "UPDATE STATISTICS FOR TABLE " + INDEX_INFO_TEST_TABLE + " ON (C1, C2)";
+   private static final String strDropTableQuery = "DROP TABLE " + INDEX_INFO_TEST_TABLE;
+   private static final String INDEX_C1_NAME = INDEX_INFO_TEST_TABLE + "_INDEX";
+
+   private static final String strCreateIndexQuery = "CREATE INDEX " + INDEX_C1_NAME + " on " + INDEX_INFO_TEST_TABLE + "(C1)";
+
+   private static Connection _conn = null;
+
+   @BeforeClass
+   public static void doTestSuiteSetup() throws Exception {
+       try {
+           _conn = DriverManager.getConnection(Utils.url, Utils.usr, Utils.pwd);
+           Statement stmt = _conn.createStatement();
+           stmt.execute(strCreateTableQuery);
+
+           PreparedStatement pstmt = _conn.prepareStatement(strInsertQuery);
+           int[][] testValues = {
+               {1, 2},
+               {10, 3},
+               {2, 2}
+           };
+
+           for (int i = 0; i < testValues.length; i++) {
+               pstmt.setInt(1, testValues[i][0]);
+               pstmt.setInt(2, testValues[i][1]);
+               pstmt.addBatch();
+           }
+           pstmt.executeBatch();
+           pstmt.close();
+
+           // create index
+           stmt.execute(strCreateIndexQuery);
+
+           // update statistics on the table
+           stmt.execute(strUpdateStatisticsQuery);
+
+           stmt.close();
+       }
+       catch (Exception e) {
+           System.out.println(e.getMessage());
+           e.printStackTrace();
+       } finally {
+       }
+   }
+
+   @Test
+   public void testGetNoneUniqueIndexInfo() {
+       IndexInfo[] expIndexInfo = {
+           new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C1", 0, 3, (short)0, (String)null),
+           new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C2", 0, 3, (short)0, (String)null),
+           new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, INDEX_C1_NAME, (short)3, (short)0, "", 0, 0, 

[jira] [Assigned] (TRAFODION-2139) Possible invalid UTF-8 string in TEXT metadata table

2016-08-03 Thread Hans Zeller (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Zeller reassigned TRAFODION-2139:
--

Assignee: Hans Zeller

> Possible invalid UTF-8 string in TEXT metadata table
> 
>
> Key: TRAFODION-2139
> URL: https://issues.apache.org/jira/browse/TRAFODION-2139
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-cmu
>Affects Versions: 1.3-incubating
>Reporter: Hans Zeller
>Assignee: Hans Zeller
>
> Method CmpSeabaseDDL::updateTextTable() takes a string, splits it into chunks 
> of a maximum size and inserts them into the TEXT table. I have not yet 
> verified that this problem actually can occur, but I think there is a 
> possibility that the method will split a multi-byte UTF-8 character in the 
> middle and will get an error. To fix this, we should use method 
> findStartOfChar() in file sql/common/csconvert.h to find a good point to 
> split the text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2139) Possible invalid UTF-8 string in TEXT metadata table

2016-08-03 Thread Hans Zeller (JIRA)
Hans Zeller created TRAFODION-2139:
--

 Summary: Possible invalid UTF-8 string in TEXT metadata table
 Key: TRAFODION-2139
 URL: https://issues.apache.org/jira/browse/TRAFODION-2139
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-cmu
Affects Versions: 1.3-incubating
Reporter: Hans Zeller


Method CmpSeabaseDDL::updateTextTable() takes a string, splits it into chunks 
of a maximum size and inserts them into the TEXT table. I have not yet verified 
that this problem actually can occur, but I think there is a possibility that 
the method will split a multi-byte UTF-8 character in the middle and will get 
an error. To fix this, we should use method findStartOfChar() in file 
sql/common/csconvert.h to find a good point to split the text.
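For illustration only, here is a minimal standalone sketch of the kind of UTF-8-aware split being proposed. The helper utf8CharStart() below is only a stand-in for the findStartOfChar() routine mentioned above (its real signature is not shown here), and all other names are made up for this example.

// Sketch only, not Trafodion code: split a UTF-8 string into chunks of at
// most maxChunk bytes without cutting a multi-byte character in half.
// utf8CharStart() stands in for the findStartOfChar() helper mentioned above.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

static size_t utf8CharStart(const std::string &s, size_t pos)
{
  // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
  while (pos > 0 && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80)
    --pos;
  return pos;
}

static std::vector<std::string> splitUtf8(const std::string &text, size_t maxChunk)
{
  std::vector<std::string> chunks;
  size_t start = 0;
  while (start < text.size())
    {
      size_t end = start + maxChunk;
      if (end >= text.size())
        end = text.size();
      else
        end = utf8CharStart(text, end);       // back up to a character boundary
      if (end <= start)                       // guard: character wider than maxChunk
        end = std::min(text.size(), start + maxChunk);
      chunks.push_back(text.substr(start, end - start));
      start = end;
    }
  return chunks;
}

int main()
{
  // "\xC3\xA9" is a 2-byte UTF-8 character; a naive 3-byte split would cut it.
  for (const std::string &c : splitUtf8("ab\xC3\xA9" "cd", 3))
    std::cout << c.size() << " bytes\n";      // prints 2, 3, 1
  return 0;
}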



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406696#comment-15406696
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

Github user zellerh commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/637#discussion_r73429919
  
--- Diff: core/sql/optimizer/BindRelExpr.cpp ---
@@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
 }
 }
 
+   if (naTable->isHiveTable() && 
+       !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
+         naTable->getClusteringIndex()->getHHDFSTableStats()
+           ->isSequenceFile()) &&
+       (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) && 
+       (naTable->getRecordLength() >
+        CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
+     {
+       // do not raise error if buffersize is set through buffersize_bytes.
--- End diff --

Another thing you can do is to make HDFS_IO_BUFFERSIZE a FLOAT; then you 
can specify sizes of less than one KB as well. That would be less confusing. 
I'm not sure I understand how you wanted to use this as a backdoor, and I don't 
have a strong opinion; you will know better how you want to use it.


> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 106:\n  
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp0 end (1305640)  
> 2611171:\n   
> 2611278:\n  
> Number of esps to scan:4
>   Esp# Range#  StartOffsetBytesRead  FileName
> == ==    ==
>  0  00   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_e)
>  1  1   652820   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  2  2  1305640   652820  

[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406654#comment-15406654
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

Github user sureshsubbiah commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/637#discussion_r73425322
  
--- Diff: core/sql/bin/SqlciErrors.txt ---
@@ -1282,6 +1282,7 @@ $1~String1 
 4223 42000 9 BEGINNER MAJOR DBADMIN $0~String0 not supported in this software version.
 4224 42000 9 BEGINNER MAJOR DBADMIN The directory $0~String0 is not a valid Hive location.
 4225 42000 9 BEGINNER MAJOR DBADMIN Number of column families cannot exceed 32.
+4226 42000 9 BEGINNER MAJOR DBADMIN Table $0~TableName has a maximum record length of $0~Int0 which is greater than HDFS_IO_BUFFERSIZE. Increase buffer size setting, or reduce HIVE_MAX_STRING_LENGTH.
--- End diff --

Done. Thanks for the suggestion.


> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 106:\n  
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp0 end (1305640)  
> 2611171:\n   
> 2611278:\n  
> Number of esps to scan:4
>   Esp# Range#  StartOffsetBytesRead  FileName
> == ==    ==
>  0  00   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_e)
>  1  1   652820   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  2  2  1305640   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  3  3  1958460   652819  t031hive/T031HIVE-0-20160716231120-1 
> (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP0 reads 652820 bytes ( 100 percent of avg)
> ESP1 reads 652820 bytes ( 

[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406653#comment-15406653
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

Github user sureshsubbiah commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/637#discussion_r73425266
  
--- Diff: core/sql/optimizer/BindRelExpr.cpp ---
@@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
 }
 }
 
+   if (naTable->isHiveTable() && 
+       !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
+         naTable->getClusteringIndex()->getHHDFSTableStats()
+           ->isSequenceFile()) &&
+       (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) && 
+       (naTable->getRecordLength() >
+        CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
+     {
+       // do not raise error if buffersize is set through buffersize_bytes.
--- End diff --

HDFS_IO_BUFFERSIZE_BYTES was added initially in cases where we wanted to 
specify the buffer size in granularity less than a KB. The comment indicates 
that this is for testing. I was tempted to remove the cqd, but thought we may 
still need to test with precisely defined buffer sizes.

I am viewing BUFFERSIZE_BYTES as some kind of internal switch, to be used 
by expert users to override the BUFFERSIZE setting. In case users have a wide 
table but for some reason want to avoid the new error, they can use 
BUFFERSIZE_BYTES to skip the new error code. Wide tables can be handled if the 
actual rows are indeed small enough to fit in a buffer; the user may have other 
reasons for not wanting to change defaults such as HIVE_MAX_STRING_LENGTH.

If we prefer, I can also remove this BUFFERSIZE_BYTES backdoor.
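To make the relationship between the two settings concrete, here is a tiny illustrative sketch (made-up names, not Trafodion code); the reading is based on the check quoted above, where HDFS_IO_BUFFERSIZE is in KB and a nonzero HDFS_IO_BUFFERSIZE_BYTES overrides it with an exact byte count.

// Illustrative sketch only; the real settings are read via
// CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE) and
// CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES).
#include <cstdint>
#include <iostream>

static int64_t effectiveHdfsIoBufferBytes(int64_t bufferSizeKB,
                                          int64_t bufferSizeBytesOverride)
{
  if (bufferSizeBytesOverride != 0)
    return bufferSizeBytesOverride;   // explicit byte override (the "backdoor") wins
  return bufferSizeKB * 1024;         // otherwise use the KB setting
}

int main()
{
  std::cout << effectiveHdfsIoBufferBytes(65536, 0)   << "\n";  // 67108864 (64 MB)
  std::cout << effectiveHdfsIoBufferBytes(65536, 500) << "\n";  // 500-byte override
  return 0;
}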


> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 106:\n  
> 1001013:\n   
> 1011020:\n
> 027:\n   

[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406626#comment-15406626
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

Github user zellerh commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/637#discussion_r73422384
  
--- Diff: core/sql/bin/SqlciErrors.txt ---
@@ -1282,6 +1282,7 @@ $1~String1 
 4223 42000 9 BEGINNER MAJOR DBADMIN $0~String0 not supported in this software version.
 4224 42000 9 BEGINNER MAJOR DBADMIN The directory $0~String0 is not a valid Hive location.
 4225 42000 9 BEGINNER MAJOR DBADMIN Number of column families cannot exceed 32.
+4226 42000 9 BEGINNER MAJOR DBADMIN Table $0~TableName has a maximum record length of $0~Int0 which is greater than HDFS_IO_BUFFERSIZE. Increase buffer size setting, or reduce HIVE_MAX_STRING_LENGTH.
--- End diff --

Maybe change HDFS_IO_BUFFERSIZE to "the HDFS_IO_BUFFERSIZE default"? People 
might not know otherwise how to change it.


> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 106:\n  
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp0 end (1305640)  
> 2611171:\n   
> 2611278:\n  
> Number of esps to scan:4
>   Esp# Range#  StartOffsetBytesRead  FileName
> == ==    ==
>  0  00   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_e)
>  1  1   652820   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  2  2  1305640   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  3  3  1958460   652819  t031hive/T031HIVE-0-20160716231120-1 
> (split_b)
> Summary of bytes read per ESP (2611279 = 100 percent):
> ESP0 reads 

[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406619#comment-15406619
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

Github user zellerh commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/637#discussion_r73421938
  
--- Diff: core/sql/optimizer/BindRelExpr.cpp ---
@@ -7582,6 +7582,24 @@ RelExpr *Scan::bindNode(BindWA *bindWA)
 }
 }
 
+   if (naTable->isHiveTable() && 
+       !(naTable->getClusteringIndex()->getHHDFSTableStats()->isOrcFile() ||
+         naTable->getClusteringIndex()->getHHDFSTableStats()
+           ->isSequenceFile()) &&
+       (CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE_BYTES) == 0) && 
+       (naTable->getRecordLength() >
+        CmpCommon::getDefaultNumeric(HDFS_IO_BUFFERSIZE)*1024))
+     {
+       // do not raise error if buffersize is set through buffersize_bytes.
--- End diff --

What if the user uses HDFS_IO_BUFFERSIZE_BYTES and that size is too small? 
Sorry, I don't understand the meaning of the 2 CQDs; are they for the same 
thing and just use different units?


> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 106:\n  
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp0 end (1305640)  
> 2611171:\n   
> 2611278:\n  
> Number of esps to scan:4
>   Esp# Range#  StartOffsetBytesRead  FileName
> == ==    ==
>  0  00   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_e)
>  1  1   652820   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  2  2  1305640   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_b/e)
>  3  3  1958460   652819  t031hive/T031HIVE-0-20160716231120-1 
> 

[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406615#comment-15406615
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

Github user zellerh commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/637#discussion_r73421305
  
--- Diff: core/sql/generator/GenRelScan.cpp ---
@@ -1182,6 +1181,15 @@ if (hTabStats->isOrcFile())
 
   UInt32 rangeTailIOSize = (UInt32)
   CmpCommon::getDefaultNumeric(HDFS_IO_RANGE_TAIL);
+  if (rangeTailIOSize == 0) 
+{
+  rangeTailIOSize = getTableDesc()->getNATable()->getRecordLength() +
--- End diff --

This record length is what the compiler thinks; the real length may be more 
or less. Hive probably makes it more complex: some fields like 
date/time/timestamp are strings in Hive and encoded binaries in Trafodion. 
Character sets may be different as well in some cases (e.g. GBK). I don't have 
a better solution than what the code does, but let's say we have this: create 
table t(a char(2) not null, b timestamp(6) not null, c timestamp(6) not 
null). Trafodion will make the length 20022. In reality, a string may look like 
"20k chars..., 2015-01-01 00:00:00.000, 2016-01-01 00:00:00.000". That's 2049 
characters. Maybe the solution is to add the 16K to the record length, not take 
the max?
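A tiny sketch of the two alternatives being weighed here (illustrative names only, not actual Trafodion code): take the max of the compiler's record length and the 16 KB tail, or add the tail on top of the record length.

// Sketch of the two alternatives discussed above; not actual Trafodion code.
#include <algorithm>
#include <cstdint>
#include <iostream>

static const int64_t kLegacyRangeTail = 16 * 1024;   // the old hard-coded 16 KB

// Alternative 1: use the larger of the compiler's record length and 16 KB.
static int64_t rangeTailTakeMax(int64_t compilerRecordLength)
{
  return std::max(compilerRecordLength, kLegacyRangeTail);
}

// Alternative 2 (suggested here): add the 16 KB tail to the record length, so
// a row whose text form is somewhat longer than the compiler's estimate still
// fits within the lookahead.
static int64_t rangeTailAddTail(int64_t compilerRecordLength)
{
  return compilerRecordLength + kLegacyRangeTail;
}

int main()
{
  const int64_t recLen = 20022;   // the example record length from the comment above
  std::cout << rangeTailTakeMax(recLen) << " vs " << rangeTailAddTail(recLen) << "\n";
  return 0;
}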


> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp1 end (1305640)  esp2 end (1958460)
> 2611171:\n   
> 2611278:\n   
> -- 2 esp plan
> 106:\n  
> 1001013:\n   
> 1011020:\n
> 027:\n   
> 134:\n   
> 151:\n   
> 159:\n
> 265:\n  esp0 end (1305640)  
> 2611171:\n   
> 2611278:\n  
> Number of esps to scan:4
>   Esp# Range#  StartOffsetBytesRead  FileName
> == ==    ==
>  0  00   652820  t031hive/T031HIVE-0-20160716231120-1 
> (split_e)
>  1  1   652820   652820  

[jira] [Commented] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406581#comment-15406581
 ] 

ASF GitHub Bot commented on TRAFODION-2138:
---

GitHub user sureshsubbiah opened a pull request:

https://github.com/apache/incubator-trafodion/pull/637

[TRAFODION-2138] Hive scan on wide tables can result in lost rows or error

The fix has three parts
a)  RangeTailIOSize will now default to maximum length of a row, if > 16KB. 
Each esp range will look ahead by this size. Previously this was hard coded 
to 16KB.
b)  If a whole buffer does not have the start of a row, that is not an 
error.
c)  We raise an error during compile if maximum row size is greater than 
size of IO buffer. 
We need to have the whole row (from start to finish) in a contiguous buffer 
to do the rest of our logic. 
Currently IO buffer defaults to 64 MB.
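As a rough illustration of how parts (a) and (c) could fit together, here is a sketch with made-up names; it does not reproduce the actual changes, which live in GenRelScan.cpp and BindRelExpr.cpp as shown in the review comments above.

// Illustrative sketch only; names do not match the real Trafodion functions.
#include <cstdint>
#include <stdexcept>

static const uint32_t kLegacyRangeTail = 16 * 1024;   // old hard-coded lookahead

// Part (a): the per-range lookahead defaults to the maximum row length when
// that exceeds the old 16 KB value, unless the CQD explicitly sets a size.
static uint32_t chooseRangeTailIOSize(uint32_t cqdRangeTail, uint32_t maxRowLength)
{
  if (cqdRangeTail != 0)
    return cqdRangeTail;                               // explicit user setting
  return (maxRowLength > kLegacyRangeTail) ? maxRowLength : kLegacyRangeTail;
}

// Part (c): at compile time, reject plans where a whole row cannot fit into
// one contiguous I/O buffer.
static void checkRowFitsInBuffer(uint32_t maxRowLength, uint32_t ioBufferBytes)
{
  if (maxRowLength > ioBufferBytes)
    throw std::runtime_error("max row length exceeds HDFS I/O buffer size");
}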

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sureshsubbiah/incubator-trafodion traf-2094

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-trafodion/pull/637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #637


commit 6d266221b959c14319a760716939f261a2460276
Author: Suresh Subbiah 
Date:   2016-08-03T20:35:43Z

[TRAFODION-2138] Hive scan on wide tables can result in lost rows or error

The fix has three parts
a)  RangeTailIOSize will now default to maximum length of a row, if > 16KB.
Each esp range will look ahead by this size. Previously this was hard coded 
to 16KB.
b)  If a whole buffer does not have the start of a row, that is not an 
error.
c)  We raise an error during compile if maximum row size is greater than 
size of IO buffer.
We need to have the whole row (from start to finish) in a contiguous buffer 
to do the rest of our logic.
Currently IO buffer defaults to 64 MB.




> Hive scan on wide tables can result in lost rows or error
> -
>
> Key: TRAFODION-2138
> URL: https://issues.apache.org/jira/browse/TRAFODION-2138
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Suresh Subbiah
>Assignee: Suresh Subbiah
> Fix For: 2.0-incubating
>
>
> When a Hive table has wide rows (say greater than 16KB) and the scan is 
> parallelized, it is possible that one of these two problem situations may 
> arise:
> a) An error with sqlcode 8446 is raised
> *** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
> record delimiter found in buffer from hdfsRead.
> b) some rows are lost.
> Both these situations occur since the scan is broken up into ranges and the 
> lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which 
> defaults to 16KB. This setting is also hardcoded in a crucial location so 
> changing the default does not help.
> Test case to see this problem is
> cqd traf_max_character_col_length '100';
> drop table if exists t031t1;
> create table t031t1 (z int not null primary key, a varchar(100), b 
> char(100));
> insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
> insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
> insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
> insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
> insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
> insert into t031t1 values (8, repeat('a', 100, 100) , null);
> insert into t031t1 values (9, repeat('a', 50, 50) , null);
> insert into t031t1 values (10, repeat('a', 100, 100) , null);
> --in hive
> create table t031hive(z int, a string, b string);
> -- in sqlci
> cqd hive_max_string_length '100';
> insert into hive.hive.t031hive select * from t031t1;
> create external table t031hive (z int, a varchar(100), b 
> varchar(100)) for hive.hive.t031hive;
> control query shape sort_groupby(exchange(cut));
> -- 2 esp plan
> cqd hive_min_bytes_per_esp_partition '200' ;
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> cqd hive_min_bytes_per_esp_partition '100' ;
> -- 4 esp plan
> prepare s2 from select count(*) from hive.hive.t031hive;
> explain options 'f' s2 ;
> execute s2 ;
> The row boundaries and esp scan range boundaries are shown below
> hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
> -- 4 esp plan
> 106:\n  esp0 end (652820)
> 

[jira] [Created] (TRAFODION-2138) Hive scan on wide tables can result in lost rows or error

2016-08-03 Thread Suresh Subbiah (JIRA)
Suresh Subbiah created TRAFODION-2138:
-

 Summary: Hive scan on wide tables can result in lost rows or error
 Key: TRAFODION-2138
 URL: https://issues.apache.org/jira/browse/TRAFODION-2138
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Affects Versions: 2.0-incubating
Reporter: Suresh Subbiah
Assignee: Suresh Subbiah
 Fix For: 2.0-incubating


When a Hive table has wide rows (say greater than 16KB) and the scan is 
parallelized, it is possible that one of these two problem situations may arise:

a) An error with sqlcode 8446 is raised
*** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No 
record delimiter found in buffer from hdfsRead.

b) some rows are lost.

Both these situations occur since the scan is broken up into ranges and the 
lookahead to the next range is determined by HDFS_IO_RANGE_TAIL which defaults 
to 16KB. This setting is also hardcoded in a crucial location so changing the 
default does not help.

Test case to see this problem is

cqd traf_max_character_col_length '100';
drop table if exists t031t1;
create table t031t1 (z int not null primary key, a varchar(100), b 
char(100));
insert into t031t1 values (1, repeat('a', 100, 100) , 'def');
insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
insert into t031t1 values (3, repeat('a', 1, 1) , 'zzz');
insert into t031t1 values (4, repeat('a', 10, 10) , 'zzz');
insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
insert into t031t1 values (8, repeat('a', 100, 100) , null);
insert into t031t1 values (9, repeat('a', 50, 50) , null);
insert into t031t1 values (10, repeat('a', 100, 100) , null);


--in hive
create table t031hive(z int, a string, b string);

-- in sqlci
cqd hive_max_string_length '100';
insert into hive.hive.t031hive select * from t031t1;
create external table t031hive (z int, a varchar(100), b varchar(100)) 
for hive.hive.t031hive;

control query shape sort_groupby(exchange(cut));
-- 2 esp plan
cqd hive_min_bytes_per_esp_partition '200' ;
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2 ;
execute s2 ;

cqd hive_min_bytes_per_esp_partition '100' ;
-- 4 esp plan
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2 ;
execute s2 ;


The row boundaries and esp scan range boundaries are shown below

hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"

-- 4 esp plan
106:\n  esp0 end (652820)
1001013:\n   
1011020:\n
027:\n   
134:\n   
151:\n   
159:\n
265:\n  esp1 end (1305640)  esp2 end (1958460)
2611171:\n   
2611278:\n   

-- 2 esp plan
106:\n  
1001013:\n   
1011020:\n
027:\n   
134:\n   
151:\n   
159:\n
265:\n  esp0 end (1305640)  
2611171:\n   
2611278:\n  

Number of esps to scan:4

  Esp# Range#  StartOffsetBytesRead  FileName
== ==    ==

 0  00   652820  t031hive/T031HIVE-0-20160716231120-1 
(split_e)
 1  1   652820   652820  t031hive/T031HIVE-0-20160716231120-1 
(split_b/e)
 2  2  1305640   652820  t031hive/T031HIVE-0-20160716231120-1 
(split_b/e)
 3  3  1958460   652819  t031hive/T031HIVE-0-20160716231120-1 
(split_b)

Summary of bytes read per ESP (2611279 = 100 percent):

ESP0 reads 652820 bytes ( 100 percent of avg)
ESP1 reads 652820 bytes ( 100 percent of avg)
ESP2 reads 652820 bytes ( 100 percent of avg)
ESP3 reads 652819 bytes (  99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0

Number of ranges to scan: 2
Number of esps to scan:2

  Esp# Range#  StartOffsetBytesRead  FileName
== ==    ==

 0  00  1305640  t031hive/T031HIVE-0-20160716231120-1 
(split_e)
 1  1  1305640  1305639  t031hive/T031HIVE-0-20160716231120-1 
(split_b)

Summary of bytes read per ESP (2611279 = 100 percent):

ESP0 reads1305640 bytes ( 100 percent of avg)
ESP1 reads1305639 bytes (  99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0


The fix has three parts

a)  RangeTailIOSize will now default to maximum length of a row, if > 16KB. 
Each esp range will look ahead by this size. Previously this was hard coded to 
16KB.
b)  If a whole buffer does not have the start of a row, that is not an 
error.
c)  We 

[jira] [Commented] (TRAFODION-2119) create table report strange WARNING when using store by and allow_nullable_unique_key_constraint CQD

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406200#comment-15406200
 ] 

ASF GitHub Bot commented on TRAFODION-2119:
---

Github user traflm closed the pull request at:

https://github.com/apache/incubator-trafodion/pull/633


> create table report strange WARNING when using store by and 
> allow_nullable_unique_key_constraint CQD
> 
>
> Key: TRAFODION-2119
> URL: https://issues.apache.org/jira/browse/TRAFODION-2119
> Project: Apache Trafodion
>  Issue Type: Bug
>Reporter: liu ming
>Assignee: liu ming
>
> cqd allow_nullable_unique_key_constraint 'on' ; 
> create table T_RH_DZSP
> (
>   s_orgcode        CHAR(12) character set utf8,
>   s_applydate      CHAR(8) character set utf8,
>   s_packno         varchar(8) character set utf8,
>   s_trecode        CHAR(10) character set utf8,
>   s_trename        varchar(60) character set utf8,
>   s_taxorgcode     varchar(12) character set utf8,
>   s_paybnkno       varchar(12) character set utf8,
>   s_trano          varchar(8) character set utf8,
>   s_orimsgno       CHAR(4) character set utf8,
>   f_traamt         numeric(18,2),
>   s_payeropnbnkno  varchar(12) character set utf8,
>   payeropbkname    varchar(60) character set utf8,
>   s_handorgname    varchar(200) character set utf8,
>   s_payacct        varchar(32) character set utf8,
>   s_taxvouno       varchar(20) character set utf8,
>   s_billdate       CHAR(8) character set utf8,
>   s_taxpaycode     varchar(20) character set utf8,
>   s_taxpayname     varchar(200) character set utf8,
>   c_budgettype     CHAR(1) character set utf8,
>   c_trimflag       CHAR(1) character set utf8,
>   s_etpcode        varchar(20) character set utf8,
>   s_etpname        varchar(200) character set utf8,
>   s_etptype        varchar(20) character set utf8,
>   s_bdgsbtcode     varchar(30) character set utf8,
>   s_bdgsbtname     varchar(60) character set utf8,
>   s_limit          CHAR(8) character set utf8,
>   s_taxtypecode    CHAR(10) character set utf8,
>   s_taxkindname    varchar(60) character set utf8,
>   c_bdglevel       CHAR(1) character set utf8,
>   c_bdglevelname   varchar(60) character set utf8,
>   s_taxstartdate   CHAR(8) character set utf8,
>   s_taxenddate     CHAR(8) character set utf8,
>   s_astflag        varchar(35) character set utf8,
>   c_taxtype        CHAR(1) character set utf8,
>   s_acct           CHAR(8) character set utf8,
>   s_trastate       CHAR(2) character set utf8,
>   s_remark         varchar(60) character set utf8,
>   s_remark1        varchar(60) character set utf8,
>   s_remark2        varchar(60) character set utf8,
>   s_inputerid      varchar(18) character set utf8,
>   ts_sysupdate     varchar(100) character set utf8,
>   s_seq            varchar(100) character set utf8
> )
> store by (S_APPLYDATE)
> salt using 48 partitions on (S_APPLYDATE)
> ATTRIBUTES ALIGNED FORMAT
> HBASE_OPTIONS
> (
>   DATA_BLOCK_ENCODING = 'FAST_DIFF',
>  -- COMPRESSION = 'SNAPPY',
>   MEMSTORE_FLUSH_SIZE = '1073741824'
> );
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of a 
> character expression.
> *** WARNING[8402] A string overflow occurred during the evaluation of 

[jira] [Commented] (TRAFODION-2079) [3rd party tool-Squirrel] indexes won't show up on objects panel, sql command line works fine

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406073#comment-15406073
 ] 

ASF GitHub Bot commented on TRAFODION-2079:
---

Github user selvaganesang commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/594#discussion_r73360698
  
--- Diff: 
dcs/src/test/jdbc_test/src/test/java/org/trafodion/jdbc_test/TestGetIndexInfo.java
 ---
@@ -0,0 +1,180 @@
+// @@@ START COPYRIGHT @@@
+//
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+//
+// @@@ END COPYRIGHT @@@
+
+import java.sql.Connection;
+import java.sql.DatabaseMetaData;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import static org.junit.Assert.*;
+
+public class TestGetIndexInfo {
+   
+   private static final String INDEX_INFO_TEST_TABLE = "INDEX_INFO_TEST_TABLE";
+
+   private static final String strCreateTableQuery = "CREATE TABLE " + INDEX_INFO_TEST_TABLE + "(C1 INT, C2 INT)";
+   private static final String strInsertQuery = "INSERT INTO " + INDEX_INFO_TEST_TABLE + " (C1, C2) VALUES (?, ?)";
+   private static final String strUpdateStatisticsQuery = "UPDATE STATISTICS FOR TABLE " + INDEX_INFO_TEST_TABLE + " ON (C1, C2)";
+   private static final String strDropTableQuery = "DROP TABLE " + INDEX_INFO_TEST_TABLE;
+   private static final String INDEX_C1_NAME = INDEX_INFO_TEST_TABLE + "_INDEX";
+
+   private static final String strCreateIndexQuery = "CREATE INDEX " + INDEX_C1_NAME + " on " + INDEX_INFO_TEST_TABLE + "(C1)";
+
+   private static Connection _conn = null;
+
+   @BeforeClass
+   public static void doTestSuiteSetup() throws Exception {
+       try {
+           _conn = DriverManager.getConnection(Utils.url, Utils.usr, Utils.pwd);
+           Statement stmt = _conn.createStatement();
+           stmt.execute(strCreateTableQuery);
+
+           PreparedStatement pstmt = _conn.prepareStatement(strInsertQuery);
+           int[][] testValues = {
+               {1, 2},
+               {10, 3},
+               {2, 2}
+           };
+
+           for (int i = 0; i < testValues.length; i++) {
+               pstmt.setInt(1, testValues[i][0]);
+               pstmt.setInt(2, testValues[i][1]);
+               pstmt.addBatch();
+           }
+           pstmt.executeBatch();
+           pstmt.close();
+
+           // create index
+           stmt.execute(strCreateIndexQuery);
+
+           // update statistics on the table
+           stmt.execute(strUpdateStatisticsQuery);
+
+           stmt.close();
+       }
+       catch (Exception e) {
+           System.out.println(e.getMessage());
+           e.printStackTrace();
+       } finally {
+       }
+   }
+
+   @Test
+   public void testGetNoneUniqueIndexInfo() {
+       IndexInfo[] expIndexInfo = {
+           new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C1", 0, 3, (short)0, (String)null),
+           new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, (String)null, (short)0, (short)0, "C2", 0, 3, (short)0, (String)null),
+           new IndexInfo("TRAFODION", "SEABASE", INDEX_INFO_TEST_TABLE, false, (String)null, INDEX_C1_NAME, (short)3, (short)0, "", 

[jira] [Work started] (TRAFODION-2117) add support of SQL extension 'EXCEPT'

2016-08-03 Thread liu ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on TRAFODION-2117 started by liu ming.
---
> add support of SQL extension 'EXCEPT' 
> --
>
> Key: TRAFODION-2117
> URL: https://issues.apache.org/jira/browse/TRAFODION-2117
> Project: Apache Trafodion
>  Issue Type: New Feature
>Reporter: liu ming
>Assignee: liu ming
>
> Add syntax and basic functional support for 'EXCEPT'; the initial goal is to meet the 
> requirement to run TPCDS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TRAFODION-2047) add SQL syntax support to INTERSECT

2016-08-03 Thread liu ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu ming resolved TRAFODION-2047.
-
Resolution: Fixed

> add SQL syntax support to INTERSECT
> ---
>
> Key: TRAFODION-2047
> URL: https://issues.apache.org/jira/browse/TRAFODION-2047
> Project: Apache Trafodion
>  Issue Type: New Feature
>Reporter: liu ming
>Assignee: liu ming
> Fix For: 2.1-incubating
>
>
> INTERSECT is a common SQL syntax. 
> Trafodion should support it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-2067) Support CDH 5.7

2016-08-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405402#comment-15405402
 ] 

ASF GitHub Bot commented on TRAFODION-2067:
---

Github user mashengchen commented on a diff in the pull request:

https://github.com/apache/incubator-trafodion/pull/610#discussion_r73286115
  
--- Diff: 
core/sqf/src/seatrans/hbase-trx/src/main/java/org/apache/hadoop/hbase/coprocessor/transactional/SsccRegionEndpoint.java.tmpl
 ---
@@ -236,7 +236,11 @@ CoprocessorService, Coprocessor {
 
   private HRegionInfo regionInfo = null;
   private HRegion m_Region = null;
+#ifdef CDH5.7 APACHE1.2
+  private HRegion t_Region = null;
--- End diff --

The qualifier of HRegion.getScanner(final Scan scan, final 
List scanners) has changed to public in hbase1.2. What TransactionalRegion 
did before hbase1.2 was to make getScanner public, so 
in hbase1.2 it can change back to HRegion. It may also help to remove 
"hbase.hregion.impl" from hbase-site.xml in the future.


> Support CDH 5.7
> ---
>
> Key: TRAFODION-2067
> URL: https://issues.apache.org/jira/browse/TRAFODION-2067
> Project: Apache Trafodion
>  Issue Type: Improvement
> Environment: CDH 5.7
>Reporter: yuhong zhou
>Assignee: mashengchen
>
> Our bigdata platform, CDH, has been upgraded to 5.7. While trying to apply Apache 
> Trafodion as a data warehouse based on bigdata distributions, we found that the current 
> release couldn't support CDH 5.7; we wish the next release to support it. 
> Additionally, Apache Trafodion should stay close to mainstream open source 
> bigdata platform releases, so that Apache Trafodion can be adopted by more 
> companies as soon as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)