[ 
https://issues.apache.org/jira/browse/TRAFODION-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840175#comment-15840175
 ] 

David Wayne Birdsall commented on TRAFODION-2465:
-------------------------------------------------

I reproduced this again, this time capturing a ULOG. The ULOG shows that on the 
first "update statistics persistent" we correctly create the sample table with 
a LIMIT COLUMN LENGTH TO 256 clause. But the second time, this is absent. That 
causes a 4247 error which is not reported by the UPDATE STATISTICS logic.

So, there are two issues here:
1.      We are not reporting  an error when we should be.
2.      The LIMIT COLUMN LENGTH TO 256 clause is missing the second time.


> UPDATE STATS failure on Hive table with long varchars
> -----------------------------------------------------
>
>                 Key: TRAFODION-2465
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2465
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: Likely all
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> The scenario is the following:
> 1. Create a Hive text format table that has string columns with values 
> 1048576 bytes long.
> 2. Create a Trafodion external table with varchar(1048576) columns for these 
> string columns.
> 3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds.
> 4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds.
> 5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the 
> following errors:
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9214] Object 
> TRAFODION."_HIVESTATS_".TRAF_SAMPLE_93_1485453261_202793 could not be created.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql 
> with the following contents and run them in this order:
> chmod 755 datagen.py
> ./datagen.py data.10rows_1MB.txt 10 2 1048576
> hive -f hive.sql
> sqlci -i mytest.sql
> $ cat datagen.py
> #! /usr/bin/env python
> import sys
> if len(sys.argv) != 5 or \
>    sys.argv[1].lower() == '-h' or \
>    sys.argv[1].lower() == '-help':
>     print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar 
> columns> <varchar column length>'
>     sys.exit()
> f = open(sys.argv[1], "w+")
> marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
> for num_rows in range(0, int(sys.argv[2])):
>     f.write(str(num_rows) + '|')
>     for num_cols in range(0, int(sys.argv[3])):
>         f.write(marker[num_rows%len(marker)])
>         for i in range (1, int(sys.argv[4])):
>             f.write(str(i % 10))
>         f.write('|')
>     f.write(str(num_rows))
>     f.write('\n')
> f.close()
> $ cat hive.sql
> drop database if exists mysch cascade;
> create database mysch;
> use mysch;
> create table base_lgvarchar_table(c_int int, c_string1 string, c_string2 
> string , p_int int) row format delimited fields terminated by '|';
> load data local inpath './data.10rows_10MB.txt' overwrite into table 
> base_lgvarchar_table;
> create table mytable (c_int int, c_string1 string, c_string2 string ,p_int 
> int) row format delimited fields terminated by '|';
> from base_lgvarchar_table insert overwrite table mytable select *;
> $ cat mytest.sql
> log mytest.log clear;
> control query default HIVE_MAX_STRING_LENGTH '1048576';
> set schema hive.mysch;
> drop external table if exists mytable cascade;
> create external table mytable (c_int int, c_string1 varchar(1048576), 
> c_string2 varchar(1048576), p_int int) for hive.mysch.mytable;
> showddl mytable;
> update statistics for table mytable on every column sample random 90 percent 
> persistent;
> update statistics for table mytable remove sample;
> update statistics for table mytable on existing column sample random 90 
> percent persistent;
> update statistics for table mytable remove sample;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to