[ https://issues.apache.org/jira/browse/TRAFODION-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840175#comment-15840175 ]
David Wayne Birdsall commented on TRAFODION-2465: ------------------------------------------------- I reproduced this again, this time capturing a ULOG. The ULOG shows that on the first "update statistics persistent" we correctly create the sample table with a LIMIT COLUMN LENGTH TO 256 clause. But the second time, this is absent. That causes a 4247 error which is not reported by the UPDATE STATISTICS logic. So, there are two issues here: 1. We are not reporting an error when we should be. 2. The LIMIT COLUMN LENGTH TO 256 clause is missing the second time. > UPDATE STATS failure on Hive table with long varchars > ----------------------------------------------------- > > Key: TRAFODION-2465 > URL: https://issues.apache.org/jira/browse/TRAFODION-2465 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-cmp > Affects Versions: 2.1-incubating > Environment: Likely all > Reporter: David Wayne Birdsall > Assignee: David Wayne Birdsall > > The scenario is the following: > 1. Create a Hive text format table that has string columns with values > 1048576 bytes long. > 2. Create a Trafodion external table with varchar(1048576) columns for these > string columns. > 3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds. > 4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds. > 5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the > following errors: > *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an > error (8609) from statement Process_Query. > *** ERROR[8609] Waited rollback performed without starting a transaction. > *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an > error (8609) from statement Process_Query. > *** ERROR[8609] Waited rollback performed without starting a transaction. > *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an > error (8609) from statement Process_Query. > *** ERROR[8609] Waited rollback performed without starting a transaction. > *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an > error (8609) from statement Process_Query. > *** ERROR[8609] Waited rollback performed without starting a transaction. > *** ERROR[9214] Object > TRAFODION."_HIVESTATS_".TRAF_SAMPLE_93_1485453261_202793 could not be created. > *** ERROR[8609] Waited rollback performed without starting a transaction. > To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql > with the following contents and run them in this order: > chmod 755 datagen.py > ./datagen.py data.10rows_1MB.txt 10 2 1048576 > hive -f hive.sql > sqlci -i mytest.sql > $ cat datagen.py > #! /usr/bin/env python > import sys > if len(sys.argv) != 5 or \ > sys.argv[1].lower() == '-h' or \ > sys.argv[1].lower() == '-help': > print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar > columns> <varchar column length>' > sys.exit() > f = open(sys.argv[1], "w+") > marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ') > for num_rows in range(0, int(sys.argv[2])): > f.write(str(num_rows) + '|') > for num_cols in range(0, int(sys.argv[3])): > f.write(marker[num_rows%len(marker)]) > for i in range (1, int(sys.argv[4])): > f.write(str(i % 10)) > f.write('|') > f.write(str(num_rows)) > f.write('\n') > f.close() > $ cat hive.sql > drop database if exists mysch cascade; > create database mysch; > use mysch; > create table base_lgvarchar_table(c_int int, c_string1 string, c_string2 > string , p_int int) row format delimited fields terminated by '|'; > load data local inpath './data.10rows_10MB.txt' overwrite into table > base_lgvarchar_table; > create table mytable (c_int int, c_string1 string, c_string2 string ,p_int > int) row format delimited fields terminated by '|'; > from base_lgvarchar_table insert overwrite table mytable select *; > $ cat mytest.sql > log mytest.log clear; > control query default HIVE_MAX_STRING_LENGTH '1048576'; > set schema hive.mysch; > drop external table if exists mytable cascade; > create external table mytable (c_int int, c_string1 varchar(1048576), > c_string2 varchar(1048576), p_int int) for hive.mysch.mytable; > showddl mytable; > update statistics for table mytable on every column sample random 90 percent > persistent; > update statistics for table mytable remove sample; > update statistics for table mytable on existing column sample random 90 > percent persistent; > update statistics for table mytable remove sample; -- This message was sent by Atlassian JIRA (v6.3.4#6332)