David Wayne Birdsall created TRAFODION-2465: -----------------------------------------------
Summary: UPDATE STATS failure on Hive table with long varchars Key: TRAFODION-2465 URL: https://issues.apache.org/jira/browse/TRAFODION-2465 Project: Apache Trafodion Issue Type: Bug Components: sql-cmp Affects Versions: 2.1-incubating Environment: Likely all Reporter: David Wayne Birdsall Assignee: David Wayne Birdsall The scenario is the following: 1. Create a Hive text format table that has string columns with values 1048576 bytes long. 2. Create a Trafodion external table with varchar(1048576) columns for these string columns. 3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds. 4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds. 5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the following errors: *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query. *** ERROR[8609] Waited rollback performed without starting a transaction. *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query. *** ERROR[8609] Waited rollback performed without starting a transaction. *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query. *** ERROR[8609] Waited rollback performed without starting a transaction. *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an error (8609) from statement Process_Query. *** ERROR[8609] Waited rollback performed without starting a transaction. *** ERROR[9214] Object TRAFODION."_HIVESTATS_".TRAF_SAMPLE_93_1485453261_202793 could not be created. *** ERROR[8609] Waited rollback performed without starting a transaction. To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql with the following contents and run them in this order: chmod 755 datagen.py ./datagen.py data.10rows_1MB.txt 10 2 1048576 hive -f hive.sql sqlci -i mytest.sql $ cat datagen.py #! /usr/bin/env python import sys if len(sys.argv) != 5 or \ sys.argv[1].lower() == '-h' or \ sys.argv[1].lower() == '-help': print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar columns> <varchar column length>' sys.exit() f = open(sys.argv[1], "w+") marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ') for num_rows in range(0, int(sys.argv[2])): f.write(str(num_rows) + '|') for num_cols in range(0, int(sys.argv[3])): f.write(marker[num_rows%len(marker)]) for i in range (1, int(sys.argv[4])): f.write(str(i % 10)) f.write('|') f.write(str(num_rows)) f.write('\n') f.close() $ cat hive.sql drop database if exists mysch cascade; create database mysch; use mysch; create table base_lgvarchar_table(c_int int, c_string1 string, c_string2 string , p_int int) row format delimited fields terminated by '|'; load data local inpath './data.10rows_10MB.txt' overwrite into table base_lgvarchar_table; create table mytable (c_int int, c_string1 string, c_string2 string ,p_int int) row format delimited fields terminated by '|'; from base_lgvarchar_table insert overwrite table mytable select *; $ cat mytest.sql log mytest.log clear; control query default HIVE_MAX_STRING_LENGTH '1048576'; set schema hive.mysch; drop external table if exists mytable cascade; create external table mytable (c_int int, c_string1 varchar(1048576), c_string2 varchar(1048576), p_int int) for hive.mysch.mytable; showddl mytable; update statistics for table mytable on every column sample random 90 percent persistent; update statistics for table mytable remove sample; update statistics for table mytable on existing column sample random 90 percent persistent; update statistics for table mytable remove sample; -- This message was sent by Atlassian JIRA (v6.3.4#6332)