David Wayne Birdsall created TRAFODION-2465:
-----------------------------------------------

             Summary: UPDATE STATS failure on Hive table with long varchars
                 Key: TRAFODION-2465
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2465
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.1-incubating
         Environment: Likely all
            Reporter: David Wayne Birdsall
            Assignee: David Wayne Birdsall


The scenario is the following:

1. Create a Hive text format table that has string columns with values 1048576 
bytes long.
2. Create a Trafodion external table with varchar(1048576) columns for these 
string columns.
3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds.
4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds.
5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the 
following errors:

*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
error (8609) from statement Process_Query.

*** ERROR[8609] Waited rollback performed without starting a transaction.

*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
error (8609) from statement Process_Query.

*** ERROR[8609] Waited rollback performed without starting a transaction.

*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
error (8609) from statement Process_Query.

*** ERROR[8609] Waited rollback performed without starting a transaction.

*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an 
error (8609) from statement Process_Query.

*** ERROR[8609] Waited rollback performed without starting a transaction.

*** ERROR[9214] Object TRAFODION."_HIVESTATS_".TRAF_SAMPLE_93_1485453261_202793 
could not be created.

*** ERROR[8609] Waited rollback performed without starting a transaction.

To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql with 
the following contents and run them in this order:

chmod 755 datagen.py
./datagen.py data.10rows_1MB.txt 10 2 1048576
hive -f hive.sql
sqlci -i mytest.sql

$ cat datagen.py
#! /usr/bin/env python
import sys

if len(sys.argv) != 5 or \
   sys.argv[1].lower() == '-h' or \
   sys.argv[1].lower() == '-help':
    print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar 
columns> <varchar column length>'
    sys.exit()

f = open(sys.argv[1], "w+")

marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
for num_rows in range(0, int(sys.argv[2])):
    f.write(str(num_rows) + '|')
    for num_cols in range(0, int(sys.argv[3])):
        f.write(marker[num_rows%len(marker)])
        for i in range (1, int(sys.argv[4])):
            f.write(str(i % 10))
        f.write('|')
    f.write(str(num_rows))
    f.write('\n')

f.close()

$ cat hive.sql
drop database if exists mysch cascade;
create database mysch;
use mysch;

create table base_lgvarchar_table(c_int int, c_string1 string, c_string2 string 
, p_int int) row format delimited fields terminated by '|';
load data local inpath './data.10rows_10MB.txt' overwrite into table 
base_lgvarchar_table;
create table mytable (c_int int, c_string1 string, c_string2 string ,p_int int) 
row format delimited fields terminated by '|';
from base_lgvarchar_table insert overwrite table mytable select *;


$ cat mytest.sql
log mytest.log clear;
control query default HIVE_MAX_STRING_LENGTH '1048576';
set schema hive.mysch;

drop external table if exists mytable cascade;
create external table mytable (c_int int, c_string1 varchar(1048576), c_string2 
varchar(1048576), p_int int) for hive.mysch.mytable;
showddl mytable;

update statistics for table mytable on every column sample random 90 percent 
persistent;
update statistics for table mytable remove sample;
update statistics for table mytable on existing column sample random 90 percent 
persistent;
update statistics for table mytable remove sample;






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to