Venugopal Reddy K created HIVE-26862:
----------------------------------------

             Summary: IndexOutOfBoundsException in stats task during dynamic 
partition table load when partition column data is case sensitive; a few rows 
are also missing from the partition.
                 Key: HIVE-26862
                 URL: https://issues.apache.org/jira/browse/HIVE-26862
             Project: Hive
          Issue Type: Bug
            Reporter: Venugopal Reddy K
         Attachments: data, hive.log

*[Description]* 

A java.lang.IndexOutOfBoundsException occurs in the stats task during a dynamic 
partition table load. This happens when the data for the partition column 
contains values that differ only in case. In addition, a few rows are missing 
from the resulting partition.
*[Steps to reproduce]*

1. Create a stage table, load some data into it, then create a partitioned 
table and load it from the stage table. The data file is attached.
{code:java}
0: jdbc:hive2://localhost:10000> create database mydb;
0: jdbc:hive2://localhost:10000> use mydb;
{code}
{code:java}
0: jdbc:hive2://localhost:10000> create table stage(num int, name string, 
category string) row format delimited fields terminated by ',' stored as 
textfile;
{code}
{code:java}
0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table 
stage;{code}
 
{code:java}
0: jdbc:hive2://localhost:10000> select * from stage;
+------------+-------------+---------------+
| stage.num  | stage.name  | stage.category|
+------------+-------------+---------------+
| 1          | apple       | Fruit         |
| 2          | banana      | Fruit         |
| 3          | carrot      | vegetable     |
| 4          | cherry      | Fruit         |
| 5          | potato      | vegetable     |
| 6          | mango       | Fruit         |
| 7          | tomato      | Vegetable     | => "V" in Vegetable is uppercase here
+------------+-------------+---------------+
7 rows selected (12.979 seconds)
{code}
 

 
{code:java}
0: jdbc:hive2://localhost:10000> create table dynpart(num int, name string) 
partitioned by (category string) row format delimited fields terminated by ',' 
stored as textfile;{code}
 

 
{code:java}
0: jdbc:hive2://localhost:10000> insert into dynpart select * from stage;
INFO  : Compiling 
command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72):
 insert into dynpart select * from stage
INFO  : No Stats for mydb@stage, Columns: num, name, category
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:stage.num, 
type:int, comment:null), FieldSchema(name:stage.name, type:string, 
comment:null), FieldSchema(name:stage.category, type:string, comment:null)], 
properties:null)
INFO  : Completed compiling 
command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72);
 Time taken: 2.967 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing 
command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72):
 insert into dynpart select * from stage
WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. tez) or 
using Hive 1.X releases.
INFO  : Query ID = 
kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72
INFO  : Total jobs = 2
INFO  : Launching Job 1 out of 2
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_local729224564_0001
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2022-12-15 19:21:27,285 Stage-1 map = 0%,  reduce = 0%
INFO  : 2022-12-15 19:21:28,321 Stage-1 map = 100%,  reduce = 0%
INFO  : 2022-12-15 19:21:29,359 Stage-1 map = 100%,  reduce = 100%
INFO  : Ended Job = job_local729224564_0001
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table mydb.dynpart partition (category=null) from 
file:/tmp/warehouse/external/mydb.db/dynpart/.hive-staging_hive_2022-12-15_19-21-12_997_3457134057632526413-1/-ext-10000
INFO  : 


INFO  :          Time taken to load dynamic partitions: 33.657 seconds
INFO  :          Time taken for adding to write entity : 0.003 seconds
INFO  : Launching Job 2 out of 2
INFO  : Starting task [Stage-3:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_local1246165356_0002
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2022-12-15 19:22:13,511 Stage-3 map = 100%,  reduce = 100%
INFO  : Ended Job = job_local1246165356_0002
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Executing stats task
INFO  : Partition {category=Fruit} stats: [numFiles=1, numRows=4, totalSize=34, 
rawDataSize=30, numFilesErasureCoded=0]
INFO  : Partition {category=Vegetable} stats: [numFiles=1, numRows=1, 
totalSize=18, rawDataSize=8, numFilesErasureCoded=0]
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.StatsTask. java.lang.IndexOutOfBoundsException: 
Index: 2, Size: 2
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO  : Stage-Stage-3:  HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing 
command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72);
 Time taken: 452.037 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.StatsTask. 
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 
(state=08S01,code=1){code}
 

 

2. Inspect the dynpart table path under the warehouse directory. Only two 
partition directories were created, and one row is missing:

 
{code:java}
kvenureddy@192 dynpart % pwd
/tmp/warehouse/external/mydb.db/dynpart
kvenureddy@192 dynpart % ls
category=Fruit category=Vegetable
kvenureddy@192 dynpart % cd category=Vegetable 
kvenureddy@192 category=Vegetable % ls
000000_0
kvenureddy@192 category=Vegetable % cat 000000_0 
5,potato
3,carrot => only 2 rows are present; row (7,tomato) is missing from this partition
kvenureddy@192 category=Vegetable % cd ..
kvenureddy@192 dynpart % ls
category=Fruit category=Vegetable
kvenureddy@192 dynpart % cd category=Fruit 
kvenureddy@192 category=Fruit % ls
000000_0
kvenureddy@192 category=Fruit % cat 000000_0 
6,mango
4,cherry
2,banana
1,apple
kvenureddy@192 category=Fruit % 
{code}
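The listing above shows only two partition directories even though the staged 
data contains three case-distinct category values (Fruit, vegetable, 
Vegetable). A minimal sketch of how a case-insensitive grouping collapses the 
three values into two partition keys (illustrative only, not Hive's actual 
code; it also does not model the lost tomato row, which shows the real 
behavior is more broken than a simple collapse):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CaseCollapse {
    /** Groups values into buckets keyed case-insensitively, keeping the first-seen spelling. */
    static Map<String, Integer> bucketCounts(List<String> values) {
        Map<String, String> firstSpelling = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            // All spellings that differ only in case map onto the first spelling seen.
            String key = firstSpelling.computeIfAbsent(v.toLowerCase(), k -> v);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The category column of the 7 staged rows, in row order.
        List<String> categories = Arrays.asList(
                "Fruit", "Fruit", "vegetable", "Fruit", "vegetable", "Fruit", "Vegetable");
        // Three case-distinct values collapse into two buckets.
        System.out.println(bucketCounts(categories)); // prints {Fruit=4, vegetable=3}
    }
}
```

Two surviving buckets is consistent with the two directories on disk, while the 
stats task still sees three distinct case-sensitive values.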
 

 

*[Exception Info]* 

The complete log file is attached.

 
{code:java}
2022-12-15T19:28:48,003 ERROR [HiveServer2-Background-Pool: Thread-123] 
metastore.RetryingHMSHandler: java.lang.IndexOutOfBoundsException: Index: 2, 
Size: 2
    at java.util.ArrayList.rangeCheck(ArrayList.java:659)
    at java.util.ArrayList.get(ArrayList.java:435)
    at 
org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9194)
    at 
org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:146)
    at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    at com.sun.proxy.$Proxy31.set_aggr_stats_for(Unknown Source)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:3307)
    at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:566)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
    at com.sun.proxy.$Proxy32.setPartitionColumnStatistics(Unknown Source)
    at 
org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5677)
    at 
org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:221)
    at 
org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:94)
    at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
    at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
    at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354)
    at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327)
    at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244)
    at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:370)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
    at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
    at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
    at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-12-15T19:28:48,004 ERROR [HiveServer2-Background-Pool: Thread-123] 
exec.StatsTask: Failed to run stats task
{code}
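A plausible mechanism for the "Index: 2, Size: 2" failure (an assumption based 
on the output above, not a confirmed root-cause analysis): the three 
case-distinct partition values yield three column-stats entries, but only two 
partitions exist after the case-insensitive collapse, so walking the stats 
entries by index against the partition list overruns it. A minimal, 
self-contained Java sketch of that mismatch (names are illustrative; this is 
not Hive's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsIndexMismatch {
    /** Walks per-value stats entries against the created-partition list by
     *  shared index; returns the caught exception's class name, or "ok". */
    static String walk(List<String> createdPartitions, List<String> statsEntries) {
        try {
            for (int i = 0; i < statsEntries.size(); i++) {
                createdPartitions.get(i); // index comes from the stats side
            }
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    public static void main(String[] args) {
        // Partitions actually created on disk (case-insensitively collapsed): 2 entries.
        List<String> created = new ArrayList<>(
                Arrays.asList("category=Fruit", "category=Vegetable"));
        // Column-stats entries collected per case-sensitive value: 3 entries.
        List<String> stats = Arrays.asList("Fruit", "vegetable", "Vegetable");
        // Index 2 overruns the 2-element partition list.
        System.out.println(walk(created, stats)); // prints IndexOutOfBoundsException
    }
}
```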
 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
