> IndexOutOfBoundsException occurred in stats task during dynamic partition 
> table load when user data for partition column is case sensitive. And few 
> rows are missed in the partition as well.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-26862
>                 URL: https://issues.apache.org/jira/browse/HIVE-26862
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Venugopal Reddy K
>            Priority: Major
>         Attachments: data, hive.log
> *[Description]* 
> A java.lang.IndexOutOfBoundsException occurs in the stats task during a 
> dynamic partition table load. This happens when the user data for the 
> partition column contains values that differ only in case. A few rows are 
> also missing from the partition.
> *[Steps to reproduce]*
> 1. Create a stage table, load some data into it, then create a partitioned 
> table and load data into it from the stage table. The data file is 
> attached. The last row in the data file is 
> {color:#de350b}7,tomato,Vegetable (the V in Vegetable is uppercase, unlike 
> the other rows of the vegetable partition){color}
> {code:java}
> 0: jdbc:hive2://localhost:10000> create database mydb; 
> 0: jdbc:hive2://localhost:10000> use mydb;
> {code}
> {code:java}
> 0: jdbc:hive2://localhost:10000> create table stage(num int, name string, 
> category string) row format delimited fields terminated by ',' stored as 
> textfile;
> {code}
> {code:java}
> 0: jdbc:hive2://localhost:10000> load data local inpath 'data' into table 
> stage;{code}
> {code:java}
> 0: jdbc:hive2://localhost:10000> select * from stage;
> +------------+-------------+---------------+
> | stage.num  | stage.name  | stage.category|
> +------------+-------------+---------------+
> | 1          | apple       | Fruit         |
> | 2          | banana      | Fruit         |
> | 3          | carrot      | vegetable     |
> | 4          | cherry      | Fruit         |
> | 5          | potato      | vegetable     |
> | 6          | mango       | Fruit         |
> | 7          | tomato      | Vegetable     | => V in Vegetable is uppercase here
> +------------+-------------+---------------+
> 7 rows selected (12.979 seconds)
> {code}
> {code:java}
> 0: jdbc:hive2://localhost:10000> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> ',' stored as textfile;{code}
> {code:java}
> 0: jdbc:hive2://localhost:10000> insert into dynpart select * from stage;
> INFO  : Compiling 
> command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72):
>  insert into dynpart select * from stage
> INFO  : No Stats for mydb@stage, Columns: num, name, category
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:stage.num, 
> type:int, comment:null), FieldSchema(name:stage.name, type:string, 
> comment:null), FieldSchema(name:stage.category, type:string, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72);
>  Time taken: 2.967 seconds
> INFO  : Operation QUERY obtained 0 locks
> INFO  : Executing 
> command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72):
>  insert into dynpart select * from stage
> WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez) or 
> using Hive 1.X releases.
> INFO  : Query ID = 
> kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72
> INFO  : Total jobs = 2
> INFO  : Launching Job 1 out of 2
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapreduce.job.reduces=<number>
> INFO  : number of splits:1
> INFO  : Submitting tokens for job: job_local729224564_0001
> INFO  : Executing with tokens: []
> INFO  : The url to track the job: http://localhost:8080/
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2022-12-15 19:21:27,285 Stage-1 map = 0%,  reduce = 0%
> INFO  : 2022-12-15 19:21:28,321 Stage-1 map = 100%,  reduce = 0%
> INFO  : 2022-12-15 19:21:29,359 Stage-1 map = 100%,  reduce = 100%
> INFO  : Ended Job = job_local729224564_0001
> INFO  : Starting task [Stage-0:MOVE] in serial mode
> INFO  : Loading data to table mydb.dynpart partition (category=null) from 
> file:/tmp/warehouse/external/mydb.db/dynpart/.hive-staging_hive_2022-12-15_19-21-12_997_3457134057632526413-1/-ext-10000
> INFO  : 
> INFO  :        Time taken to load dynamic partitions: 33.657 seconds
> INFO  :        Time taken for adding to write entity : 0.003 seconds
> INFO  : Launching Job 2 out of 2
> INFO  : Starting task [Stage-3:MAPRED] in serial mode
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapreduce.job.reduces=<number>
> INFO  : number of splits:1
> INFO  : Submitting tokens for job: job_local1246165356_0002
> INFO  : Executing with tokens: []
> INFO  : The url to track the job: http://localhost:8080/
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2022-12-15 19:22:13,511 Stage-3 map = 100%,  reduce = 100%
> INFO  : Ended Job = job_local1246165356_0002
> INFO  : Starting task [Stage-2:STATS] in serial mode
> INFO  : Executing stats task
> INFO  : Partition {category=Fruit} stats: [numFiles=1, numRows=4, 
> totalSize=34, rawDataSize=30, numFilesErasureCoded=0]
> INFO  : Partition {category=Vegetable} stats: [numFiles=1, numRows=1, 
> totalSize=18, rawDataSize=8, numFilesErasureCoded=0]
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.StatsTask. 
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> INFO  : MapReduce Jobs Launched: 
> INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> INFO  : Stage-Stage-3:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> INFO  : Total MapReduce CPU Time Spent: 0 msec
> INFO  : Completed executing 
> command(queryId=kvenureddy_20221215192112_ae2e55b5-6b1f-402d-b79f-874261a27b72);
>  Time taken: 452.037 seconds
> Error: Error while compiling statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.StatsTask. 
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 
> (state=08S01,code=1){code}
> 2. Inspect the dynpart table path in the warehouse directory.
> {color:#de350b}Row (7,tomato) is missing from the Vegetable partition.{color}
> {code:java}
> kvenureddy@192 dynpart % pwd
> /tmp/warehouse/external/mydb.db/dynpart
> kvenureddy@192 dynpart % ls
> category=Fruit category=Vegetable
> kvenureddy@192 dynpart % pwd
> /tmp/warehouse/external/mydb.db/dynpart
> kvenureddy@192 dynpart % ls
> category=Fruit category=Vegetable
> kvenureddy@192 dynpart % cd category=Vegetable 
> kvenureddy@192 category=Vegetable % ls
> 000000_0
> kvenureddy@192 category=Vegetable % cat 000000_0 
> 5,potato
> 3,carrot => Only 2 rows are present; row (7,tomato) is missing from this partition.
> kvenureddy@192 category=Vegetable % cd ..
> kvenureddy@192 dynpart % ls
> category=Fruit category=Vegetable
> kvenureddy@192 dynpart % cd category=Fruit 
> kvenureddy@192 category=Fruit % ls
> 000000_0
> kvenureddy@192 category=Fruit % cat 000000_0 
> 6,mango
> 4,cherry
> 2,banana
> 1,apple
> kvenureddy@192 category=Fruit % 
> {code}
> *[Exception Info]* 
> The complete log file is attached.
> {code:java}
> 2022-12-15T19:28:48,003 ERROR [HiveServer2-Background-Pool: Thread-123] 
> metastore.RetryingHMSHandler: java.lang.IndexOutOfBoundsException: Index: 2, 
> Size: 2
>     at java.util.ArrayList.rangeCheck(ArrayList.java:659)
>     at java.util.ArrayList.get(ArrayList.java:435)
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9194)
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:146)
>     at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>     at com.sun.proxy.$Proxy31.set_aggr_stats_for(Unknown Source)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:3307)
>     at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:566)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
>     at com.sun.proxy.$Proxy32.setPartitionColumnStatistics(Unknown Source)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5677)
>     at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:221)
>     at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:94)
>     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354)
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327)
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244)
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:370)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2022-12-15T19:28:48,004 ERROR [HiveServer2-Background-Pool: Thread-123] 
> exec.StatsTask: Failed to run stats task
> {code}
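
Until the cause is confirmed, one way to spot affected data before a load is to look for partition values that differ only in case. The following is a minimal standalone sketch, not Hive code: the class name PartitionCaseCheck and the method caseCollisions are hypothetical, and it assumes the partition column values have already been read into a list. It groups values by their lower-cased form and reports every group with more than one spelling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration only; it is not part of Hive.
public class PartitionCaseCheck {

    /**
     * Groups partition values by their lower-cased form and keeps only the
     * groups that contain more than one distinct spelling.
     */
    public static Map<String, Set<String>> caseCollisions(List<String> values) {
        Map<String, Set<String>> byFolded = new TreeMap<>();
        for (String value : values) {
            String folded = value.toLowerCase(Locale.ROOT);
            byFolded.computeIfAbsent(folded, k -> new TreeSet<>()).add(value);
        }
        // Drop groups where every row used the same spelling.
        byFolded.values().removeIf(variants -> variants.size() < 2);
        return byFolded;
    }

    public static void main(String[] args) {
        // The category column of the stage table from the report, in row order.
        List<String> categories = Arrays.asList(
                "Fruit", "Fruit", "vegetable", "Fruit",
                "vegetable", "Fruit", "Vegetable");

        // Prints {vegetable=[Vegetable, vegetable]}
        System.out.println(caseCollisions(categories));
    }
}
```

For the attached data this reports the single pair vegetable/Vegetable. Compared exactly, the column has three distinct values, while compared without regard to case it has two, which at least matches the numbers in "Index: 2, Size: 2". Whether that mismatch is what actually trips updatePartColumnStatsWithMerge is an assumption and still needs to be verified against the code.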

