[ https://issues.apache.org/jira/browse/HIVE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mahesh kumar behera updated HIVE-24663:
---------------------------------------
        Parent: HIVE-25181
    Issue Type: Sub-task  (was: Improvement)

> Reduce overhead of partition column stats updation.
> ---------------------------------------------------
>
>                 Key: HIVE-24663
>                 URL: https://issues.apache.org/jira/browse/HIVE-24663
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> When a large number of partitions (>20K) is processed, ColStatsProcessor runs
> into DB issues.
> {{db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in
> some cases Postgres stops processing.
> It would be good to have ColStatsProcessor send the stats to the DB in small
> batches instead of as one bulk update.
> Ref:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
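
For illustration only, the batching idea proposed in the description could look roughly like the sketch below. It is not the Hive change itself and does not touch the metastore API. The class BatchedStatsUpdate, the helpers toBatches and updateInBatches, and the batch size of 1,000 are hypothetical names and values chosen for the example. The Consumer passed in stands in for whatever call would push one batch of partition stats to the DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: names and batch size are assumptions, not Hive code.
class BatchedStatsUpdate {

    /** Splits items into consecutive lists of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Hands each batch to the updater in order; returns the number of calls made. */
    static <T> int updateInBatches(List<T> items, int batchSize,
                                   Consumer<List<T>> updater) {
        int calls = 0;
        for (List<T> batch : toBatches(items, batchSize)) {
            updater.accept(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        // Simulate 25,000 partitions, above the >20K threshold in the report.
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            partitions.add("part=" + i);
        }
        // The lambda is a stand-in for one per-batch stats update call.
        int calls = updateInBatches(partitions, 1_000, batch -> { });
        System.out.println("update calls: " + calls); // prints "update calls: 25"
    }
}
```

With this shape, each DB call carries a bounded number of partitions, so a slow or failing batch affects only that slice rather than the whole 20K+ update. A real change would also need to decide how to handle a batch that fails partway through.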