[ https://issues.apache.org/jira/browse/HIVE-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Bapat updated HIVE-21079:
----------------------------------
    Attachment: HIVE-21079.01.patch
        Status: Patch Available  (was: Open)

Initial patch attached to trigger ptests. The patch also includes the changes for stats replication of non-partitioned tables, since that code is also used to replicate the table-level statistics of a partitioned table. Once HIVE-21078 gets committed, this code will be removed from the patch.

During bootstrap, column stats for partitions are replicated along with the serialized Partition object for a dump with data (non-metadata-only dump). During incremental load, UpdatePartitionColumnStats events are applied to replicate column statistics, and ALTER PARTITION events are used to replicate partition-level stats.

The patch also includes two bug fixes:
# ALTER PARTITION events are not applied during incremental replication. In AlterPartitionHandler, we set withinContext.replicationSpec.setIsMetadataOnly(true). In ImportSemanticAnalyzer.createReplImportTasks(), per the code around line 1197, we do not add new PartitionSpecs and the corresponding tasks for a metadata-only replication spec. This means that we never apply an ALTER_PARTITION event during incremental load. That looks like a serious bug. Either we should check PartitionDescs irrespective of replicationSpec.setIsMetadataOnly(), or we shouldn't set replicationSpec.setIsMetadataOnly() to true while dumping an ALTER_PARTITION event. We set replicationSpec.setIsMetadataOnly(true) for ALTER TABLE events as well, so doing the same for ALTER PARTITION events looks fine.
# Do not dump partition-related events during a metadata-only dump. During a bootstrap metadata-only dump we do not dump partitions (see TableExport.getPartitions(); for a bootstrap dump we always pass a TableSpec with TABLE_ONLY set). So don't dump partition-related events for a metadata-only dump.

These fixes should probably be committed in separate tasks, but they are included here for completeness and testing.

> Replicate column statistics for partitions of partitioned Hive table.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-21079
>                 URL: https://issues.apache.org/jira/browse/HIVE-21079
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>         Attachments: HIVE-21079.01.patch
>
>
> This task is for replicating statistics for partitions of a partitioned Hive
> table.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)