[ 
https://issues.apache.org/jira/browse/HIVE-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21079:
----------------------------------
    Attachment: HIVE-21079.01.patch
        Status: Patch Available  (was: Open)

Initial patch attached to trigger ptests.

The patch also includes the changes for stats replication of non-partitioned 
tables, since that code is also used to replicate the table-level statistics 
of a partitioned table. Once HIVE-21078 is committed, that code will be 
removed from this patch.

During bootstrap, column stats for partitions are replicated along with the 
serialized Partition object when the dump includes data (a non-metadata-only 
dump). During incremental load, UpdatePartitionColumnStats events are applied 
to replicate column statistics, and ALTER PARTITION events are used to 
replicate partition-level stats.

The patch also includes two bug fixes:
 # ALTER PARTITION events are not applied during incremental replication. In 
AlterPartitionHandler, we set 
withinContext.replicationSpec.setIsMetadataOnly(true);
In ImportSemanticAnalyzer.createReplImportTasks(), per the code around line 
1197, we do not add new PartitionSpecs and the corresponding tasks for a 
metadata-only replicationSpec. This means that we never apply an 
ALTER_PARTITION event during incremental load, which looks like a serious 
bug. Either we should check PartitionDescs irrespective of whether the 
replicationSpec is marked metadata-only, or we shouldn’t call 
replicationSpec.setIsMetadataOnly(true) while dumping an ALTER_PARTITION 
event. We set replicationSpec.setIsMetadataOnly(true) for ALTER TABLE events 
as well, so doing that for ALTER PARTITION events looks fine.
 # Do not dump partition-related events during a metadata-only dump. During a 
bootstrap metadata-only dump we do not dump partitions (see 
TableExport.getPartitions(); for a bootstrap dump we always pass a TableSpec 
with TABLE_ONLY set). So partition-related events should not be dumped for a 
metadata-only dump either.

These two bug fixes should probably be committed under separate tasks, but 
they are included here for completeness and testing.
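
To illustrate the first bug fix described above, here is a minimal sketch of 
the load-side decision. It is not the Hive code: ReplSpec, 
planSkippingPartitions and planCheckingPartitions are hypothetical names 
introduced only for this example, and partition descriptors are reduced to 
plain strings. It only shows why a metadata-only flag set at dump time can 
cause an ALTER PARTITION event to produce no tasks at load time, and what 
checking the partition descriptors regardless of the flag would change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of the load-side decision; not Hive's real classes. */
public class AlterPartitionReplSketch {

    /** Stand-in for the replication spec attached to a dumped event. */
    public static final class ReplSpec {
        private boolean metadataOnly;
        public void setIsMetadataOnly(boolean value) { this.metadataOnly = value; }
        public boolean isMetadataOnly() { return metadataOnly; }
    }

    /** Behaviour described as the bug: partition descriptors are ignored for metadata-only specs. */
    public static List<String> planSkippingPartitions(ReplSpec spec, List<String> partitionDescs) {
        List<String> tasks = new ArrayList<>();
        if (!spec.isMetadataOnly()) {
            for (String desc : partitionDescs) {
                tasks.add("alter-partition:" + desc);
            }
        }
        return tasks;
    }

    /** First option from the comment: look at partition descriptors regardless of the flag. */
    public static List<String> planCheckingPartitions(ReplSpec spec, List<String> partitionDescs) {
        // The spec is deliberately not consulted here.
        List<String> tasks = new ArrayList<>();
        for (String desc : partitionDescs) {
            tasks.add("alter-partition:" + desc);
        }
        return tasks;
    }

    public static void main(String[] args) {
        ReplSpec spec = new ReplSpec();
        spec.setIsMetadataOnly(true); // what the ALTER PARTITION dump handler is described as doing
        List<String> descs = Arrays.asList("ds=2019-01-01"); // made-up partition spec
        System.out.println("skipping: " + planSkippingPartitions(spec, descs).size() + " task(s)");
        System.out.println("checking: " + planCheckingPartitions(spec, descs).size() + " task(s)");
    }
}
```

With the flag set, the first planner yields no tasks for the event while the 
second yields one per partition descriptor, which is the difference between 
the event being dropped and being applied.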

> Replicate column statistics for partitions of partitioned Hive table.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-21079
>                 URL: https://issues.apache.org/jira/browse/HIVE-21079
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>         Attachments: HIVE-21079.01.patch
>
>
> This task is for replicating statistics for partitions of a partitioned Hive 
> table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
