[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan reassigned HIVE-20531:
---------------------------------------

    Assignee: mahesh kumar behera  (was: Sankar Hariappan)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch,
> HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch,
> HIVE-20531.06.patch
>
>
> In replication load, both add partition and insert operations are handled
> through import. Import creates three major tasks: copy, add partition, and
> move. The copy task copies data from the source location to a staging
> directory. The add partition task, which runs in parallel with the copy,
> creates the partition in the metastore. For an insert this DDL task is a
> no-op, because by the time it executes the partition is already present.
> The third task, move, moves the files from the staging directory to the
> actual table location. For an insert, it then adds the insert event to the
> notification table. It does the same for an add partition operation, which
> is redundant because the DDL task has already written the add partition
> event. With the optimization to copy directly to the actual table location
> in S3, the move task can be skipped when replaying an add partition
> operation, and replaying an insert need not create the add partition (DDL)
> task.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
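The task-planning change described in the issue can be sketched in Java. Treat this as a hypothetical illustration, not Hive's actual code. The class `ReplLoadTaskPlanSketch`, the enums `EventKind` and `TaskKind`, the method `planTasks`, and the flag `copyDirectlyToTableLocation` are all invented names. The sketch only models which of the three import tasks (copy, add partition, move) get scheduled for each replayed event, with and without the direct-copy optimization for cloud storage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Hive code base.
public class ReplLoadTaskPlanSketch {

    // Kinds of events replayed through import during repl load (assumed).
    enum EventKind { ADD_PARTITION, INSERT }

    // The three import tasks described in the issue.
    enum TaskKind { COPY, ADD_PARTITION_DDL, MOVE }

    /**
     * Returns the tasks to schedule for one replayed event.
     * copyDirectlyToTableLocation models the cloud-storage (e.g. S3)
     * optimization where data is copied straight to the table location
     * instead of to a staging directory.
     */
    static List<TaskKind> planTasks(EventKind event, boolean copyDirectlyToTableLocation) {
        List<TaskKind> tasks = new ArrayList<>();
        tasks.add(TaskKind.COPY);
        if (!copyDirectlyToTableLocation) {
            // Original behavior: copy to staging, create the partition, then move.
            tasks.add(TaskKind.ADD_PARTITION_DDL);
            tasks.add(TaskKind.MOVE);
            return tasks;
        }
        if (event == EventKind.ADD_PARTITION) {
            // Partition metadata is still needed, but the data is already
            // in place, so no move (and no second add partition event).
            tasks.add(TaskKind.ADD_PARTITION_DDL);
        } else {
            // INSERT: the partition already exists, so the DDL task would be
            // a no-op. Move is kept because it writes the insert event.
            tasks.add(TaskKind.MOVE);
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(planTasks(EventKind.ADD_PARTITION, false)); // [COPY, ADD_PARTITION_DDL, MOVE]
        System.out.println(planTasks(EventKind.ADD_PARTITION, true));  // [COPY, ADD_PARTITION_DDL]
        System.out.println(planTasks(EventKind.INSERT, true));         // [COPY, MOVE]
    }
}
```

Keeping the decision in a single planning function makes the two savings easy to see: add partition replay drops the move, and insert replay drops the DDL task.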