mahesh kumar behera created HIVE-20533:
------------------------------------------
Summary: Adding notification is taking time in S3 replication
Key: HIVE-20533
URL: https://issues.apache.org/jira/browse/HIVE-20533
Project: Hive
Issue Type: Sub-task
Components: repl
Affects Versions: 4.0.0
Reporter: mahesh kumar behera
Assignee: mahesh kumar behera
Fix For: 4.0.0
In replication load, both add partition and insert operations are handled
through import. Import creates three major tasks: copy, add partition, and
move. The copy task copies data from the source location to the staging
directory. The add partition (DDL) task, which runs in parallel with copy,
creates the partition in the metastore. It is a no-op for insert, because by
the time this DDL task runs for an insert the partition is already present.
The third task, move, moves the files from the staging directory to the actual
location and, in the case of insert, adds the insert event to the notification
table. It also does this for the add partition operation, which is redundant
because the DDL task has already written the add partition event. With the
optimization that copies directly to the actual table location in S3, the move
task can be avoided when replaying add partition, and replaying insert need
not create the add partition (DDL) task.
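
To make the proposed task pruning concrete, here is a minimal Java sketch of
which tasks would be planned per replayed operation. Every name in it
(ReplLoadPlanSketch, Op, Task, plan, copyDirectToTable) is hypothetical and
not taken from the Hive code base. It also assumes that the move task is kept
for insert so that the insert event is still written, which is a reading of
the description above rather than something it states outright.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: these are NOT Hive classes, just a model of the
// task selection described in this issue.
public class ReplLoadPlanSketch {
    enum Op { ADD_PARTITION, INSERT }
    enum Task { COPY, ADD_PARTITION_DDL, MOVE }

    // Returns the tasks planned for one replayed operation.
    // copyDirectToTable models the optimization of copying straight to
    // the actual table location in S3 (hypothetical flag name).
    static Set<Task> plan(Op op, boolean copyDirectToTable) {
        if (!copyDirectToTable) {
            // Behaviour described as current: import always creates all three.
            return EnumSet.allOf(Task.class);
        }
        if (op == Op.ADD_PARTITION) {
            // The DDL task already writes the add partition event,
            // so the move task is dropped.
            return EnumSet.of(Task.COPY, Task.ADD_PARTITION_DDL);
        }
        // Insert: the partition already exists, so the DDL task is dropped.
        // Assumption: move is kept to add the insert event.
        return EnumSet.of(Task.COPY, Task.MOVE);
    }

    public static void main(String[] args) {
        for (Op op : Op.values()) {
            System.out.println(op + " -> " + plan(op, true));
        }
    }
}
```

With the flag off, both operations still get all three tasks, which matches
the behaviour described as the starting point of this issue.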
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)