[ https://issues.apache.org/jira/browse/HIVE-22068?focusedWorklogId=296103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296103 ]
ASF GitHub Bot logged work on HIVE-22068:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Aug/19 06:32
            Start Date: 16/Aug/19 06:32
    Worklog Time Spent: 10m
      Work Description: sankarh commented on pull request #742: HIVE-22068 : Add more logging to notification cleaner and replication to track events
URL: https://github.com/apache/hive/pull/742#discussion_r314596395

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -522,6 +525,25 @@ private int executeIncrementalLoad(DriverContext driverContext) {
        // bootstrap of tables if exist.
        if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {
          DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
 +      } else if (work.dbNameToLoadIn != null) {
 +        // Nothing to be done for repl load now. Add a task to update the last.repl.id of the
 +        // target database to the event id of the last event considered by the dump. Next
 +        // incremental cycle if starts from this id, the events considered for this dump, won't
 +        // be considered again. If we are replicating to multiple databases at a time, it's not
 +        // possible to know which all databases we are replicating into and hence we can not
 +        // update repl id in all those databases.
 +        String lastEventid = builder.eventTo().toString();

 Review comment:
   Can we try to re-use ReplLoadTask.updateDatabaseLastReplID method instead of duplicating the code here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 296103)
    Time Spent: 20m  (was: 10m)

> Return the last event id dumped as repl status to avoid notification event
> missing error.
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-22068
>                 URL: https://issues.apache.org/jira/browse/HIVE-22068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22068.01.patch, HIVE-22068.02.patch,
> HIVE-22068.03.patch, HIVE-22068.04.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In repl load, update the status of the target database to the last event
> dumped, so that repl status returns that id and the next incremental cycle
> can specify it as the event from which to start the dump. Without that, repl
> status might return an old event, which might cause older events to be dumped
> again and/or a notification event missing error if the older events have
> been cleaned by the cleaner.
> While at it
> * Add more logging to the DB notification listener cleaner thread
> ** the time when it considered cleaning, the interval and the time before
> which events were cleared, and the min and max id at that time
> ** how many events were cleared
> ** the min and max id after the cleaning.
> * In REPL::START, document the starting event, the end event if specified,
> and the maximum number of events if specified.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
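
The failure mode described in the issue can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions, not Hive
code: the class ReplStatusSketch and all of its methods are hypothetical, the
notification log is reduced to a sorted set of event ids, and the "notification
event missing" check is simplified to "the oldest retained event is newer than
the one the dump expects next".

```java
import java.util.TreeSet;

// Hypothetical, simplified model of the scenario in HIVE-22068.
// None of these names are Hive APIs.
final class ReplStatusSketch {
    // Event ids still present in the source notification log.
    private final TreeSet<Long> eventIds = new TreeSet<>();
    // Last replicated event id recorded on the target database.
    private long targetLastReplId = 0L;

    void addEvent(long id) {
        eventIds.add(id);
    }

    // Cleaner: drop every event with id <= upTo and report how many were cleared.
    int clean(long upTo) {
        int before = eventIds.size();
        eventIds.headSet(upTo, true).clear();
        return before - eventIds.size();
    }

    // What "repl status" would report for the target in this model.
    long replStatus() {
        return targetLastReplId;
    }

    // Incremental dump starting after fromId. Fails if the event that should
    // come next has already been cleaned; otherwise returns the id of the last
    // event considered by the dump.
    long dump(long fromId) {
        if (!eventIds.isEmpty() && eventIds.first() > fromId + 1) {
            throw new IllegalStateException(
                "notification event missing: expected " + (fromId + 1)
                    + " but oldest retained event is " + eventIds.first());
        }
        return eventIds.isEmpty() ? fromId : Math.max(fromId, eventIds.last());
    }

    // Load of a dump that contained no work for this target. With
    // updateStatus == true the target still advances to the last dumped id,
    // which is the behaviour the issue asks for.
    void load(long lastDumpedId, boolean updateStatus) {
        if (updateStatus) {
            targetLastReplId = lastDumpedId;
        }
    }
}
```

In this model, a target that skips the status update keeps reporting 0 from
replStatus(); once the cleaner has removed events 1..5, the next dump(0)
throws. A target that records the last dumped id starts the next dump from 5
and is unaffected by the cleaning.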