Andrew Wong has posted comments on this change.

Change subject: KUDU-1407: reassign failed tablets
......................................................................


Patch Set 17:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/7440/17//COMMIT_MSG
Commit Message:

PS17, Line 30: is added 
> this is repeated
Done


PS17, Line 31: failed tablets while running
> tablets that fail while running (due to what?)
Done


http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

Line 629:     NotifyObserversOfFailedFollower(peer_uuid, current_term, reason);
> nit: No need to hold the lock while calling this method.
Done
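
For reference, a minimal sketch of the pattern behind this nit: copy what the callback needs while holding the lock, release it, and only then notify. PeerQueue, MarkPeerFailed, and NumFailedPeers are hypothetical stand-ins, not the real consensus queue API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queue that tracks peers and notifies observers.
class PeerQueue {
 public:
  using Observer =
      std::function<void(const std::string& peer_uuid, int64_t term)>;

  void AddObserver(Observer o) {
    std::lock_guard<std::mutex> l(lock_);
    observers_.push_back(std::move(o));
  }

  void MarkPeerFailed(const std::string& peer_uuid) {
    std::vector<Observer> observers_copy;
    int64_t term = 0;
    {
      // Hold the lock only while touching shared state.
      std::lock_guard<std::mutex> l(lock_);
      failed_peers_.push_back(peer_uuid);
      term = current_term_;
      observers_copy = observers_;
    }  // Lock released here, before any observer runs.
    for (const auto& o : observers_copy) {
      o(peer_uuid, term);
    }
  }

  std::size_t NumFailedPeers() const {
    std::lock_guard<std::mutex> l(lock_);
    return failed_peers_.size();
  }

 private:
  mutable std::mutex lock_;
  int64_t current_term_ = 1;
  std::vector<std::string> failed_peers_;
  std::vector<Observer> observers_;
};
```

Because the lock is dropped first, an observer may safely call back into the queue (for example NumFailedPeers()) without deadlocking on the non-recursive mutex.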


http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 170: DEFINE_bool(master_tombstone_failed_tablet_replicas, true,
> Should be removed per below. See master_tombstone_evicted_tablet_replica
Done


PS17, Line 2473:     if (FLAGS_master_tombstone_failed_tablet_replicas) {
               :       SendDeleteReplicaRequest(report.tablet_id(), TABLET_DATA_TOMBSTONED,
               :                                boost::none,
               :                                tablet->table(), ts_desc->permanent_uuid(),
               :                                Substitute("Tablet failed: $0", s.ToString()));
               :     }
> Is this required? The leader will now evict a failed follower because of th
I think you're right; when the leader sees the failed tablet, it should evict
the failed replica with a config change and then report to the master.


http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS17, Line 655: metadata
> Couldn't this simply happen if one of the data disks failed?
Failures when writing to the data directory are only logged via WARN_NOT_OK()
(see TabletMetadata::DeleteOrphanedBlocks), since the blocks can always be
removed later (e.g. at the next startup).
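
A rough sketch of that warn-and-continue behavior, for illustration only. The Status struct, the BlockDeleter callback, and DeleteOrphanedBlocksBestEffort are simplified, hypothetical stand-ins and not the actual TabletMetadata code or the WARN_NOT_OK() macro.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Simplified stand-in for a status type (assumption, not kudu::Status).
struct Status {
  bool ok;
  std::string msg;
};

using BlockDeleter = std::function<Status(const std::string& block_id)>;

// Tries to delete every orphaned block. A failed deletion is logged as a
// warning and does not abort the loop; the block is returned to the caller
// so it can be retried later (e.g. at the next startup).
std::vector<std::string> DeleteOrphanedBlocksBestEffort(
    const std::vector<std::string>& block_ids, const BlockDeleter& deleter) {
  std::vector<std::string> still_orphaned;
  for (const auto& id : block_ids) {
    Status s = deleter(id);
    if (!s.ok) {
      std::cerr << "WARNING: could not delete block " << id << ": " << s.msg
                << std::endl;
      still_orphaned.push_back(id);
    }
  }
  return still_orphaned;
}
```

The point is that a data-directory write failure here is not treated as a tablet failure: it only leaves some blocks behind for a later cleanup pass.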


PS17, Line 658: is unclear
> Shouldn't the contract of DeleteTabletData() be a crash-consistent one? In 
Ah, I see. I'll update the comment.


Line 752:   auto fail_tablet = MakeScopedCleanup([&]() {
> I like this approach.
It is quite clean indeed!

Credit to Adar for the suggestion.
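
For readers unfamiliar with the idiom, here is a minimal sketch of a scoped cleanup that fails the tablet on every early return unless it is explicitly cancelled. The ScopedCleanup class below is a simplified reimplementation for illustration, and OpenTablet with its parameters is hypothetical; neither is the code in the patch.

```cpp
#include <cassert>
#include <utility>

// Runs a callable on scope exit unless cancel() was called first.
template <typename F>
class ScopedCleanup {
 public:
  explicit ScopedCleanup(F f) : f_(std::move(f)) {}
  ScopedCleanup(ScopedCleanup&& other)
      : f_(std::move(other.f_)), cancelled_(other.cancelled_) {
    other.cancelled_ = true;  // The moved-from object must not fire.
  }
  ScopedCleanup(const ScopedCleanup&) = delete;
  ScopedCleanup& operator=(const ScopedCleanup&) = delete;
  ~ScopedCleanup() {
    if (!cancelled_) f_();
  }
  void cancel() { cancelled_ = true; }

 private:
  F f_;
  bool cancelled_ = false;
};

template <typename F>
ScopedCleanup<F> MakeScopedCleanup(F f) {
  return ScopedCleanup<F>(std::move(f));
}

// Hypothetical open path: any early return marks the tablet as failed.
bool OpenTablet(bool bootstrap_ok, bool* marked_failed) {
  auto fail_tablet = MakeScopedCleanup([&]() { *marked_failed = true; });
  if (!bootstrap_ok) {
    return false;  // fail_tablet fires here.
  }
  fail_tablet.cancel();  // Success: do not mark the tablet as failed.
  return true;
}
```

The benefit is that new error paths added later get the failure handling automatically, instead of each return site having to remember it.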


-- 
To view, visit http://gerrit.cloudera.org:8080/7440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes
