Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8441 )
Change subject: disk failure: make various delta paths more robust to errors ...................................................................... Patch Set 3: (7 comments) Inching back towards the ostrich method and putting in an IsGTest() hack here. http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_compaction.cc@372 PS3, Line 372: RETURN_NOT_OK(tracker->OpenDeltaReaders(new_redo_blocks, &new_redo_stores, REDO)); > Can you add a TODO here that these OpenDeltaReaders calls could actually ha Done http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.h File src/kudu/tablet/delta_tracker.h: http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.h@164 PS3, Line 164: in-memory delta stores and then persists them to disk > I think this is kind of unclear. Done http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@307 PS3, Line 307: if (start_it != end_it) { > is this necessary? erase() with an empty range should be safe Done http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@719 PS3, Line 719: // In case the flush fails, remove the DeltaMemStore from the store list. > I dont think this is safe. Another update could have landed in the new 'dms Trying to determine whether best-effort StoppedTablet checks are sufficient here to prevent us from returning bad data. And I guess a bigger question is whether this could lead us down a darker path of trying to do something that relies on the dropped dms. http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@720 PS3, Line 720: auto cleanup_redo_delta_stores = MakeScopedCleanup([&] { > nit: can use the new SCOPED_CLEANUP macro here Mm I'm not sure about that, since it needs to be canceled and needs a name? http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@757 PS3, Line 757: // TODO(unknown): wherever we write stuff, we should write to a tmp path and > I think this can just be removed now, it's super ancient (goes back to http Done http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/diskrowset.cc@575 PS3, Line 575: // revert changes in case of an error. > not sure this is safe in the general case, again because as soon as we swap Hrm I've been waffling back and forth between whether it'd be sufficient to assert that the tablet's been stopped here. I think it might take even more CheckNotStopped calls in many places, not just the DiskRowSet/DeltaTracker. I guess the big question is whether it's OK to have the {DeltaTracker, DiskRowSet, RowSetMetadata} be out of sync with each other (or if this is OK, rolling back changes. I think this might actually be more questionable), provided Stopped tablet checks are best-effort. Per your suggestion, another solution might be to sidestep this whole issue with a IsGTest() workaround or somesuch. -- To view, visit http://gerrit.cloudera.org:8080/8441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141 Gerrit-Change-Number: 8441 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 07 Nov 2017 21:28:02 +0000 Gerrit-HasComments: Yes
