Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8441 )

Change subject: disk failure: make various delta paths more robust to errors
......................................................................


Patch Set 3:

(7 comments)

Inching back towards the ostrich method and putting in an IsGTest() hack here.

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_compaction.cc@372
PS3, Line 372:   RETURN_NOT_OK(tracker->OpenDeltaReaders(new_redo_blocks, 
&new_redo_stores, REDO));
> Can you add a TODO here that these OpenDeltaReaders calls could actually ha
Done


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.h@164
PS3, Line 164: in-memory delta stores and then persists them to disk
> I think this is kind of unclear.
Done


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@307
PS3, Line 307:   if (start_it != end_it) {
> is this necessary? erase() with an empty range should be safe
Done


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@719
PS3, Line 719:   // In case the flush fails, remove the DeltaMemStore from the 
store list.
> I dont think this is safe. Another update could have landed in the new 'dms
Trying to determine whether best-effort StoppedTablet checks are sufficient 
here to prevent us from returning bad data.

And I guess a bigger question is whether this could lead us down a darker path 
of trying to do something that relies on the dropped
dms.


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@720
PS3, Line 720:   auto cleanup_redo_delta_stores = MakeScopedCleanup([&] {
> nit: can use the new SCOPED_CLEANUP macro here
Mm I'm not sure about that, since it needs to be canceled and needs a name?


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@757
PS3, Line 757:   // TODO(unknown): wherever we write stuff, we should write to 
a tmp path and
> I think this can just be removed now, it's super ancient (goes back to http
Done


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/diskrowset.cc@575
PS3, Line 575:   // revert changes in case of an error.
> not sure this is safe in the general case, again because as soon as we swap
Hrm I've been waffling back and forth between whether it'd be sufficient to 
assert that the tablet's been stopped here. I think it might take even more 
CheckNotStopped calls in many places, not just the DiskRowSet/DeltaTracker.

I guess the big question is whether it's OK to have the {DeltaTracker, 
DiskRowSet, RowSetMetadata} be out of sync with each other (or if this is OK, 
rolling back changes. I think this might actually be more questionable), 
provided Stopped tablet checks are best-effort.

Per your suggestion, another solution might be to sidestep this whole issue 
with a IsGTest() workaround or somesuch.



--
To view, visit http://gerrit.cloudera.org:8080/8441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141
Gerrit-Change-Number: 8441
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 07 Nov 2017 21:28:02 +0000
Gerrit-HasComments: Yes

Reply via email to