Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8441 )

Change subject: disk failure: make various delta paths more robust to errors
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_compaction.cc@372
PS3, Line 372:   RETURN_NOT_OK(tracker->OpenDeltaReaders(new_redo_blocks, 
&new_redo_stores, REDO));
Can you add a TODO here that these OpenDeltaReaders calls could actually happen 
outside of the component_lock? it seems like these calls do some IO so it's not 
the best to be holding the DRS component lock during it since it'll block 
concurrent ops.


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.h@164
PS3, Line 164: in-memory delta stores and then persists them to disk
I think this is kind of unclear.

More precisely this updates the in-memory *list* of delta stores, and then 
persists the *metadata* to disk. As written it sounds like it's changing the 
delta stores themselves, and flushing deltafiles or somesuch


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@307
PS3, Line 307:   if (start_it != end_it) {
is this necessary? erase() with an empty range should be safe


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@719
PS3, Line 719:   // In case the flush fails, remove the DeltaMemStore from the 
store list.
I dont think this is safe. Another update could have landed in the new 'dms_', 
so the fact that you've now dropped it on the floor means we've lost a user's 
update that they might think was committed, no?


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@720
PS3, Line 720:   auto cleanup_redo_delta_stores = MakeScopedCleanup([&] {
nit: can use the new SCOPED_CLEANUP macro here


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/delta_tracker.cc@757
PS3, Line 757:   // TODO(unknown): wherever we write stuff, we should write to 
a tmp path and
I think this can just be removed now, it's super ancient (goes back to 
https://github.com/apache/kudu/commit/0e3bd37ab8645a00001758fa5de84861ac443eef 
!)


http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/diskrowset.cc
File src/kudu/tablet/diskrowset.cc:

http://gerrit.cloudera.org:8080/#/c/8441/3/src/kudu/tablet/diskrowset.cc@575
PS3, Line 575:   // revert changes in case of an error.
not sure this is safe in the general case, again because as soon as we swap in 
a new DMS, we can have operations coming in and committing with their dms_id 
set to the new DMS's id, and I'm not entirely clear on how we prevent losing 
them.

The answer of course could be that any failure immediately causes everything to 
shut down, but if you are shutting down then you also probably don't really 
need to re-flush the tablet metadata?



--
To view, visit http://gerrit.cloudera.org:8080/8441
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0241cd4206ce77ef1fb334458b091bc2092f4141
Gerrit-Change-Number: 8441
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 07 Nov 2017 01:51:42 +0000
Gerrit-HasComments: Yes

Reply via email to