Todd Lipcon has posted comments on this change.

Change subject: KUDU-237 (part 2) - Add support for REINSERT deltas in delta 
files
......................................................................


Patch Set 7:

(12 comments)

A little nervous about the interaction of this with tablet history GC. Can you 
check out tablet_history_gc-itest which has a TODO about enabling reinserts?

http://gerrit.cloudera.org:8080/#/c/4819/7/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 657:   Mutation* redo_head = nullptr;
I think this function would be a little easier to follow if we renamed this 
variable to 'redo_delete' rather than 'redo_head', since it clarifies that it 
is always either nullptr (if the row is existing) or a single deletion mutation 
(if the row has been deleted).

Then, some of the extra assertions down below aren't really necessary, since 
it's easy by code inspection to see that the only place we assign to this 
variable, we're assigning a deletion or null.


PS7, Line 709: // In the case where the previous undo was a nullptr just make 
this one
             :       // the head.
this comment could be rephrased now that the if() below has been inverted


PS7, Line 711: NULL
nit: nullptr


PS7, Line 721: Reset the REDO head, which must have contained a DELETE REDO.
I think a little more explanation here is in order. If I understand correctly, 
the logic is as follows:

- if we see a REINSERT REDO, then that must have followed an earlier DELETE 
REDO.
-- therefore, that DELETE REDO must have generated a 'DELETE' into our output 
'redo_head'

I think instead of "reset the REDO head" we should say "clear the REDO 
deletion" or explicitly say "reset the REDO deletion to null"


An example might help clarify the comment as well:

in row: "foo"
in redo: delete @ 10
in redo: reinsert @ 20 "bar"

result:

out undo: reinsert @10 "foo"
out undo: delete @20
out row: "bar"


PS7, Line 722: passing an undo_encoder that encodes the state of
             :       //     the row prior to to the REINSERT.
this makes it sound like there is some actual row state captured, but in fact 
it's just creating a deletion at the same timestamp as the reinsert


Line 725:       //     will have the timestamp of the DELETE REDO.
isn't this the undo encoder that is generated by the above? (this might 
dovetail with a suggestion I made in the patch above this one in the patch 
series)


PS7, Line 735:         Mutation* delete_redo = redo_head;
             :         RowChangeListDecoder decoder(delete_redo->changelist());
             :         CHECK_OK(decoder.Init());
             :         DCHECK(decoder.is_delete()) << "Redo HEAD was not a 
delete: "
             :                                     << 
delete_redo->changelist().ToString(*base_schema) << " "
             :                                     << ERROR_LOG_CONTEXT;
maybe stick this in a {} block so that it's more obvious that the variables in 
this scope can't leak out?


PS7, Line 746: reinsert
I think this is creating a _delete_ undo, no? the inverse of a REINSERT is a 
DELETE. I would have expected here to be applying 'redo_head' (the delete) in 
order to yield a REINSERT


Line 751:         if (history_gc_opts.IsAncientHistory(redo_mut->timestamp())) {
isn't it possible that only one of the resulting undos is GCable?


PS7, Line 756: REINSERT
same


Line 772:         if (undo_head != NULL) {
undo_head is always non-null here, no?


PS7, Line 780: redo_head = Mutation::CreateInArena(arena,
             :                                             
redo_mut->timestamp(),
             :                                             
undo_encoder.as_changelist());
should we have a CHECK here that redo_head == nullptr prior to this assignment?


-- 
To view, visit http://gerrit.cloudera.org:8080/4819
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9d421b8d2981a80c4485223b9b27a38d95e866a7
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to