Todd Lipcon has posted comments on this change. Change subject: KUDU-237 (part 2) - Add support for REINSERT deltas in delta files ......................................................................
Patch Set 7: (12 comments) A little nervous about the interaction of this with tablet history GC. Can you check out tablet_history_gc-itest which has a TODO about enabling reinserts? http://gerrit.cloudera.org:8080/#/c/4819/7/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 657: Mutation* redo_head = nullptr; I think this function would be a little easier to follow if we renamed this variable to 'redo_delete' rather than 'redo_head', since it clarifies that it is always either nullptr (if the row is existing) or a single deletion mutation (if the row has been deleted). Then, some of the extra assertions down below aren't really necessary, since it's easy by code inspection to see that the only place we assign to this variable, we're assigning a deletion or null. PS7, Line 709: // In the case where the previous undo was a nullptr just make this one : // the head. this comment could be rephrased now that the if() below has been inverted PS7, Line 711: NULL nit: nullptr PS7, Line 721: Reset the REDO head, which must have contained a DELETE REDO. I think a little more explanation here is in order. If I understand correctly, the logic is as follows: - if we see a REINSERT REDO, then that must have followed an earlier DELETE REDO. -- therefore, that DELETE REDO must have generated a 'DELETE' into our output 'redo_head' I think instead of "reset the REDO head" we should say "clear the REDO deletion" or explicitly say "reset the REDO deletion to null" An example might help clarify the comment as well: in row: "foo" in redo: delete @ 10 in redo: reinsert @ 20 "bar" result: out undo: reinsert @10 "foo" out undo: delete @20 out row: "bar" PS7, Line 722: passing an undo_encoder that encodes the state of : // the row prior to to the REINSERT. this makes it sound like there is some actual row state captured, but in fact it's just creating a deletion at the same timestamp as the reinsert Line 725: // will have the timestamp of the DELETE REDO. isn't this the undo encoder that is generated by the above? (this might dovetail with a suggestion I made in the patch above this one in the patch series) PS7, Line 735: Mutation* delete_redo = redo_head; : RowChangeListDecoder decoder(delete_redo->changelist()); : CHECK_OK(decoder.Init()); : DCHECK(decoder.is_delete()) << "Redo HEAD was not a delete: " : << delete_redo->changelist().ToString(*base_schema) << " " : << ERROR_LOG_CONTEXT; maybe stick this in a {} block so that it's more obvious that the variables in this scope can't leak out? PS7, Line 746: reinsert I think this is creating a _delete_ undo, no? the inverse of a REINSERT is a DELETE. I would have expected here to be applying 'redo_head' (the delete) in order to yield a REINSERT Line 751: if (history_gc_opts.IsAncientHistory(redo_mut->timestamp())) { isn't it possible that only one of the resulting undos is GCable? PS7, Line 756: REINSERT same Line 772: if (undo_head != NULL) { undo_head is always non-null here, no? PS7, Line 780: redo_head = Mutation::CreateInArena(arena, : redo_mut->timestamp(), : undo_encoder.as_changelist()); should we have a CHECK here that redo_head == nullptr prior to this assignment? -- To view, visit http://gerrit.cloudera.org:8080/4819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d421b8d2981a80c4485223b9b27a38d95e866a7 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes