David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-237 (part 1) - Support proper mutation encoding for 
reinserts
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist-test.cc
File src/kudu/common/row_changelist-test.cc:

Line 176
> no equivalent test of an invalid reinsert delta?
added that and also extended the "normal" test quite a bit.


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist.cc
File src/kudu/common/row_changelist.cc:

Line 44:   string ret;
> move this variable down below the DELETE case
Done


Line 190:     encoder.SetType(decoder.type_);
> should this be outside the loop? what if, in the projected result, none of 
this is done inside the loop on purpose (i've made SetType() idempotent) 
precisely so that we get an empty 'buf' if we end up with an empty 'buf' if no 
columns are found. The behavior was the same before. and, in the only place 
where we call this (memrowset.cc line 601) we discard the mutation entirely if 
that is the case.

I moved the DCHECK out of the loop though.


Line 222:     undo_encoder->SetType(type_);
> shouldn't the UNDO of a REINSERT be a DELETE?
see my comment and explanation on compaction.cc. Having the UNDO be a REINSERT 
(which we would need to build anyway) allows us to go through the row only once 
(otherwise we'd need to build a REINSERT prior to running ApplyRowChanges to 
capture the old state, and then go through the row again to apply the new 
REINSERT).


Line 278:       out->SetType(decoder.type_);
> same note as above. I dont think it's a change, but the docs should probabl
I've changed the docs to reflect that the UPDATE and REINSERT cases are now the 
same

also moved the DCHECK outside the loop


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/common/row_changelist.h
File src/kudu/common/row_changelist.h:

Line 156:   // The change list includes direct and indirect data.
> can you be more explicit here that the data is copied?
Done


Line 239:     Slice val_slice;
> since this val_slice is never actually set to anything, I think it's cleare
Done


Line 309:   // state of the row into the undo_encoder.
> can you amend this doc to explain what happens with reinsert?
Done


Line 313:   // Apply this RowChangeList to row number 'row_idx' in 'dst_col', 
but only
> same
Done


Line 391:   Status DecodeNext(DecodedUpdate* update);
> should this now say REQUIRES is_update or is_reinsert?
Done


http://gerrit.cloudera.org:8080/#/c/4791/5/src/kudu/tablet/compaction.cc
File src/kudu/tablet/compaction.cc:

Line 728:         // row but disregard the undo reinsert that is created.
> why disregard instead of having ApplyRowChanges create the right undo?
ApplyRowChanges needs to capture the state of the old (now deleted) base + 
whatever updates happened prior to the delete. So "undo_encoder" will be a 
reinsert that will allow us to recover that state. 

For example say we have this schema (string a, int b) and the following row and 
mutation sequence (the number after @ is the timestamp):

DELETE@10 <- {BASE ("dummy",1)} -> UPDATE@20 (SET b = 2) -> DELETE@30 -> 
REINSERT@40 ("dummy", 3)

When we run ApplyRowChanges below 'undo_encoder' will encode: 
REINSERT("dummy",2) so that eventually we can have:

DELETE@10 <- UPDATE@20(SET b = 1) <- REINSERT@30("dummy", 2) <- DELETE@40 <- 
{BASE ("dummy",3)}

But this is only completed in a followup patch. In this patch we still build 
REINSERT@30 but then discard it to get the (still truncated history) of:

DELETE@40 <- {BASE ("dummy",3)}

The full logic in the next patch puts this in actual code.


PS5, Line 734:  undo_encoder.Reset();
             :         undo_encoder.SetToDelete();
> this should probably be done by the ApplyRowChanges itself
see my comment above.


-- 
To view, visit http://gerrit.cloudera.org:8080/4791
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b0a95e3f618bf20eb18ca4f4b03b9e2ce5704e8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to