Mike Percy has posted comments on this change. Change subject: KUDU-236 (part 1). Implement tablet history GC ......................................................................
Patch Set 19: (22 comments) http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction-test.cc File src/kudu/tablet/compaction-test.cc: Line 66: return { HistoryGcOpts::GC_DISABLED, Timestamp::kInvalidTimestamp }; > is there special handling for kInvalidTimestamp? I think kInvalidTimestamp Replaced with Timestamp(0) Line 217: > nit: no need for blank lines Done Line 387: vector<rowid_t> gced_input_rows; > unused Done Line 585: HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, Timestamp(0) }; > use NoHistoryGC()? Done Line 639: HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, Timestamp(0) }; > same Done http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 638: DVLOG(2) << "Ancient history mark: " << history_gc_opts.ancient_history_mark.ToString() > given this is a per-row log message, i think DVLOG(3) or DVLOG(4) is probab Done Line 647: prev_undo->set_next(nullptr); > hrm, the comment on line 629-631 is no longer correct, then. (we were const Oh, good point. Hmm. I factored this into its own function and added a call to it at call sites that also call ApplyMutationsAndGenerateUndos() prior to passing the row into ApplyMutationsAndGenerateUndos(). Alternatively, we could change ApplyMutationsAndGenerateUndos() to take a non-const input row. I think the approach I took is probably better, though. Line 800: rowid_t input_row_offset = -1; > is this variable necessary? seems like it's only used in the DVLOG below, b Removed Line 838: DVLOG(2) << "Output Row: " << dst_row.schema()->DebugRow(dst_row) << > bump to a higher DVLOG level since this is a big message and every row Done PS19, Line 863: DVLOG(2) << "Output Row: " << dst_row.schema()->DebugRow(dst_row) : << "; RowId: " << index_in_current_drs; > this vlog is redundant with the new one on 838-844 Bumped to DVLOG(5) since it shows the index in the current drs and we can't get that for GCed rows PS19, Line 954: history_gc_opts.enabled == HistoryGcOpts::GC_ENABLED && : mut->timestamp().ComesBefore(history_gc_opts.ancient_history_mark > this condition shows up a lot. what about adding a method like bool History added a method IsAncientHistory() Line 956: DVLOG(2) << "Skipping GCed input row: " << schema->DebugRow(row.row) > this log message isn't accurate since it could still be reinserted later. p Done http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction.h File src/kudu/tablet/compaction.h: PS19, Line 38: GC_ENABLED, : GC_DISABLED > reorder these and make GC_DISABLED have the value 0 so that if this is acci Did the class thing. Line 46: // A timestamp prior to which no history will be preserved. > add "Ignored if enabled != GC_ENABLED" Done Line 155: // 'is_history_truncated': Set to true if this row had a "reinsert after > update (it's an int, not a bool) Done http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet-test.cc File src/kudu/tablet/tablet-test.cc: PS19, Line 564: // TODO: Should we be reopening the tablet in a different way to persist the : // clock / timestamps? > does this need to be addressed here? I don't think this needs to be addressed as part of this patch. http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 704: *ancient_history_mark = HybridClock::AddPhysicalTimeToTimestamp(clock_->Now(), neg_hist_age); > slightly worried about underflow here if the user says "I want to keep all It looks to me like HybridTime has enough bits that this isn't possible, unless their clock says it's 1970 or something. If that was the case, NTP would be messed up, and I don't think Kudu would start. HybridTime uses 52 bits for the physical time in microseconds and 12 bits for the logical time. Here's the napkin math: mpercy@mpercy-T460p:~/src/kudu$ perl -e 'print((time() * 1000) << 12, "\n");' 6034973429760000 mpercy@mpercy-T460p:~/src/kudu$ perl -e 'print(2**31 * 1000, "\n");' 2147483648000 mpercy@mpercy-T460p:~/src/kudu$ perl -e 'print(((time() * 1000) << 12) - (2**31 * 1000), "\n");' 6032826847232000 http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: PS19, Line 193: // Test that no UNDO DELETE is generated after deleting a row from the MRS and : // waiting for AHM to pass, then flushing the MRS. : T > this test actually tests two separate cases. can you expand this comment to Done PS19, Line 243: represents the insert. In the UNDO : // representation, this insert looks like DELETE > I think this description is confusing. the UNDO *undoes* the insert, doesn' Done PS19, Line 425: 7, > some commentary at the top of what these special row offsets are would be n I added some comments at the top. Let me know if you think I need to rework this test to be more understandable. http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: PS19, Line 1512: prohibits with > missing word Done PS19, Line 1534: C > should be a lower case c, since it comes after a ':' in the formatted messa Done -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes