Todd Lipcon has posted comments on this change. Change subject: KUDU-236 (part 1). Implement tablet history GC ......................................................................
Patch Set 19: (22 comments) http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction-test.cc File src/kudu/tablet/compaction-test.cc: Line 66: return { HistoryGcOpts::GC_DISABLED, Timestamp::kInvalidTimestamp }; is there special handling for kInvalidTimestamp? I think kInvalidTimestamp compares _higher_ than all other timestamps Line 217: nit: no need for blank lines Line 387: vector<rowid_t> gced_input_rows; unused Line 585: HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, Timestamp(0) }; use NoHistoryGC()? Line 639: HistoryGcOpts history_gc_opts = { HistoryGcOpts::GC_DISABLED, Timestamp(0) }; same http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 638: DVLOG(2) << "Ancient history mark: " << history_gc_opts.ancient_history_mark.ToString() given this is a per-row log message, i think DVLOG(3) or DVLOG(4) is probably better. otherwise -v=2 on a debug build would be super noisy Line 647: prev_undo->set_next(nullptr); hrm, the comment on line 629-631 is no longer correct, then. (we were const_casting away the constness so we could create new undos which point _to_ the passed-in undos, but here you're actually modifying the passed-in undos). I'm not 100% sure this is safe (at least it's unexpected). It seems like this particular block doesn't really share much code/variables with the rest of the function. Maybe it can be broken out into a new method which takes a non-const CompactionInputRow*? Line 800: rowid_t input_row_offset = -1; is this variable necessary? seems like it's only used in the DVLOG below, but that DVLOG itself seems somewhat redundant with the longer DVLOG on line 838-844 Line 838: DVLOG(2) << "Output Row: " << dst_row.schema()->DebugRow(dst_row) << bump to a higher DVLOG level since this is a big message and every row PS19, Line 863: DVLOG(2) << "Output Row: " << dst_row.schema()->DebugRow(dst_row) : << "; RowId: " << index_in_current_drs; this vlog is redundant with the new one on 838-844 PS19, Line 954: history_gc_opts.enabled == HistoryGcOpts::GC_ENABLED && : mut->timestamp().ComesBefore(history_gc_opts.ancient_history_mark this condition shows up a lot. what about adding a method like bool HistoryGCOpts::ShouldGC(Timestamp t) {...} to encapsulate it? Line 956: DVLOG(2) << "Skipping GCed input row: " << schema->DebugRow(row.row) this log message isn't accurate since it could still be reinserted later. perhaps needs to move down to line 1039 or so? http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/compaction.h File src/kudu/tablet/compaction.h: PS19, Line 38: GC_ENABLED, : GC_DISABLED reorder these and make GC_DISABLED have the value 0 so that if this is accidentally implicitly casted to a bool, you get a reasonable result Alternatively, you could make this more of a class, with some static factory method like HistoryGcOpts::Disabled() and HistoryGcOpts::Enabled(Timestamp ahm), along with the 'ShouldGC()' that is suggested elsewhere Line 46: // A timestamp prior to which no history will be preserved. add "Ignored if enabled != GC_ENABLED" Line 155: // 'is_history_truncated': Set to true if this row had a "reinsert after update (it's an int, not a bool) http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet-test.cc File src/kudu/tablet/tablet-test.cc: PS19, Line 564: // TODO: Should we be reopening the tablet in a different way to persist the : // clock / timestamps? does this need to be addressed here? http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 704: *ancient_history_mark = HybridClock::AddPhysicalTimeToTimestamp(clock_->Now(), neg_hist_age); slightly worried about underflow here if the user says "I want to keep all history, I guess I'll set the age to max_int32". Is that plausible? Perhaps AddPhysicalTimeToTimestamp should have 'saturating' behavior? or do we have enough bits here that it's not possible? http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tablet/tablet_history_gc-test.cc File src/kudu/tablet/tablet_history_gc-test.cc: PS19, Line 193: // Test that no UNDO DELETE is generated after deleting a row from the MRS and : // waiting for AHM to pass, then flushing the MRS. : T this test actually tests two separate cases. can you expand this comment to cover both? PS19, Line 243: represents the insert. In the UNDO : // representation, this insert looks like DELETE I think this description is confusing. the UNDO *undoes* the insert, doesn't "represent it". PS19, Line 425: 7, some commentary at the top of what these special row offsets are would be nice (maybe use constants?) http://gerrit.cloudera.org:8080/#/c/3076/19/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: PS19, Line 1512: prohibits with missing word PS19, Line 1534: C should be a lower case c, since it comes after a ':' in the formatted message -- To view, visit http://gerrit.cloudera.org:8080/3076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If9833a863f118eb82be80ea56204d0d9141611c2 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes