Hello Tidy Bot, Kurt Deschler, Ashwani Raina, Yingchun Lai, Yifan Zhang, Attila Bukor, Kudu Jenkins, Abhishek Chennaka,
I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19281

to look at the new patch set (#9).

Change subject: KUDU-3406 memory budgeting for CompactRowSetsOp
......................................................................

KUDU-3406 memory budgeting for CompactRowSetsOp

This patch implements memory budgeting for rowset merge compactions (i.e. CompactRowSetsOp maintenance operations). The idea is to check, before starting a CompactRowSetsOp, whether there is enough memory left before reaching the hard memory limit. The estimate of the amount of memory necessary to perform the operation is based on the total on-disk size of all deltas in the rowsets selected for the merge compaction and the ratio of memory-to-disk size when loading those deltas into memory to perform the merge rowset compaction. If there is enough memory, a rowset is considered as an input for merge compaction; otherwise it is not. Meanwhile, REDO deltas become UNDO deltas after major delta compactions run on the rowset, and UNDO deltas eventually become ancient, so UndoDeltaBlockGCOp drops them. As a result, the amount of memory required to load a rowset's delta data into memory shrinks over the long run, and eventually the rowset is back among the inputs for one of the future runs of the CompactRowSetsOp maintenance operation.
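For illustration only, the budgeting check described above could be sketched as follows. This is not the code from the patch: RowSetDeltaInfo, EstimateCompactionMemory() and FitsMemoryBudget() are hypothetical names, and the sketch assumes the estimate is simply the on-disk size of the deltas multiplied by the memory-to-disk ratio.

```cpp
#include <cstdint>

// Hypothetical stand-in for a rowset's delta statistics; not a Kudu type.
struct RowSetDeltaInfo {
  int64_t deltas_on_disk_bytes;  // total on-disk size of the rowset's deltas
};

// Estimate the memory needed to load deltas of the given on-disk size,
// assuming memory usage scales linearly with the memory-to-disk ratio.
int64_t EstimateCompactionMemory(int64_t deltas_on_disk_bytes,
                                 double mem_to_disk_ratio) {
  return static_cast<int64_t>(deltas_on_disk_bytes * mem_to_disk_ratio);
}

// Return true if the estimated memory for the rowset's deltas fits into
// what is left before the hard memory limit is reached.
bool FitsMemoryBudget(const RowSetDeltaInfo& rs,
                      double mem_to_disk_ratio,
                      int64_t current_usage_bytes,
                      int64_t hard_limit_bytes) {
  const int64_t available_bytes = hard_limit_bytes - current_usage_bytes;
  return EstimateCompactionMemory(rs.deltas_on_disk_bytes,
                                  mem_to_disk_ratio) <= available_bytes;
}
```

With such a check, a rowset whose deltas do not fit would simply be skipped as a compaction input for now and reconsidered in a later run.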
Prior to this patch, CompactRowSetsOp could run out of memory because it tried to allocate too much of it, due to at least the following factors:
* many UNDO deltas might accumulate in rowsets selected for the compaction operation because of the relatively high setting for the --tablet_history_max_age_sec flag (7 days) and a particular workload that issues many updates for rows in the same rowset
* even though it is a merge-like operation by nature, the current implementation of CompactRowSetsOp allocates all the memory necessary to load the UNDO deltas at once, and it also keeps all the preliminary results in memory before persisting the result data to disk
* the current implementation of CompactRowSetsOp loads all the UNDO deltas from the rowsets selected for compaction regardless of whether they are ancient or not; it discards the data sourced from the ancient deltas only at the very end, before persisting the result data

Ideally, the current implementation of CompactRowSetsOp should be refactored to merge the deltas in participating rowsets sequentially, chunk by chunk, persisting the results and allocating memory just for a small batch of processed deltas instead of loading all the deltas at once. A future patch should take care of that, while this patch provides an interim approach using memory budgeting on top of the current CompactRowSetsOp implementation as-is.

The newly introduced behavior is gated by the following two flags:
* rowset_compaction_memory_estimate_enabled: whether to enable memory budgeting for CompactRowSetsOp (default is 'false')
* rowset_compaction_ancient_delta_threshold_enabled: whether to check against the ratio of ancient UNDO deltas across rowsets selected for compaction (default is 'true')
In addition, the following flags allow for tweaking the new behavior gated by the corresponding flags above:
* rowset_compaction_delta_memory_factor: the multiplication factor for the total size of a rowset's deltas to estimate how much memory CompactRowSetsOp would consume if operating on those deltas when no runtime stats for the compact_rs_mem_usage_to_deltas_size_ratio metric are available yet (default is 5.0)
* rowset_compaction_ancient_delta_max_ratio: the threshold for the ratio of the data size in ancient UNDO deltas to the total data size of UNDO deltas in the rowsets selected for merge compaction
* rowset_compaction_estimate_min_deltas_size_mb: the threshold on the total size of a rowset's deltas to apply the memory budgeting

To complement the --rowset_compaction_delta_memory_factor flag with more tablet-specific stats, two new per-tablet metrics have been introduced:
* compact_rs_mem_usage is a histogram to gather statistics on how much memory rowset merge compaction consumed
* compact_rs_mem_usage_to_deltas_size_ratio is a histogram to track the memory-to-disk size ratio for a tablet's rowsets participating in merge compaction; this metric provides the average that is used as a more precise factor to estimate the amount of memory a rowset's deltas would use when undergoing merge compaction, given the total on-disk size of all the rowset's deltas

This patch doesn't add a test, but I verified how the new functionality works with real data from a case where merge rowset compaction would take about 28 GByte if not constrained by the memory limit.
I'm planning to add a test in a follow-up changelist based on the following patch once the latter appears in the git repository:
  https://gerrit.cloudera.org/#/c/19278

Change-Id: I89c171284944831e95c45a993d85fbefe89048cf
---
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/compaction_policy.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_mm_ops-test.cc
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_mm_ops.h
21 files changed, 451 insertions(+), 48 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/19281/9
--
To view, visit http://gerrit.cloudera.org:8080/19281
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c171284944831e95c45a993d85fbefe89048cf
Gerrit-Change-Number: 19281
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ara...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <chinazhangyi...@163.com>
Gerrit-Reviewer: Yingchun Lai <acelyc1112...@gmail.com>