Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )
Change subject: eKUDU-3407 Avoid unchecked scheduling of flush operations. ...................................................................... Patch Set 35: (5 comments) Thanks for the review~ http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20166/29//COMMIT_MSG@12 PS29, Line 12: eventually break : due to OOM > Right -- overall performance is dropping, so every operation takes longer t Theoretically, yes, it should not be much higher than 80%. But I could always find a tserver whose memory is higher than 90, sometimes 100%. I had looked into the memory profile and I think everything is OK except some big SchemaPBs from tombstoned replicas, which I have raised another issue KUDU-3486 to fix it. But the memory of tombstoned replicas is also tracked by the memory tracker, so the memory pressure mechanism should still work. I think the reason might be the scan requests, or the concurrent writes or I/O problem. And all of them could be optimized by maintenance operations. And also, sometimes the performance might not be the root cause of OOM. I still think this mechanism is needed since I always find that there are many operations pending with high perf improvement score when the tserver is under memory pressure. http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc File src/kudu/util/maintenance_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@584 PS34, Line 584: me > nit: fix the alignment for this line Done http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1022 PS34, Line 1022: l > What's "e"? This looks like an incomplete word or a typo. Done http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1023 PS34, Line 1023: s * pr > nit: greater Done http://gerrit.cloudera.org:8080/#/c/20166/34/src/kudu/util/maintenance_manager-test.cc@1026 PS34, Line 1026: // take time, so the other_ops_running_times might be greater than expected. : const int64_t memory_op_running_times = op2.run_count(); > These margins make sense from the theoretical perspective, but how did you I ran it with the parameters and it turns out that AssertEventually will time out until memory_op run 1000 times. No other problem except that, the range is big enough to avoid the flakiness. Done. -- To view, visit http://gerrit.cloudera.org:8080/20166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80 Gerrit-Change-Number: 20166 Gerrit-PatchSet: 35 Gerrit-Owner: Song Jiacheng <songjiach...@thinkingdata.cn> Gerrit-Reviewer: Abhishek Chennaka <achenn...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: Ashwani Raina <ara...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Song Jiacheng <songjiach...@thinkingdata.cn> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Tue, 14 Nov 2023 17:19:27 +0000 Gerrit-HasComments: Yes