Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20716 )
Change subject: Issue a warning on parameter heap_sample_every_n_bytes. ...................................................................... Patch Set 6: (8 comments) http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@7 PS6, Line 7: Issue a warning With this patch, no warning is actually issued :) This patch just adds a warning into the description of the flag, but the description already contains information about the overhead of the allocation sampling. I guess proper summary for this patch could be: [util] add a warning into 'heap_sample_every_n_bytes' flag description http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@10 PS6, Line 10: thread : cache request What is 'thread cache request'? http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@12 PS6, Line 12: occupy nit: 'acquire' or 'hold' http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@15 PS6, Line 15: (especially for upserts) Is this essential? Any allocation on the heap will add to the lock contention, no? http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@16 PS6, Line 16: so much performance because of the lock contention of tcmalloc Even without enabling tcmalloc sampling a system with quite high throughput already shows significant lock contention in tcmalloc -- you could see it in the thread stacks traces captured in the diagnostic logs :) http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@18 PS6, Line 18: 1MB This is quite low, actually. Empirically, setting it to hundreds of gigabytes makes more sense in a single tablet server of a production system, if trying the sampling at all. http://gerrit.cloudera.org:8080/#/c/20716/6//COMMIT_MSG@20 PS6, Line 20: So give a warning on the parameter heap_sample_every_n_bytes in case : that some users do not know the terrible influence of it. Even with the extra warning, users wouldn't know how bad the lock contention in tcmalloc could be before they try it: no quantitative information is provided on that in the description of the flag, before or after this patch. If trying to add guardrails for people who don't quite understand what they are doing, maybe consider tagging this flag with the 'unsafe' attribute: it's currently tagged with just 'experimental' attribute, but adding 'unsafe' looks reasonable. In addition, consider issuing a warning into the log if the heap_sample_every_n_bytes flag is set to a positive value -- that could be done in the validator for the flag. http://gerrit.cloudera.org:8080/#/c/20716/6/src/kudu/util/flags.cc File src/kudu/util/flags.cc: http://gerrit.cloudera.org:8080/#/c/20716/6/src/kudu/util/flags.cc@100 PS6, Line 100: Warning: Setting it to a value greater than 0 will probably " : "lead to lock contention in tcmalloc, try setting it manually when a sample " : "is desired. Try "setting manually" what? how about: WARNING: setting this flag to a positive value increases lock contention in tcmalloc; the lower the value, the higher the lock contention -- To view, visit http://gerrit.cloudera.org:8080/20716 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00d55126f0cad0ac6cc130576cc45808f96f0a47 Gerrit-Change-Number: 20716 Gerrit-PatchSet: 6 Gerrit-Owner: Song Jiacheng <songjiach...@thinkingdata.cn> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <greber...@gmail.com> Gerrit-Reviewer: Song Jiacheng <songjiach...@thinkingdata.cn> Gerrit-Comment-Date: Tue, 21 Nov 2023 17:27:44 +0000 Gerrit-HasComments: Yes