Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21076 )

Change subject: KUDU-3557 Simple way to change maximum thread cache size of 
tcmalloc.
......................................................................


Patch Set 10:

(19 comments)

> Patch Set 10: Code-Review+1
>
> Looks good to me!  The only thing to fix is the IWYU issue below:
>
> 00:52:58 >>> Fixing #includes in 
> '/home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/memory_gc-itest.cc'
> 00:52:58 @@ -18,6 +18,7 @@
> 00:52:58  #include <cstdint>
> 00:52:58  #include <functional>
> 00:52:58  #include <memory>
> 00:52:58 +#include <ostream>
> 00:52:58  #include <string>
> 00:52:58  #include <utility>
> 00:52:58  #include <vector>
> 00:52:58 IWYU would have edited 1 files on your behalf.

Thanks for all the reviews!

http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@7
PS3, Line 7: Simple way to change maximum thread cache size
> What if somebody sets the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment
I think the setting is via this flag. SetNumericProperty is a runtime function, 
it could be called anytime. I have added a warning to the flag comment.


http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@10
PS3, Line 10: Making the total thread cache size larger
            : could highly decrease the lock contention.
> Is there a way to add a small test to validate the results?  We do already
I tried adding a test that start tablet servers with different values of this 
flag. But I think we could not ensure that the test result is as expected.


http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@12
PS3, Line 12: at the
            : first time, but it could request more from unclaimed cache space
            : or steal from other thread cache each time it's doing garbag
> I guess it's all originated from the way how the GC process manages the per
I know this mechanism in tcmalloc. It actually just get 64K from 
unclaimed_cache_space_ if unclaimed_cache_space_ is > 0. The size of 
unclaimed_cache_space_ is tcmalloc_max_total_thread_cache_bytes - (actual used 
by all thread cache). We could observe the metric 
tcmalloc_current_total_thread_cache_bytes, the stealing will never happen if 
tcmalloc_current_total_thread_cache_bytes always less than 
tcmalloc_max_total_thread_cache_bytes.
Added some details to the commit message. Thanks!


http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@17
PS3, Line 17:
            : KUDU-3557 s
> Thank you for adding those graphs in KUDU-3557.  For those graphs, could yo
It's actually the metric spinlock_contention_time, I will make it clearer.


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc
File src/kudu/integration-tests/memory_gc-itest.cc:

http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@135
PS7, Line 135: TEST_F(MemoryGcITest, 
TestLockContentionInVariousThreadCacheSize) {
> Thank you for adding this test.
Thank you for the clarification about the metric 'spinlock_contention_time', I 
thought it includes the spinlock in tcmalloc. It turns out that Kudu and 
tcmalloc are using different implemention of spinlock. So actually the impact 
on decreasing lock contention in tamalloc by increasing the max total cache of 
thread caches is still not visible. But the metric spinlock_contention_time 
does show the performance currently. So I guess the flag still helpful.

I have set the flags to 1MB, 8MB and 64MB, and the test still fails in a 
certain test. So I will change the assertion to log.


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@146
PS7, Line 146: Set --max_total_thread_cache_bytes to 8MB for the second tablet 
server.
> nit: Set --max_total_thread_cache_bytes to 32MB for the second tablet serve
Done


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@152
PS7, Line 152: Set --max_total_thread_cache_bytes to 64MB for the third tablet 
server.
> nit: Set --max_total_thread_cache_bytes to 128MB for the third tablet serve
Done


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@159
PS7, Line 159: int64_t total_size = 0;
> Why to have this ASSERT_EVENTUALLY?  Isn't the setting becomes effective im
I'm not sure the tserver will be started at once. Removing this.


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@184
PS7, Line 184: estWorkload workload(cluster_.g
> nit: Write some data to be scanned later on.
These comments are from the codes above, changing those also.


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@200
PS7, Line 200: estWorkload workload(cluster_.get());
> nit: Additional memory is allocated during scan operations below.
Done


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@207
PS7, Line 207: kload.StopAndJoin();
> nit: Run the scan workload for a long time to let it allocate/deallocate a
Done


http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@213
PS7, Line 213:                       &ME
> I'm not sure this ASSERT_EVENTUALLY is needed here.
Indeed, removing it.


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@309
PS3, Line 309: 128 * 10
> Just another thought: given that Kudu servers have many threads by design a
The max total size by default is computed as follow:
    static const size_t kMaxThreadCacheSize = 4 << 20;
    static const size_t kDefaultOverallThreadCacheSize = 8u * 
kMaxThreadCacheSize;
I think it is 32MB, not sure if I am wrong. And before setting the flag, the 
default value of our cluster is 32MB.
Of course 128MB is better, I just want to keep the original value by default.
Done.


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@310
PS3, Line 310: across all thread cach
> Some words are missing?
Sorry, done.


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@311
PS3, Line 311: in tcm
> a greater value
Done


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@311
PS3, Line 311: lloc. Increasing this value helps in reducing lock contention
> Well, those requests are always frequent even in non-stressful workloads an
Done


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@312
PS3, Line 312:               "in tcmalloc for memory-intensive workloads. 
WARNING: This flag will "
> Please also add the 'experimental' tag as well.
Done


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@313
PS3, Line 313:               "cover the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES 
environment variable. "
> Does it make sense to add a validator for the flag?  It seems in the gperft
Yes, it does make sense. Done.


http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@809
PS3, Line 809:       LOG(INFO) << "enabling wall clock jump detection";
             :     }
             :     clock_.reset(new clock::HybridClock(
             :         metric_entity_, threshold_usec, std::move(im)));
             :   }
> Does it makes sense to add logging about the inability to apply the setting
Done



--
To view, visit http://gerrit.cloudera.org:8080/21076
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3cb8c6ed6a8f24c63258ae65f8b841fe74b75308
Gerrit-Change-Number: 21076
Gerrit-PatchSet: 10
Gerrit-Owner: Song Jiacheng <songjiach...@thinkingdata.cn>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <songjiach...@thinkingdata.cn>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 01 Mar 2024 06:12:24 +0000
Gerrit-HasComments: Yes

Reply via email to