[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646952#comment-17646952 ]
Yingchun Lai edited comment on KUDU-3400 at 12/14/22 7:19 AM:
--------------------------------------------------------------

[~aserbin]
{quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script).
{quote}
Thanks for the reminder. Do you mean that the script needs to be run before running {{pprof}}?

Similar to KUDU-3406, the tserver was under memory pressure, and flush ops repeatedly took priority over delta compaction. I suspect https://issues.apache.org/jira/browse/KUDU-3197 is related to this issue too, if {{pprof}} was not used properly: both reports say that "Schema" costs too much memory.

After upgrading the cluster to a version that includes this patch ([https://gerrit.cloudera.org/c/18255/]), the situation has not reproduced in about one month.

> CompilationManager::RequestRowProjector consumed too much memory
> ----------------------------------------------------------------
>
>                 Key: KUDU-3400
>                 URL: https://issues.apache.org/jira/browse/KUDU-3400
>             Project: Kudu
>          Issue Type: Bug
>          Components: codegen
>    Affects Versions: 1.12.0
>            Reporter: Yingchun Lai
>            Priority: Major
>         Attachments: data02heap.svg, heapprofile.svg, pstack.txt
>
>
> In one of our clusters, we found that the CompilationManager::RequestRowProjector
> function occasionally consumed too much memory.
> Some characteristics of this cluster:
> # some tables have more than 1000 columns, so the table schema may be very
> costly to copy
> # sometimes the tservers are under memory pressure, and then do flush
> operations more frequently (to try to reduce the memory consumed by MRS/DMS)
>
> I captured a heap profile on a tserver and found that most of the memory was
> allocated by CompilationManager::RequestRowProjector when copying Schema
> objects. The source code:
>
> {code:cpp}
> CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache,
>                 CodeGenerator* generator)
>     : base_(base),
>       proj_(proj),
>       cache_(cache),
>       generator_(generator) {}
> {code}
> That is to say, the Schemas (i.e. base and proj) are copied whenever a
> CompilationTask object is constructed.
> The heap profile says that Schema consumed about 50GB of memory, which really
> shocked me: even though the Schema is large, how can it consume 50GB? I forgot
> to {{pstack}} the process when it happened; maybe there were hundreds of
> thousands of CompilationManager::RequestRowProjector calls at that time. But
> according to the code logic, shouldn't each call return quickly instead of
> hanging there for a long time?
> {code:cpp}
> if (!cached) {
>   shared_ptr<CompilationTask> task(make_shared<CompilationTask>(
>       *base_schema, *projection, &cache_, &generator_));
>   WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }),
>                            "RowProjector compilation request submit failed", 10);
>   return false;
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)