[ 
https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646952#comment-17646952
 ] 

Yingchun Lai edited comment on KUDU-3400 at 12/14/22 7:19 AM:
--------------------------------------------------------------

[~aserbin] 
{quote}When generating heap profile, it's important to use proper location for 
the toolchain and binary. If the binaries were built with devtoolset, it's 
necessary to set proper environment when running {{pprof}} from gperftools (see 
the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script).
{quote}
Thanks for the reminder. Do you mean the script needs to be run before 
running {{pprof}}?

Similar to KUDU-3406, the tserver is under memory pressure, and flush ops keep 
taking priority over delta compaction over and over again.

I suspect https://issues.apache.org/jira/browse/KUDU-3197 is related to this 
issue too, assuming the profile is not an artifact of improper {{pprof}} use: 
both of them report that "Schema" consumes too much memory. After upgrading 
the cluster to a version that includes this patch 
([https://gerrit.cloudera.org/c/18255/]), the situation has not reproduced 
for about 1 month.


was (Author: laiyingchun):
[~aserbin] 
{quote}When generating heap profile, it's important to use proper location for 
the toolchain and binary. If the binaries were built with devtoolset, it's 
necessary to set proper environment when running {{pprof}} from gperftools (see 
the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script).
{quote}
Do you mean the script needs to be run before running {{pprof}}?

> CompilationManager::RequestRowProjector consumed too much memory
> ----------------------------------------------------------------
>
>                 Key: KUDU-3400
>                 URL: https://issues.apache.org/jira/browse/KUDU-3400
>             Project: Kudu
>          Issue Type: Bug
>          Components: codegen
>    Affects Versions: 1.12.0
>            Reporter: Yingchun Lai
>            Priority: Major
>         Attachments: data02heap.svg, heapprofile.svg, pstack.txt
>
>
> In one of our clusters, we found that the 
> CompilationManager::RequestRowProjector function unexpectedly consumed too 
> much memory. Some details about this cluster:
>  # some tables have more than 1000 columns, so the table schema may be very 
> costly to copy
>  # sometimes the tservers are under memory pressure, and then perform flush 
> operations more frequently (to try to reduce the memory consumed by MRS/DMS)
> I captured a heap profile on a tserver and found that 
> CompilationManager::RequestRowProjector accounted for most of the memory, 
> spent on Schema copies. The source code:
>  
> {code:java}
> CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache,
>                 CodeGenerator* generator)
>   : base_(base),
>     proj_(proj),
>     cache_(cache),
>     generator_(generator) {} {code}
> That is to say, the Schemas (i.e. base and proj) are copied when 
> CompilationTask objects are constructed.
> The heap profile says that Schema consumed about 50GB of memory, which really 
> shocked me: even though the Schema is large, how can it consume 50GB of 
> memory? I forgot to `pstack` the process when it happened. Maybe there were 
> hundreds of thousands of CompilationManager::RequestRowProjector calls at 
> that time, but according to the code logic, they should not hang there for a 
> long time?
> {code:java}
> if (!cached) {
>   shared_ptr<CompilationTask> task(make_shared<CompilationTask>(
>       *base_schema, *projection, &cache_, &generator_));
>   WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }),
>                   "RowProjector compilation request submit failed", 10);
>   return false;
> } {code}
>  



