[ https://issues.apache.org/jira/browse/KUDU-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617244#comment-17617244 ]

Ashwani Raina commented on KUDU-3406:
-------------------------------------

I ran some experiments using the real data.
In my case, memory usage sometimes peaked at as much as 38GB, mostly while a 
DiskRowSet compaction was running.
While kudu-master was running, I also collected stack traces at a rate of 10 
samples per second.

In most of them, the MaintenanceMgr thread is inside DoMergeCompactionOrFlush() 
roughly around the time of peak memory usage.
There are other stack traces around the peak time as well.
Here are some of the traces:
+++

stacks.103:TID 2987608(MaintenanceMgr ):
stacks.103-    @     0x55d0f0aa5337  kudu::tablet::DeltaPreparer<>::CollectMutations()
stacks.103-    @     0x55d0f0a97439  kudu::tablet::DeltaFileIterator<>::CollectMutations()
stacks.103-    @     0x55d0f0ad2f74  kudu::tablet::DeltaIteratorMerger::CollectMutations()
stacks.103-    @     0x55d0f0a20f6b  kudu::tablet::(anonymous namespace)::DiskRowSetCompactionInput::PrepareBlock()
stacks.103-    @     0x55d0f0a1c9fd  kudu::tablet::(anonymous namespace)::MergeCompactionInput::ProcessEmptyInputs()
stacks.103-    @     0x55d0f0a1cf75  kudu::tablet::(anonymous namespace)::MergeCompactionInput::FinishBlock()
stacks.103-    @     0x55d0f0a216cf  kudu::tablet::FlushCompactionInput()
stacks.103-    @     0x55d0f09d347a  kudu::tablet::Tablet::DoMergeCompactionOrFlush()

---

stacks.130:TID 2987608(MaintenanceMgr ):
stacks.130-    @     0x55d0f0a21057  kudu::tablet::(anonymous namespace)::DiskRowSetCompactionInput::PrepareBlock()
stacks.130-    @     0x55d0f0a1c9fd  kudu::tablet::(anonymous namespace)::MergeCompactionInput::ProcessEmptyInputs()
stacks.130-    @     0x55d0f0a1cf75  kudu::tablet::(anonymous namespace)::MergeCompactionInput::FinishBlock()
stacks.130-    @     0x55d0f0a216cf  kudu::tablet::FlushCompactionInput()
stacks.130-    @     0x55d0f09d347a  kudu::tablet::Tablet::DoMergeCompactionOrFlush()
stacks.130-    @     0x55d0f09d7435  kudu::tablet::Tablet::Compact()
stacks.130-    @     0x55d0f09efe3d  kudu::tablet::CompactRowSetsOp::Perform()
stacks.130-    @     0x55d0f2b410cd  kudu::MaintenanceManager::LaunchOp()
stacks.130-    @     0x55d0f2ba91f0  kudu::ThreadPool::DispatchThread()

---

stacks.137:TID 2987608(MaintenanceMgr ):
stacks.137-    @     0x55d0f0a1e110  kudu::tablet::RemoveAncientUndos()
stacks.137-    @     0x55d0f0a21580  kudu::tablet::FlushCompactionInput()
stacks.137-    @     0x55d0f09d347a  kudu::tablet::Tablet::DoMergeCompactionOrFlush()
stacks.137-    @     0x55d0f09d7435  kudu::tablet::Tablet::Compact()
stacks.137-    @     0x55d0f09efe3d  kudu::tablet::CompactRowSetsOp::Perform()
stacks.137-    @     0x55d0f2b410cd  kudu::MaintenanceManager::LaunchOp()

+++
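
The traces suggest that the merge path materializes every UNDO mutation for 
each prepared block in memory before it is flushed. Below is a simplified, 
hypothetical sketch of that pattern (this is not actual Kudu code; the 
PreparedBlock/PrepareBlock names and the payload sizes are made up for 
illustration) showing why the memory held per block can scale with the amount 
of UNDO history per row rather than with the block size:

{code:cpp}
// Illustrative sketch only -- NOT actual Kudu code.
// Shows how preparing a fixed-size block of rows can still buffer an
// unbounded amount of data when each row carries a long UNDO history.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// One historical (UNDO) mutation for a row (hypothetical layout).
struct Mutation {
  int64_t timestamp;
  std::string changelist;  // encoded column updates
};

// A prepared block of rows, each with its full mutation history in memory.
struct PreparedBlock {
  std::vector<std::vector<Mutation>> row_histories;
};

// Analogue of collecting mutations while preparing a block: every UNDO of
// every row in the block is materialized, regardless of how many there are.
PreparedBlock PrepareBlock(size_t rows_in_block, size_t undos_per_row) {
  PreparedBlock block;
  block.row_histories.resize(rows_in_block);
  for (auto& history : block.row_histories) {
    for (size_t i = 0; i < undos_per_row; ++i) {
      history.push_back(Mutation{static_cast<int64_t>(i),
                                 std::string(64, 'x')});  // ~64-byte payload
    }
  }
  return block;
}

int main() {
  // The block size is fixed, but the memory held per block grows linearly
  // with the UNDO history of its rows.
  constexpr size_t kRowsPerBlock = 100;
  for (size_t undos : {10UL, 1000UL, 10000UL}) {
    PreparedBlock block = PrepareBlock(kRowsPerBlock, undos);
    std::cout << "block with " << undos << " UNDOs/row holds "
              << kRowsPerBlock * undos << " mutations (~"
              << kRowsPerBlock * undos * 64 / (1024 * 1024)
              << " MB of payload) in memory\n";
  }
  return 0;
}
{code}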

Next steps:
 # Look into the MemRowSet flush code and the compaction policy.
 # Try to generate a similar scenario with locally generated data: a mix of 
table creates/drops and frequent updates from tablet servers to the master 
(a rough sketch of the create/drop part follows below).
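
For the second item, one way to drive table create/drop churn against a local 
master is the Kudu C++ client. A rough sketch, assuming the client headers and 
the kudu_client library from a Kudu build are available; the master address, 
table count, and schema are arbitrary, and this covers only the create/drop 
part of the workload, not the tablet server update traffic:

{code:cpp}
// Rough sketch: repeatedly create and drop small tables against a local
// kudu-master to generate catalog (create/drop) churn.
// Build against the Kudu C++ client and link with -lkudu_client.
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>

#include "kudu/client/client.h"

using kudu::Status;
using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;
using kudu::client::KuduColumnSchema;
using kudu::client::KuduSchema;
using kudu::client::KuduSchemaBuilder;
using kudu::client::KuduTableCreator;

// Abort on any failure; good enough for a throwaway load generator.
static void CheckOk(const Status& s, const std::string& what) {
  if (!s.ok()) {
    std::cerr << what << ": " << s.ToString() << std::endl;
    std::exit(1);
  }
}

int main() {
  kudu::client::sp::shared_ptr<KuduClient> client;
  CheckOk(KuduClientBuilder()
              .add_master_server_addr("127.0.0.1:7051")  // local master
              .Build(&client),
          "connect to master");

  // Minimal single-column schema; its exact shape doesn't matter for the
  // metadata churn we want to exercise.
  KuduSchema schema;
  KuduSchemaBuilder b;
  b.AddColumn("key")->Type(KuduColumnSchema::INT64)->NotNull()->PrimaryKey();
  CheckOk(b.Build(&schema), "build schema");

  for (int i = 0; i < 1000; ++i) {
    std::string table_name = "churn_table_" + std::to_string(i);
    std::unique_ptr<KuduTableCreator> creator(client->NewTableCreator());
    CheckOk(creator->table_name(table_name)
                .schema(&schema)
                .set_range_partition_columns({"key"})
                .num_replicas(1)
                .Create(),
            "create " + table_name);
    CheckOk(client->DeleteTable(table_name), "drop " + table_name);
  }
  return 0;
}
{code}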


> CompactRowSetsOp can allocate much more memory than specified by the hard 
> memory limit
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-3406
>                 URL: https://issues.apache.org/jira/browse/KUDU-3406
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0, 1.16.0
>            Reporter: Alexey Serbin
>            Assignee: Ashwani Raina
>            Priority: Critical
>              Labels: compaction, stability
>         Attachments: 270.svg, 283.svg, 296.svg, 308.svg, 332.svg, 344.svg, 
> fs_list.before
>
>
> In some scenarios, rowsets can accumulate a lot of data, so {{kudu-master}} 
> and {{kudu-tserver}} processes grow far beyond the hard memory limit 
> (controlled by the {{--memory_limit_hard_bytes}} flag) when running 
> CompactRowSetsOp.  In some cases, a Kudu server process consumes all the 
> available memory, so that the OS might invoke the OOM killer.
> At this point I'm not yet sure about the exact versions affected, or what 
> leads to accumulating so much data in flushed rowsets, but I know that 1.13, 
> 1.14, 1.15 and 1.16 are affected.  It's also not clear whether the actual 
> regression is in allowing the flushed rowsets to grow that big.
> There is a reproduction scenario for this bug with {{kudu-master}} using real 
> data from the field.  With that data, {{kudu fs list}} reveals a rowset with 
> many UNDOs: see the attached {{fs_list.before}} file.  When starting 
> {{kudu-master}} with the data, the process memory usage eventually peaked at 
> about 25 GB of RSS while running CompactRowSetsOp, and then the RSS 
> eventually subsided to about 200 MB once the CompactRowSetsOp was completed.
> I also attached several SVG files generated by TCMalloc's pprof from the 
> memory profile snapshots output by {{kudu-master}} when configured to dump 
> allocation stats every 512 MB.  I generated the SVG reports for the profiles 
> attributed to the highest memory usage:
> {noformat}
> Dumping heap profile to /opt/tmp/master/nn1/profile.0270.heap (24573 MB 
> currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0283.heap (64594 MB 
> allocated cumulatively, 13221 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0296.heap (77908 MB 
> allocated cumulatively, 12110 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0308.heap (90197 MB 
> allocated cumulatively, 12406 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0332.heap (114775 MB 
> allocated cumulatively, 23884 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0344.heap (127064 MB 
> allocated cumulatively, 12648 MB currently in use)
> {noformat}
> The report from the compaction doesn't show anything extraordinary (except 
> for its duration):
> {noformat}
> I20221012 10:45:49.684247 101750 maintenance_manager.cc:603] P 
> 68dbea0ec022440d9fc282099a8656cb: 
> CompactRowSetsOp(00000000000000000000000000000000) complete. Timing: real 
> 522.617s     user 471.783s   sys 46.588s Metrics: 
> {"bytes_written":1665145,"cfile_cache_hit":846,"cfile_cache_hit_bytes":14723646,"cfile_cache_miss":1786556,"cfile_cache_miss_bytes":4065589152,"cfile_init":7,"delta_iterators_relevant":1558,"dirs.queue_time_us":220086,"dirs.run_cpu_time_us":89219,"dirs.run_wall_time_us":89163,"drs_written":1,"fdatasync":15,"fdatasync_us":150709,"lbm_read_time_us":11120726,"lbm_reads_1-10_ms":1,"lbm_reads_lt_1ms":1786583,"lbm_write_time_us":14120016,"lbm_writes_1-10_ms":3,"lbm_writes_lt_1ms":894069,"mutex_wait_us":108,"num_input_rowsets":5,"rows_written":4043,"spinlock_wait_cycles":14720,"thread_start_us":741,"threads_started":9,"wal-append.queue_time_us":307}
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
