[ https://issues.apache.org/jira/browse/KUDU-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849077#comment-17849077 ]

Alexey Serbin commented on KUDU-3406:
-------------------------------------

A note: it's also necessary to pick up 
[ae7b08c00|https://github.com/apache/kudu/commit/ae7b08c006167da1ebb0c4302e5d6d7aa739a862]
 -- it contains the fix for the MiBytes-to-bytes conversion of the threshold.
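
For context, the class of bug such a fix addresses is a threshold expressed 
in MiBytes being compared directly against a size in bytes.  A minimal 
sketch of the pattern -- the flag and function names here are hypothetical, 
not the actual Kudu identifiers:

{noformat}
#include <cstdint>

// Hypothetical flag expressed in MiBytes (not the actual Kudu flag).
int64_t FLAGS_compaction_memory_threshold_mb = 1024;

constexpr int64_t kMiB = 1024LL * 1024;

bool OverThreshold(int64_t estimated_bytes) {
  // Buggy variant: compares a byte count against a MiByte count, making
  // the effective threshold 2^20 times smaller than intended:
  //   return estimated_bytes > FLAGS_compaction_memory_threshold_mb;
  // Fixed variant: convert the threshold to bytes before comparing.
  return estimated_bytes > FLAGS_compaction_memory_threshold_mb * kMiB;
}

int main() {
  // 2048 MiBytes exceeds the 1024 MiByte threshold only once the
  // threshold itself has been converted to bytes.
  return OverThreshold(2048 * kMiB) ? 0 : 1;
}
{noformat}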

> CompactRowSetsOp can allocate much more memory than specified by the hard 
> memory limit
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-3406
>                 URL: https://issues.apache.org/jira/browse/KUDU-3406
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0, 
> 1.15.0, 1.16.0
>            Reporter: Alexey Serbin
>            Assignee: Ashwani Raina
>            Priority: Critical
>              Labels: compaction, stability
>             Fix For: 1.17.0
>
>         Attachments: 270.svg, 283.svg, 296.svg, 308.svg, 332.svg, 344.svg, 
> fs_list.before
>
>
> In some scenarios, rowsets can accumulate a lot of data, so {{kudu-master}} 
> and {{kudu-tserver}} processes grow far beyond the hard memory limit 
> (controlled by the {{--memory_limit_hard_bytes}} flag) when running 
> CompactRowSetsOp.  In some cases, a Kudu server process consumes all the 
> available memory, to the point that the OS invokes the OOM killer.
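> Conceptually, avoiding the OOM would require estimating an op's memory 
> footprint up front and checking it against the hard limit before scheduling 
> the op.  A rough sketch of such a guard, with hypothetical names 
> (illustrative only, not actual Kudu code):
> {noformat}
> #include <cstdint>
>
> // Hypothetical names -- illustrative only, not actual Kudu code.
> int64_t FLAGS_memory_limit_hard_bytes = 4LL * 1024 * 1024 * 1024;
>
> // Projected peak memory usage of the compaction op; a real implementation
> // would derive this from the sizes of the input rowsets and their deltas.
> int64_t EstimateCompactionMemoryBytes() { return 25LL * 1024 * 1024 * 1024; }
>
> bool ShouldScheduleCompaction() {
>   // Without a pre-scheduling check like this, the op runs unconditionally
>   // and its allocations can push the process far past the hard limit.
>   return EstimateCompactionMemoryBytes() < FLAGS_memory_limit_hard_bytes;
> }
>
> // Here the 25 GByte estimate exceeds the 4 GByte limit, so the op is skipped.
> int main() { return ShouldScheduleCompaction() ? 0 : 1; }
> {noformat}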
> At this point I'm not yet sure about the exact set of affected versions or 
> what leads to accumulating so much data in flushed rowsets, but I know that 
> 1.13, 1.14, 1.15, and 1.16 are affected.  It's also not clear whether the 
> actual regression is in allowing flushed rowsets to grow that big.
> There is a reproduction scenario for this bug with {{kudu-master}} using 
> real data from the field.  With that data, {{kudu fs list}} reveals a rowset 
> with many UNDOs: see the attached {{fs_list.before}} file.  When 
> {{kudu-master}} was started with that data, the process memory usage peaked 
> at about 25 GBytes of RSS while running CompactRowSetsOp, and then the RSS 
> subsided to about 200 MBytes once CompactRowSetsOp completed.
> I also attached several SVG files generated by TCMalloc's pprof from the 
> memory profile snapshots output by {{kudu-master}} when configured to dump 
> allocation stats every 512 MBytes.  I generated the SVG reports for the 
> profiles attributed to the highest memory usage (see the profiler sketch 
> after the listing below):
> {noformat}
> Dumping heap profile to /opt/tmp/master/nn1/profile.0270.heap (24573 MB 
> currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0283.heap (64594 MB 
> allocated cumulatively, 13221 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0296.heap (77908 MB 
> allocated cumulatively, 12110 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0308.heap (90197 MB 
> allocated cumulatively, 12406 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0332.heap (114775 MB 
> allocated cumulatively, 23884 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0344.heap (127064 MB 
> allocated cumulatively, 12648 MB currently in use)
> {noformat}
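> For reference, heap dumps at a fixed allocation interval like the above can 
> be obtained with the gperftools heap profiler; a minimal sketch, assuming 
> the binary links against tcmalloc (the paths and the 512 MByte interval 
> simply mirror the setup described above):
> {noformat}
> // Build: g++ -O2 heap_dump_demo.cc -o heap_dump_demo -ltcmalloc
> // Run with the dump interval (512 MBytes) set before the process starts:
> //   HEAP_PROFILE_ALLOCATION_INTERVAL=536870912 ./heap_dump_demo
> #include <vector>
> #include <gperftools/heap-profiler.h>
>
> int main() {
>   HeapProfilerStart("/tmp/profile");  // prefix for profile.NNNN.heap files
>   std::vector<std::vector<char>> blocks;
>   for (int i = 0; i < 2048; ++i) {
>     blocks.emplace_back(1024 * 1024);  // ~2 GBytes cumulative allocation
>   }
>   HeapProfilerStop();
>   return 0;
> }
> {noformat}
> The SVG reports can then be rendered with something like 
> {{pprof --svg ./kudu-master profile.0270.heap > 270.svg}}.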
> The report from the compaction doesn't show anything extraordinary (except 
> for its duration):
> {noformat}
> I20221012 10:45:49.684247 101750 maintenance_manager.cc:603] P 
> 68dbea0ec022440d9fc282099a8656cb: 
> CompactRowSetsOp(00000000000000000000000000000000) complete. Timing: real 
> 522.617s     user 471.783s   sys 46.588s Metrics: 
> {"bytes_written":1665145,"cfile_cache_hit":846,"cfile_cache_hit_bytes":14723646,"cfile_cache_miss":1786556,"cfile_cache_miss_bytes":4065589152,"cfile_init":7,"delta_iterators_relevant":1558,"dirs.queue_time_us":220086,"dirs.run_cpu_time_us":89219,"dirs.run_wall_time_us":89163,"drs_written":1,"fdatasync":15,"fdatasync_us":150709,"lbm_read_time_us":11120726,"lbm_reads_1-10_ms":1,"lbm_reads_lt_1ms":1786583,"lbm_write_time_us":14120016,"lbm_writes_1-10_ms":3,"lbm_writes_lt_1ms":894069,"mutex_wait_us":108,"num_input_rowsets":5,"rows_written":4043,"spinlock_wait_cycles":14720,"thread_start_us":741,"threads_started":9,"wal-append.queue_time_us":307}
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
