[ https://issues.apache.org/jira/browse/KUDU-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashwani Raina reassigned KUDU-3406:
-----------------------------------

    Assignee: Ashwani Raina

> CompactRowSetsOp can allocate much more memory than specified by the hard memory limit
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-3406
>                 URL: https://issues.apache.org/jira/browse/KUDU-3406
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0, 1.16.0
>            Reporter: Alexey Serbin
>            Assignee: Ashwani Raina
>            Priority: Critical
>         Attachments: 270.svg, 283.svg, 296.svg, 308.svg, 332.svg, 344.svg, fs_list.before
>
> In some scenarios, rowsets can accumulate so much data that the {{kudu-master}} and {{kudu-tserver}} processes grow far beyond the hard memory limit (controlled by the {{--memory_limit_hard_bytes}} flag) while running CompactRowSetsOp. In some cases, a Kudu server process consumes all the available memory, to the point that the OS may invoke the OOM killer.
>
> At this point I'm not yet sure about the exact set of affected versions, or about what leads to accumulating so much data in flushed rowsets, but I know that 1.13, 1.14, 1.15, and 1.16 are affected. It's also not clear whether the actual regression is in allowing flushed rowsets to grow that big.
>
> There is a reproduction scenario for this bug with {{kudu-master}} using real data from the field. With that data, {{kudu fs list}} reveals a rowset with many UNDOs: see the attached {{fs_list.before}} file. When {{kudu-master}} was started with that data, the process's memory usage eventually peaked at about 25 GiB of RSS while running CompactRowSetsOp.
>
> I also attached several SVG files generated by TCMalloc's pprof from the memory profile snapshots that {{kudu-master}} output when configured to dump allocation stats every 512 MiB.
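[Editor's note] The report does not say how the periodic heap dumps were enabled. A minimal sketch follows, assuming the standard gperftools heap-profiler environment variables (`HEAPPROFILE`, `HEAP_PROFILE_ALLOCATION_INTERVAL`) work for tcmalloc-linked Kudu binaries; only `--memory_limit_hard_bytes` and the `/opt/tmp/master/nn1/profile` path come from the report itself, and the server command line is illustrative, not the reporter's actual invocation:

```shell
# Assumed gperftools setup, not taken from the report:
# dump a profile.NNNN.heap file under the given prefix every 512 MiB allocated.
export HEAPPROFILE=/opt/tmp/master/nn1/profile
export HEAP_PROFILE_ALLOCATION_INTERVAL=$((512 * 1024 * 1024))
echo "${HEAP_PROFILE_ALLOCATION_INTERVAL}"   # prints 536870912 (512 MiB in bytes)

# Then start the server under these variables (hypothetical flags/dirs; not run here):
#   kudu-master --memory_limit_hard_bytes=$((8 * 1024 * 1024 * 1024)) ... &
# A snapshot can later be rendered into an SVG call graph with gperftools' pprof:
#   pprof --svg /path/to/kudu-master /opt/tmp/master/nn1/profile.0270.heap > 270.svg
```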
> I generated the SVG reports for the profiles attributed to the highest memory usage:
> {noformat}
> Dumping heap profile to /opt/tmp/master/nn1/profile.0270.heap (24573 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0283.heap (64594 MB allocated cumulatively, 13221 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0296.heap (77908 MB allocated cumulatively, 12110 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0308.heap (90197 MB allocated cumulatively, 12406 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0332.heap (114775 MB allocated cumulatively, 23884 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0344.heap (127064 MB allocated cumulatively, 12648 MB currently in use)
> {noformat}
> The report from the compaction doesn't look like anything extraordinary (except for the duration):
> {noformat}
> I20221012 10:45:49.684247 101750 maintenance_manager.cc:603] P 68dbea0ec022440d9fc282099a8656cb: CompactRowSetsOp(00000000000000000000000000000000) complete. Timing: real 522.617s user 471.783s sys 46.588s Metrics: {"bytes_written":1665145,"cfile_cache_hit":846,"cfile_cache_hit_bytes":14723646,"cfile_cache_miss":1786556,"cfile_cache_miss_bytes":4065589152,"cfile_init":7,"delta_iterators_relevant":1558,"dirs.queue_time_us":220086,"dirs.run_cpu_time_us":89219,"dirs.run_wall_time_us":89163,"drs_written":1,"fdatasync":15,"fdatasync_us":150709,"lbm_read_time_us":11120726,"lbm_reads_1-10_ms":1,"lbm_reads_lt_1ms":1786583,"lbm_write_time_us":14120016,"lbm_writes_1-10_ms":3,"lbm_writes_lt_1ms":894069,"mutex_wait_us":108,"num_input_rowsets":5,"rows_written":4043,"spinlock_wait_cycles":14720,"thread_start_us":741,"threads_started":9,"wal-append.queue_time_us":307}
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)