[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173104#comment-17173104 ]
YifanZhang edited comment on KUDU-3180 at 8/10/20, 10:57 AM:
-------------------------------------------------------------

If we lower {{-memory_pressure_percentage}}, we should also lower {{-block_cache_capacity_mb}} accordingly, and then we may not make full use of the memory resources. In fact, for most of the day the memory usage of our Kudu servers is not very high (about 50%), but for an hour or two there are a lot of inserts/updates and memory usage grows significantly. During that period Kudu did prioritize flushing big MRSs/DMSs, but OOM still occurred sometimes, even though we had tuned {{-maintenance_manager_num_threads}} to 20.

After we tuned {{-flush_threshold_secs}} to 1800 (it was 3600 before), we could avoid OOM, but I found that the {{average_diskrowset_height}} of most tablets became larger, which means these tablets need to be compacted more. In general we want to prioritize flushes so that we can free more memory, but we also don't want to end up with more small DRSs. So prioritizing flushes of bigger MRSs/DMSs might help.

Maybe we could use {{max(memory_size, time_since_last_flush)}} to define the perf improvement of a mem-store flush, so that both big mem-stores and long-lived mem-stores are flushed with priority.


was (Author: zhangyifan27):
If we lower {{-memory_pressure_percentage}}, we should also lower {{-block_cache_capacity_mb}} accordingly, and that may not make full use of the memory resources. In fact, for most of the day the memory usage of our Kudu servers is not very high (about 50%), but for an hour or two there are a lot of inserts/updates and memory usage grows significantly. During that period Kudu did prioritize flushing big MRSs/DMSs, but OOM still occurred sometimes, even though we had tuned {{-maintenance_manager_num_threads}} to 20.

After we tuned {{-flush_threshold_secs}} to 1800 (it was 3600 before), we could avoid OOM, but I found that the {{average_diskrowset_height}} of most tablets became larger, which means these tablets need to be compacted more. In general we want to prioritize flushes so that we can free more memory, but we also don't want to end up with more small DRSs. So prioritizing flushes of bigger MRSs/DMSs might help.

> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -----------------------------------------------------------------
>
>                 Key: KUDU-3180
>                 URL: https://issues.apache.org/jira/browse/KUDU-3180
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: YifanZhang
>            Priority: Major
>         Attachments: image-2020-08-04-20-26-53-749.png,
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we
> haven't flushed the tablet in a long time, which may lead to starvation of
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS
> flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327!
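
To make the scoring idea from the comment concrete, here is a minimal Python sketch. It is not Kudu's actual implementation: the function names, the sample mem-stores, and the choice to normalize each term by its threshold (so that megabytes and seconds become comparable) are all assumptions made for illustration. Only the threshold values and the {{max(memory_size, time_since_last_flush)}} idea come from this thread.

```python
from dataclasses import dataclass

FLUSH_THRESHOLD_MB = 32      # -flush_threshold_mb from the issue description
FLUSH_THRESHOLD_SECS = 1800  # -flush_threshold_secs from the issue description


@dataclass
class MemStore:
    name: str                # hypothetical label for the example
    anchored_mb: float       # memory anchored by the MRS/DMS
    secs_since_flush: float  # time since the tablet was last flushed


def time_only_score(store: MemStore) -> float:
    """Toy time-based policy: the score depends only on age (simplified assumption)."""
    return store.secs_since_flush / FLUSH_THRESHOLD_SECS


def proposed_score(store: MemStore) -> float:
    """Proposed max(memory_size, time_since_last_flush).

    Each term is divided by its threshold so the two are comparable;
    this normalization is an assumption, not part of the proposal.
    """
    return max(store.anchored_mb / FLUSH_THRESHOLD_MB,
               store.secs_since_flush / FLUSH_THRESHOLD_SECS)


def flush_order(stores, score):
    """Return store names sorted from highest to lowest score."""
    return [s.name for s in sorted(stores, key=score, reverse=True)]


# Hypothetical sample data: a tiny store that has not been flushed for an hour
# and a large store that was flushed ten minutes ago.
stores = [
    MemStore("small_old", anchored_mb=1.0, secs_since_flush=3600.0),
    MemStore("big_recent", anchored_mb=256.0, secs_since_flush=600.0),
]

print(flush_order(stores, time_only_score))
print(flush_order(stores, proposed_score))
```

With these sample values the time-only score ranks the small, old store first, while the max-based score ranks the large store first. The old store still scores above 1 under the max-based rule, so long-lived mem-stores are not starved.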