[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388492#comment-17388492 ]

Yingchun Lai commented on KUDU-1954:
------------------------------------

Although we have tried to reduce the duration of a single compaction operation, in some environments compaction ops can still run slower than data ingestion. On machines that have only spinning disks, or even a single spinning disk, --maintenance_manager_num_threads is typically set to 1; once that thread is launching a heavy compaction op, flush ops will wait a long time to be launched. I think we can introduce separate flush threads dedicated to flush ops, similar to how RocksDB works [1].

1. https://github.com/facebook/rocksdb/blob/4361d6d16380f619833d58225183cbfbb2c7a1dd/include/rocksdb/options.h#L599-L658

> Improve maintenance manager behavior in heavy write workload
> ------------------------------------------------------------
>
>                 Key: KUDU-1954
>                 URL: https://issues.apache.org/jira/browse/KUDU-1954
>             Project: Kudu
>          Issue Type: Improvement
>          Components: compaction, perf, tserver
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: performance, roadmap-candidate, scalability
>         Attachments: mm-trace.png
>
>
> During the investigation in [this doc|https://docs.google.com/document/d/1U1IXS1XD2erZyq8_qG81A1gZaCeHcq2i0unea_eEf5c/edit] I found a few maintenance-manager-related issues during heavy writes:
> - we don't schedule flushes until we are already in "backpressure" realm, so we spent most of our time doing backpressure
> - even if we configure N maintenance threads, we typically are only using ~50% of those threads due to the scheduling granularity
> - when we do hit the "memory-pressure flush" threshold, all threads quickly switch to flushing, which then brings us far beneath the threshold
> - long running compactions can temporarily starve flushes
> - high volume of writes can starve compactions

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
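The separate-flush-pool idea above can be sketched with two independent worker pools, so a long compaction op queued on one pool can never delay a flush op queued on the other; this is roughly what RocksDB's separate flush/compaction background pools achieve. This is a minimal illustration assuming nothing about Kudu's actual MaintenanceManager internals, and all names here (OpPool, Submit, etc.) are hypothetical:

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch: a generic pool of worker threads draining a
// queue of maintenance ops. A server would own two of these -- one
// for flushes, one for compactions -- sized independently.
class OpPool {
 public:
  explicit OpPool(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { Run(); });
    }
  }

  // Enqueue an op; a worker picks it up as soon as one is free.
  void Submit(std::function<void()> op) {
    {
      std::lock_guard<std::mutex> l(mu_);
      queue_.push(std::move(op));
    }
    cv_.notify_one();
  }

  // Drain the queue, then join all workers.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> op;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // done_ set and queue drained
        op = std::move(queue_.front());
        queue_.pop();
      }
      op();  // run outside the lock: a slow compaction here only
             // blocks its own pool, never the flush pool
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool done_ = false;
};
```

With this split, a tserver could keep one dedicated flush thread even when --maintenance_manager_num_threads is 1, so flushes keep draining memrowsets while a heavy compaction runs in the other pool.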
[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265069#comment-17265069 ]

Alexey Serbin commented on KUDU-1954:
-------------------------------------

One related patch to fix high contention in the MM:
https://github.com/apache/kudu/commit/9e4664d44ca994484d79d970e7c7e929d0dba055
[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113352#comment-17113352 ]

Todd Lipcon commented on KUDU-1954:
-----------------------------------

The incremental compaction design in Kudu ensures that any given compaction reads less than 128MB of data (given the default budget configuration, which I wouldn't recommend changing). Do you have logs showing a compaction that takes significantly longer than 10-20 seconds? If you're seeing lengthy compactions, maybe we need to optimize some code paths there.
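To make the budgeting idea above concrete: Kudu's compaction policy selects a bounded set of rowsets per compaction (a knapsack-style optimization in the real code), so input stays under the ~128MB default budget. The toy sketch below, with made-up RowSet fields and a simple greedy pick by value density, only illustrates the budget cap, not Kudu's actual selection algorithm:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical rowset candidate: on-disk size plus some score for how
// much the compaction would improve the rowset layout.
struct RowSet {
  int64_t size_bytes;
  double value;
};

// Greedily pick the densest candidates (value per byte) that fit
// under the I/O budget. The real policy is smarter, but the invariant
// is the same: total input size never exceeds the budget.
std::vector<RowSet> PickForCompaction(std::vector<RowSet> candidates,
                                      int64_t budget_bytes) {
  std::sort(candidates.begin(), candidates.end(),
            [](const RowSet& a, const RowSet& b) {
              return a.value / a.size_bytes > b.value / b.size_bytes;
            });
  std::vector<RowSet> picked;
  int64_t used = 0;
  for (const auto& rs : candidates) {
    if (used + rs.size_bytes <= budget_bytes) {
      picked.push_back(rs);
      used += rs.size_bytes;
    }
  }
  return picked;
}
```

Because each pick is capped, a single compaction op's runtime is bounded by how fast the disks can read/write ~128MB, which is why multi-minute compactions suggest something else is wrong.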
[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113291#comment-17113291 ]

wangningito commented on KUDU-1954:
-----------------------------------

Hey Todd, I'm wondering if it is possible to split the flush ops out from the threads governed by --maintenance_manager_num_threads. I sometimes find that memrowset memory exceeds flush_threshold_mb by a lot while the maintenance threads are all held by a long compaction. If the tablet is very big, the compaction may take minutes. Meanwhile, flush could keep working in order to reduce memory usage.
[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367930#comment-16367930 ]

Todd Lipcon commented on KUDU-1954:
-----------------------------------

I think most of this has been improved in the last year:

bq. we don't schedule flushes until we are already in "backpressure" realm, so we spent most of our time doing backpressure

KUDU-1949 changed this to start triggering flushes at 60%, while backpressure only starts at 80%.

bq. even if we configure N maintenance threads, we typically are only using ~50% of those threads due to the scheduling granularity

40aa4c3c271c9df20a17a1d353ce582ee3fda742 (in 1.4.0) changed the MM to immediately schedule new work when a thread frees up.

bq. when we do hit the "memory-pressure flush" threshold, all threads quickly switch to flushing, which then brings us far beneath the threshold
bq. long running compactions can temporarily starve flushes
bq. high volume of writes can starve compactions

These three are not yet addressed, though various improvements to flush/compaction performance make long-running ops less common.
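The 60%/80% split described above can be sketched as a simple decision function. The thresholds are hard-coded from the comment; the real server reads them from configuration flags, and the enum names here are illustrative:

```cpp
// What the maintenance manager should do at a given memory level.
enum class Action { kNone, kFlush, kFlushWithBackpressure };

// Per the KUDU-1949 behavior described above: start flushing at 60%
// of the memory limit, and only apply write backpressure past 80%,
// so there is a wide band where flushes run without throttling writers.
Action DecideUnderMemoryPressure(double used_fraction) {
  if (used_fraction >= 0.8) return Action::kFlushWithBackpressure;
  if (used_fraction >= 0.6) return Action::kFlush;
  return Action::kNone;
}
```

The gap between the two thresholds is the point: flushes get a head start, so writers only see backpressure when flushing genuinely cannot keep up.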
[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936844#comment-15936844 ]

Todd Lipcon commented on KUDU-1954:
-----------------------------------

Started writing various notes here:
https://docs.google.com/document/d/17-2CcmrjxZY0Gd9wDUh83xCCNL574famw8_2Bhfu-_8/edit?usp=sharing