[jira] [Commented] (KUDU-1587) Memory-based backpressure is insufficient on seek-bound workloads
[ https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187879#comment-17187879 ]

ASF subversion and git services commented on KUDU-1587:
---

Commit ee3bb83575a051c2feade1f8c159b2902a7160d5 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ee3bb83 ]

KUDU-1587 part 2: reject write ops if apply queue is overloaded

This patch implements control admission for write requests in tablet
servers based on the load status of their apply queue. With this change,
the recently introduced OpApplyQueueTest.ApplyQueueBackpressure scenario
successfully passes.

If the queue times of the tasks in the apply queue become higher than
the specified threshold, the apply queue enters overloaded state. When
the queue is overloaded, the tablet server rejects incoming write
requests with some probability. The longer the queue stays overloaded,
the greater the probability of rejections. The apply queue exits the
overloaded state when queue times drop below the specified threshold.

This new behavior is not yet enabled by default, keeping the legacy
behavior of unbounded/uncontrolled queue times as is. To enable it, set
--tablet_apply_pool_overload_threshold_ms to something greater than 0
(e.g., 500).

Change-Id: I6d7688d6fa832e606b8efc4549568fa52dfa1931
Reviewed-on: http://gerrit.cloudera.org:8080/16343
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong


> Memory-based backpressure is insufficient on seek-bound workloads
> -
>
>                 Key: KUDU-1587
>                 URL: https://issues.apache.org/jira/browse/KUDU-1587
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 0.10.0, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0,
>                      1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0,
>                      1.11.1
>            Reporter: Todd Lipcon
>            Assignee: Alexey Serbin
>            Priority: Critical
>              Labels: roadmap-candidate
>         Attachments: graph.png, queue-time.png
>
>
> I pushed a uniform random insert workload from a bunch of clients to the
> point that the vast majority of bloom filters no longer fit in buffer cache,
> and the compaction had fallen way behind. Thus, every inserted row turns into
> 40+ seeks (due to non-compact data) and takes 400-500ms. In this kind of
> workload, the current backpressure (based on memory usage) is insufficient to
> prevent ridiculously long queues.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
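The admission behavior described in the commit message above can be illustrated with a rough sketch. This is not the Kudu implementation: the class name, the `ramp_ms` parameter, and the linear growth of the rejection probability are all assumptions made for illustration. The commit message only states that the probability grows the longer the queue stays overloaded, and that a threshold of 0 leaves the behavior disabled.

```python
import random


class ApplyQueueAdmission:
    """Hypothetical admission controller; NOT Kudu's actual code."""

    def __init__(self, threshold_ms, ramp_ms=1000.0, rng=random.random):
        self.threshold_ms = threshold_ms  # 0 or less disables the check
        self.ramp_ms = ramp_ms            # assumed time to reach 100% rejection
        self.rng = rng                    # injectable for deterministic tests
        self.overloaded_since_ms = None   # None means "not overloaded"

    def observe_queue_time(self, now_ms, queue_time_ms):
        """Update the overload state from the latest observed queue time."""
        if self.threshold_ms <= 0:
            return
        if queue_time_ms > self.threshold_ms:
            # Enter the overloaded state, keeping the original entry time.
            if self.overloaded_since_ms is None:
                self.overloaded_since_ms = now_ms
        else:
            # Queue time is back at or under the threshold: leave the state.
            self.overloaded_since_ms = None

    def rejection_probability(self, now_ms):
        """Grow linearly (an assumption) with time spent overloaded."""
        if self.overloaded_since_ms is None:
            return 0.0
        elapsed_ms = now_ms - self.overloaded_since_ms
        return min(1.0, elapsed_ms / self.ramp_ms)

    def should_reject(self, now_ms):
        """Randomly reject an incoming write with the current probability."""
        return self.rng() < self.rejection_probability(now_ms)
```

With `threshold_ms=500`, mirroring the example value given for --tablet_apply_pool_overload_threshold_ms, a write arriving 250 ms after the queue became overloaded would be rejected with probability 0.25 under the assumed 1000 ms ramp.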
[jira] [Commented] (KUDU-1587) Memory-based backpressure is insufficient on seek-bound workloads
[ https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186284#comment-17186284 ]

ASF subversion and git services commented on KUDU-1587:
---

Commit c6d438ab417009e8007a1de274178d0bcf0dfb63 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=c6d438a ]

[tserver] add test to reproduce KUDU-1587 conditions

Added a test to reproduce conditions described in KUDU-1587.
As of now, the test is disabled: it will be enabled once KUDU-1587
is addressed.

Change-Id: I515a1b26152680ee9b9361afcf84fec39b8f962d
Reviewed-on: http://gerrit.cloudera.org:8080/16312
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong
[jira] [Commented] (KUDU-1587) Memory-based backpressure is insufficient on seek-bound workloads
[ https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186285#comment-17186285 ]

ASF subversion and git services commented on KUDU-1587:
---

Commit fc8615c37eb4e28f3cc6bea0fcd5a8732451e883 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=fc8615c ]

KUDU-1587 part 1: load meter for ThreadPool

This patch introduces a load meter for ThreadPool, aiming to use
active queue management techniques (AQM) such as CoDel [1] in scenarios
where thread pool queue load metrics are applicable (e.g., KUDU-1587).

[1] https://en.wikipedia.org/wiki/CoDel

Change-Id: I640716dc32f193e68361ca623ee7b9271e661d8b
Reviewed-on: http://gerrit.cloudera.org:8080/16332
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong
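As a loose illustration of what a CoDel-inspired load meter for a task queue might track, consider the sketch below. It is not the ThreadPool code from this patch: the class, its method names, and the `target_ms` and `interval_ms` parameters are hypothetical. It borrows only the general CoDel idea that a queue counts as overloaded once queue times have stayed above a target for a whole interval, so that a single slow task does not trip the meter.

```python
class QueueLoadMeter:
    """Hypothetical CoDel-inspired meter; names and logic are illustrative."""

    def __init__(self, target_ms, interval_ms):
        self.target_ms = target_ms      # acceptable queue time
        self.interval_ms = interval_ms  # how long slowness must persist
        self.above_since_ms = None      # start of the current run of slow dequeues

    def on_dequeue(self, enqueued_at_ms, now_ms):
        """Record one task leaving the queue and return its queue time."""
        queue_time_ms = now_ms - enqueued_at_ms
        if queue_time_ms > self.target_ms:
            if self.above_since_ms is None:
                self.above_since_ms = now_ms
        else:
            # A single fast dequeue shows the queue can still drain in time.
            self.above_since_ms = None
        return queue_time_ms

    def is_overloaded(self, now_ms):
        """True once every dequeue for a full interval has been slow."""
        return (self.above_since_ms is not None
                and now_ms - self.above_since_ms >= self.interval_ms)
```

A caller would invoke `on_dequeue` whenever a worker thread picks up a task, and consult `is_overloaded` when deciding whether to accept new work.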
[jira] [Commented] (KUDU-1587) Memory-based backpressure is insufficient on seek-bound workloads
[ https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178138#comment-17178138 ]

Alexey Serbin commented on KUDU-1587:
-

I implemented the requested functionality with the following changelists:
* [a test scenario to simulate apply queue "overload"|https://gerrit.cloudera.org/#/c/16312/]
* [tracking the state of the apply queue|https://gerrit.cloudera.org/#/c/16332/]
* [controlling the admission of write requests with a CoDel-like approach|http://gerrit.cloudera.org:8080/16343]
[jira] [Commented] (KUDU-1587) Memory-based backpressure is insufficient on seek-bound workloads
[ https://issues.apache.org/jira/browse/KUDU-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448127#comment-15448127 ]

Todd Lipcon commented on KUDU-1587:
---

The issue here is that the only backpressure on new operations being submitted
to the apply queue is based on the TransactionTracker memory limits. Each
operation here is only about 1KB of inserted data (or less), and it results in
half a second of worker wall time. With the default 64MB tracker limit (per
tablet), that's room for roughly 64,000 queued operations, or
64,000 x 500ms = 32,000 seconds' worth of work. Assume 10x parallelism due to
many disks, but then recall that the limiting is per-tablet and we have 10+
active tablets, which gets us back into the 30,000-40,000 seconds' worth of
available queueing before backpressure kicks in.
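The estimate in the comment above can be reproduced with a few lines of arithmetic. The function and parameter names are made up for illustration, and the inputs are the rounded figures from the comment (1KB per operation, 500ms per operation, 10x parallelism, 10 active tablets), not measured values.

```python
def queued_work_seconds(tracker_limit_bytes, op_size_bytes, op_wall_time_s,
                        parallelism, active_tablets):
    """Seconds of queued work admitted before memory backpressure kicks in."""
    # Operations that fit under one tablet's tracker memory limit.
    ops_per_tablet = tracker_limit_bytes // op_size_bytes
    # Serial worker time needed to apply one tablet's backlog.
    work_per_tablet_s = ops_per_tablet * op_wall_time_s
    # The limit is per tablet, while the parallelism is shared by all tablets.
    return work_per_tablet_s * active_tablets / parallelism


# 64MB and 1KB are rounded to decimal units, as in the comment.
per_tablet = queued_work_seconds(64_000_000, 1_000, 0.5,
                                 parallelism=1, active_tablets=1)
overall = queued_work_seconds(64_000_000, 1_000, 0.5,
                              parallelism=10, active_tablets=10)
print(per_tablet, overall)
```

Both calls give 32,000 seconds: the 10x parallelism and the 10 active tablets cancel out, which is why the estimate lands back in the 30,000-40,000 second range.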