[jira] [Updated] (KUDU-1465) Large allocations for scanner result buffers harm allocator thread caching
[ https://issues.apache.org/jira/browse/KUDU-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1465:
------------------------------
    Target Version/s:   (was: 1.5.0)

> Large allocations for scanner result buffers harm allocator thread caching
> --------------------------------------------------------------------------
>
>                 Key: KUDU-1465
>                 URL: https://issues.apache.org/jira/browse/KUDU-1465
>             Project: Kudu
>          Issue Type: Bug
>          Components: perf
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>
> I was looking at the performance of a random-read stress test on a 70-node
> cluster and found that threads were often spending time in allocator
> contention, particularly when deallocating RpcSidecar objects. After a bit of
> analysis, I determined this is because we always preallocate buffers of 1MB
> (the default batch size) even if the response is only going to be a single
> row. Such large allocations go directly to the central freelist instead of
> using thread-local caches.
> As a simple test, I used the set_flag command to drop the default batch size
> to 4KB, and the read throughput (reads/second) increased substantially.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
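The mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not Kudu's actual RpcSidecar code: the constants (a ~256 KiB thread-cache cutoff, the 1 MiB default batch size) and the function names are assumptions for the example. The point is that reserving the full batch size up front pushes every allocation past the size classes a tcmalloc-style allocator serves from per-thread caches, while sizing to the bytes actually produced keeps small responses on the fast path.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Assumed allocator behavior for illustration: allocations up to roughly
// 256 KiB are served from lock-free per-thread caches; anything larger goes
// to a shared, lock-protected central freelist.
constexpr size_t kThreadCacheMaxBytes = 256 * 1024;
constexpr size_t kDefaultBatchSizeBytes = 1024 * 1024;  // 1 MiB default batch

// Before: always preallocate the full batch size, even for a one-row response.
// Every scan response then pays a central-freelist round trip on alloc/free.
std::string AllocateResultBufferEager() {
  std::string buf;
  buf.reserve(kDefaultBatchSizeBytes);
  return buf;
}

// After: reserve only what the serialized response needs, capped at the batch
// size, so small responses stay within thread-cached size classes.
std::string AllocateResultBufferSized(size_t response_bytes) {
  std::string buf;
  buf.reserve(std::min(response_bytes, kDefaultBatchSizeBytes));
  return buf;
}
```

Dropping the batch-size flag to 4 KB, as the test in the report did, has the same effect as the "after" variant for small scans: the buffer falls under the thread-cache cutoff and the allocator contention disappears.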
[jira] [Updated] (KUDU-1465) Large allocations for scanner result buffers harm allocator thread caching
[ https://issues.apache.org/jira/browse/KUDU-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1465:
------------------------------
    Target Version/s: 1.4.0  (was: 1.3.0)

> Large allocations for scanner result buffers harm allocator thread caching
> --------------------------------------------------------------------------
>
>                 Key: KUDU-1465
>                 URL: https://issues.apache.org/jira/browse/KUDU-1465
>             Project: Kudu
>          Issue Type: Bug
>          Components: perf
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)