[jira] [Updated] (KUDU-1465) Large allocations for scanner result buffers harm allocator thread caching

2018-02-16 Thread Grant Henke (JIRA)

 [ https://issues.apache.org/jira/browse/KUDU-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1465:
--
Target Version/s:   (was: 1.5.0)

> Large allocations for scanner result buffers harm allocator thread caching
> --
>
> Key: KUDU-1465
> URL: https://issues.apache.org/jira/browse/KUDU-1465
> Project: Kudu
>  Issue Type: Bug
>  Components: perf
>Affects Versions: 0.8.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> I was looking at the performance of a random-read stress test on a 70-node 
> cluster and found that threads were often spending time in allocator 
> contention, particularly when deallocating RpcSidecar objects. After a bit of 
> analysis, I determined that this happens because we always preallocate buffers 
> of 1MB (the default batch size) even if the response is only going to be a 
> single row. Such large allocations go directly to the central freelist instead 
> of being served from the thread-local caches.
> As a simple test, I used the set_flag command to drop the default batch size 
> to 4KB, and the read throughput (reads/second) increased substantially.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-1465) Large allocations for scanner result buffers harm allocator thread caching

2017-03-01 Thread Todd Lipcon (JIRA)

 [ https://issues.apache.org/jira/browse/KUDU-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1465:
--
Target Version/s: 1.4.0  (was: 1.3.0)

> Large allocations for scanner result buffers harm allocator thread caching
> --
>
> Key: KUDU-1465
> URL: https://issues.apache.org/jira/browse/KUDU-1465
> Project: Kudu
>  Issue Type: Bug
>  Components: perf
>Affects Versions: 0.8.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> I was looking at the performance of a random-read stress test on a 70-node 
> cluster and found that threads were often spending time in allocator 
> contention, particularly when deallocating RpcSidecar objects. After a bit of 
> analysis, I determined that this happens because we always preallocate buffers 
> of 1MB (the default batch size) even if the response is only going to be a 
> single row. Such large allocations go directly to the central freelist instead 
> of being served from the thread-local caches.
> As a simple test, I used the set_flag command to drop the default batch size 
> to 4KB, and the read throughput (reads/second) increased substantially.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)