[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 8: > Without positive (statistically significant) perf results I'm not > sure it's worth doing this patch (at least the std::function struct > stuff which is somewhat "weird" looking). > > I seem to recall that I did this work originally as part of the > integration of KRPC into Impala, and Impala was suffering majorly > from the "AllocLarge" bottleneck on older versions of tcmalloc > (fixed in the versions we use today). Impala did a lot of >1M > allocations which went throuhg the AllocLarge slow path while > holding the page heap lock, so any time that the RPC threads had to > go through the central free list, they were likely to get blocked > behind some other thread holding that lock. This made the > cross-thread allocations really bad. > > Now with more modern tcmalloc, it's still preferable to avoid the > central free lists, but I don't think it's worth being too crazy > without showing a real benefit. Thank you for the information, Todd. Yep, it seems this patch doesn't add a lot of improvement with current version of gperftools, etc. I posted the latest version with benchmark results just for posterity. -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 27 May 2020 04:25:51 + Gerrit-HasComments: No
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Alexey Serbin has uploaded a new patch set (#8) to the change originally created by Todd Lipcon. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. KUDU-1865 (part 1): reduce some cross-thread allocations Per the analysis in the JIRA, each RPC caused two "cross-thread" allocations of ReactorTasks (one in the client, one in the server). These cross-thread allocations harm tcmalloc caching. ReactorTasks don't actually need to be heap-allocated -- that was only an easy mechanism to use a normal-looking "interface" paradigm. Instead, if we use a struct with some std::functions in it, and std::move() it to/from the pending tasks container, we avoid the heap allocation. More importantly, we avoid the worst kind of heap allocation which is allocated on one thread and freed on another. Below are results from various RPC benchmark tests. The improvement is minuscule (less than 1%) for synthetic tests scenarios simulating real-world concurrent workloads. Using boost::function in ReactorTask CentOS 7.5, RELEASE configuration built with gcc (GCC) 4.8.5 Without patch: GetTableLocations PRC: 29902.1 req/sec GetTableSchema RPC : 207100.0 req/sec With patch: GetTableLocations PRC: 30202.9 req/sec GetTableSchema RPC : 207674.0 req/sec Results from rpc-bench, 60 seconds runtime Mode:Sync Client threads: 16 Worker threads: 1 Server reactors: 4 Encryption: 0 Without patch: Reqs/sec: 155254 User CPU per req: 31.7217us Sys CPU per req: 51.3626us Ctx Sw. per req: 3.46345 With patch: Reqs/sec: 155015 User CPU per req: 32.9851us Sys CPU per req: 51.1958us Ctx Sw. per req: 3.48085 Using std::function in ReactorTask GetTableLocations RPC (RELEASE build with third-party CLANG, 60 seconds): Without patch: 15528.3 req/sec Withpatch: 15535.4 req/sec GetTableSchema RPC (RELEASE build with third-party CLANG, 60 seconds): Without patch: 103892 req/sec Withpatch: 102711 req/sec Results from rpc-bench, 60 seconds runtime Mode:Sync Client threads: 16 Worker threads: 1 Server reactors: 4 Encryption: 0 (RELEASE build with third-party CLANG): Without patch: Reqs/sec: 68191.4 User CPU per req: 34.8228us Sys CPU per req: 172.426us Ctx Sw. per req: 3.58111 Server reactor load histogram Count: 11980 Mean: 36.4902 Percentiles: 0% (min) = 16 25%= 26 50% (med) = 28 75%= 31 95%= 78 99%= 84 99.9% = 88 99.99% = 90 100% (max) = 91 Server reactor latency histogram Count: 13517690 Mean: 29.8539 Percentiles: 0% (min) = 0 25%= 18 50% (med) = 28 75%= 36 95%= 58 99%= 86 99.9% = 135 99.99% = 191 100% (max) = 28239 With patch: Reqs/sec: 67295.2 User CPU per req: 37.7613us Sys CPU per req: 171.141us Ctx Sw. per req: 3.59101 Server reactor load histogram Count: 11979 Mean: 36.1685 Percentiles: 0% (min) = 16 25%= 26 50% (med) = 28 75%= 32 95%= 80 99%= 87 99.9% = 91 99.99% = 93 100% (max) = 93 Server reactor latency histogram Count: 13141249 Mean: 30.3945 Percentiles: 0% (min) = 0 25%= 18 50% (med) = 29 75%= 37 95%= 58 99%= 86 99.9% = 141 99.99% = 204 100% (max) = 26220 Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 --- M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/negotiation.cc M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h 7 files changed, 184 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5905/8 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 8 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-R
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 7: Without positive (statistically significant) perf results I'm not sure it's worth doing this patch (at least the std::function struct stuff which is somewhat "weird" looking). I seem to recall that I did this work originally as part of the integration of KRPC into Impala, and Impala was suffering majorly from the "AllocLarge" bottleneck on older versions of tcmalloc (fixed in the versions we use today). Impala did a lot of >1M allocations which went throuhg the AllocLarge slow path while holding the page heap lock, so any time that the RPC threads had to go through the central free list, they were likely to get blocked behind some other thread holding that lock. This made the cross-thread allocations really bad. Now with more modern tcmalloc, it's still preferable to avoid the central free lists, but I don't think it's worth being too crazy without showing a real benefit. -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 27 May 2020 03:58:23 + Gerrit-HasComments: No
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5905/7//COMMIT_MSG Commit Message: PS7: I also replaced std::function with boost::function in PS7 and ran synthetic tests ("master-test --gtest_filter='*ConcurrentGetT ableSchemaTest.Rpc*'" and "table_locations-itest --gtest_filter='*.GetTableLocationsBenchmark") on CentOS 7.5 machine with gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28). From the results below it's clear that the improvement is minuscule (under 1%). KUDU-1865 with boost::function instead of std::function; CentOS 7.5, RELEASE configuration built with gcc (GCC) 4.8.5 Without patch: GetTableLocations PRC: 29902.1 req/sec GetTableSchema RPC : 207100.0 req/sec With patch: GetTableLocations PRC: 30202.9 req/sec GetTableSchema RPC : 207674.0 req/sec As for rpc-bench, the improvements are tiny as well: Without patch: Mode:Sync Client threads: 16 Worker threads: 1 Server reactors: 4 Encryption: 0 -- Reqs/sec: 155254 User CPU per req: 31.7217us Sys CPU per req: 51.3626us Ctx Sw. per req: 3.46345 With patch: Mode:Sync Client threads: 16 Worker threads: 1 Server reactors: 4 Encryption: 0 -- Reqs/sec: 155015 User CPU per req: 32.9851us Sys CPU per req: 51.1958us Ctx Sw. per req: 3.48085 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 26 May 2020 23:01:41 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 7: > I am curious, Did you see an improvement in some of the RPC > benchmarks as a result of this change? The improvements are minuscule and sometimes there is improvement at all in synthetic test scenarios (such as GetTableLocations and GetTableSchema concurrent RPC scenarios). I also tried to replace std::function with boost::function in ReactorTask, and that helps a bit with number of allocations per tcmalloc's allocation tracing, but not much as for the real-world benchmarks. Below are some results: GetTableLocations RPC (RELEASE build with third-party CLANG, 60 seconds): Without patch: 15528.3 req/sec Withpatch: 15535.4 req/sec GetTableSchema RPC (RELEASE build with third-party CLANG, 60 seconds): Without patch: 103892 req/sec Withpatch: 102711 req/sec Results from rpc-bench, 60 seconds runtime Mode:Sync Client threads: 16 Worker threads: 1 Server reactors: 4 Encryption: 0 (RELEASE build with third-party CLANG): Without patch: Reqs/sec: 68191.4 User CPU per req: 34.8228us Sys CPU per req: 172.426us Ctx Sw. per req: 3.58111 Server reactor load histogram Count: 11980 Mean: 36.4902 Percentiles: 0% (min) = 16 25%= 26 50% (med) = 28 75%= 31 95%= 78 99%= 84 99.9% = 88 99.99% = 90 100% (max) = 91 Server reactor latency histogram Count: 13517690 Mean: 29.8539 Percentiles: 0% (min) = 0 25%= 18 50% (med) = 28 75%= 36 95%= 58 99%= 86 99.9% = 135 99.99% = 191 100% (max) = 28239 -- With patch: Reqs/sec: 67295.2 User CPU per req: 37.7613us Sys CPU per req: 171.141us Ctx Sw. per req: 3.59101 Server reactor load histogram Count: 11979 Mean: 36.1685 Percentiles: 0% (min) = 16 25%= 26 50% (med) = 28 75%= 32 95%= 80 99%= 87 99.9% = 91 99.99% = 93 100% (max) = 93 Server reactor latency histogram Count: 13141249 Mean: 30.3945 Percentiles: 0% (min) = 0 25%= 18 50% (med) = 29 75%= 37 95%= 58 99%= 86 99.9% = 141 99.99% = 204 100% (max) = 26220 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 26 May 2020 22:31:39 + Gerrit-HasComments: No
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 7: I am curious, Did you see an improvement in some of the RPC benchmarks as a result of this change? -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 26 May 2020 21:36:28 + Gerrit-HasComments: No
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/5905/7/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: http://gerrit.cloudera.org:8080/#/c/5905/7/src/kudu/rpc/connection.cc@597 PS7, Line 597: [=](ReactorThread* /*rt*/) { : this->QueueOutbound(unique_ptr(t)); : }, : [=](const Status& s) { : t->Abort(s); : delete t; : }, Is there a possibility of a race wrt OutboundTransfer object ptr 't'? Underlying t could be deleted by Abort while Run is active and under mgmt of unique_ptr? http://gerrit.cloudera.org:8080/#/c/5905/7/src/kudu/rpc/connection.cc@596 PS7, Line 596: ReactorTask task = { : [=](ReactorThread* /*rt*/) { : this->QueueOutbound(unique_ptr(t)); : }, : [=](const Status& s) { : t->Abort(s); : delete t; : }, : }; Hmm, this is an interesting mechanism to create an anonymous class from a base class using lambdas. Is there a way to specify the function supplied/overridden? Add a comment about the function being supplied if no such mechanisms is available. http://gerrit.cloudera.org:8080/#/c/5905/7/src/kudu/rpc/connection.cc@818 PS7, Line 818: // The lambdas below need to hold a reference to the connection in case : // it has been closed before the task gets scheduled/aborted. Can you explain why that wouldn't apply above on queueing response? -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 26 May 2020 18:12:29 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Alexey Serbin has uploaded a new patch set (#7) to the change originally created by Todd Lipcon. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. KUDU-1865 (part 1): reduce some cross-thread allocations Per the analysis in the JIRA, each RPC caused two "cross-thread" allocations of ReactorTasks (one in the client, one in the server). These cross-thread allocations harm tcmalloc caching. ReactorTasks don't actually need to be heap-allocated -- that was only an easy mechanism to use a normal-looking "interface" paradigm. Instead, if we use a struct with some std::functions in it, and std::move() it to/from the pending tasks container, we avoid the heap allocation. More importantly, we avoid the worst kind of heap allocation which is allocated on one thread and freed on another. I verified that this removed the cross-thread allocation traffic using tcmalloc tracing and the script posted to the JIRA. Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 --- M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/negotiation.cc M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h 7 files changed, 179 insertions(+), 284 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5905/7 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Alexey Serbin has restored this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Restored -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: restore Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Todd Lipcon has abandoned this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Abandoned Actually it looks like a lot of this code got change, and the patch is full of conflicts, so I'm gonna abandon this for now. -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc@408 PS4, Line 408: OutboundTransfer* t = OutboundTransfer::CreateForCallResponse(slices, cb); > please leave some comment about the lifetime and ownership of 't' done. also removed the comment above which should have been removed in an earlier commit (it no longer applies to this code here) -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 Nov 2017 00:14:08 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc@408 PS4, Line 408: OutboundTransfer* t = OutboundTransfer::CreateForCallResponse(slices, cb); please leave some comment about the lifetime and ownership of 't' -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 10 Oct 2017 21:06:58 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/5905 ) Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc@a391 PS4, Line 391: > This may be a naive question but is there a reason to not keep this class i The issue with using a class is that it has to be heap-allocated rather than just being used as a value type. Using the struct with function members makes it moveable and avoids allocation. -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-Change-Number: 5905 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 09 Oct 2017 23:25:56 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Michael Ho has posted comments on this change. Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5905/4/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: PS4, Line 391: This may be a naive question but is there a reason to not keep this class instead ? -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5905 to look at the new patch set (#3). Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. KUDU-1865 (part 1): reduce some cross-thread allocations Per the analysis in the JIRA, each RPC caused two "cross-thread" allocations of ReactorTasks (one in the client, one in the server). These cross-thread allocations harm tcmalloc caching. ReactorTasks don't actually need to be heap-allocated -- that was only an easy mechanism to use a normal-looking "interface" paradigm. Instead, if we use a struct with some std::functions in it, and std::move() it to/from the pending tasks container, we avoid the heap allocation. More importantly, we avoid the worst kind of heap allocation which is allocated on one thread and freed on another. I verified that this removed the cross-thread allocation traffic using tcmalloc tracing and the script posted to the JIRA. Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 --- M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h 6 files changed, 154 insertions(+), 249 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5905/3 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5905 to look at the new patch set (#2). Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. KUDU-1865 (part 1): reduce some cross-thread allocations Per the analysis in the JIRA, each RPC caused two "cross-thread" allocations of ReactorTasks (one in the client, one in the server). These cross-thread allocations harm tcmalloc caching. ReactorTasks don't actually need to be heap-allocated -- that was only an easy mechanism to use a normal-looking "interface" paradigm. Instead, if we use a struct with some std::functions in it, and std::move() it to/from the pending tasks container, we avoid the heap allocation. More importantly, we avoid the worst kind of heap allocation which is allocated on one thread and freed on another. I verified that this removed the cross-thread allocation traffic using tcmalloc tracing and the script posted to the JIRA. Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 --- M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h 6 files changed, 152 insertions(+), 249 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5905/2 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Todd Lipcon has posted comments on this change. Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 1: nope... doesn't ring a bell. I wouldn't let those block some perf testing, and then if we see a big perf win we can come back and make the patch test clean? -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Sailesh Mukil has posted comments on this change. Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 1: > (2 comments) > > I found a couple of bugs, but there are still some memory leaks I > haven't been able to figure out. This now passes DEBUG testing. > I'll keep at it. I looked at this a bit more and it seems that the leaks are coming from ThreadRestrictions::LoadTLS() and KernelStackWatchdog::GetTLS(). Any idea how there could be leaks there, since that code hasn't changed for this patch? -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Sailesh Mukil has posted comments on this change. Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. Patch Set 1: (2 comments) I found a couple of bugs, but there are still some memory leaks I haven't been able to figure out. This now passes DEBUG testing. I'll keep at it. http://gerrit.cloudera.org:8080/#/c/5905/1/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: Line 621: DCHECK(this->reactor_thread()->reactor()->closing()); This is a use-after-free if the Connection object is shutdown and cleared from the ReactorThread::server_conns_ list. Since this new way of passing function pointers doesn't increase the refcount of the 'Connection' object, the object can get freed before this abort_func() gets called. http://gerrit.cloudera.org:8080/#/c/5905/1/src/kudu/rpc/reactor.cc File src/kudu/rpc/reactor.cc: Line 596: }; This still needs a dummy 'abort_func()', else it results in a runtime error. -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1865 (part 1): reduce some cross-thread allocations
Hello Sailesh Mukil, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/5905 to review the following change. Change subject: KUDU-1865 (part 1): reduce some cross-thread allocations .. KUDU-1865 (part 1): reduce some cross-thread allocations Per the analysis in the JIRA, each RPC caused two "cross-thread" allocations of ReactorTasks (one in the client, one in the server). These cross-thread allocations harm tcmalloc caching. ReactorTasks don't actually need to be heap-allocated -- that was only an easy mechanism to use a normal-looking "interface" paradigm. Instead, if we use a struct with some std::functions in it, and std::move() it to/from the pending tasks container, we avoid the heap allocation. More importantly, we avoid the worst kind of heap allocation which is allocated on one thread and freed on another. I verified that this removed the cross-thread allocation traffic using tcmalloc tracing and the script posted to the JIRA. Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 --- M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h 6 files changed, 148 insertions(+), 249 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/5905/1 -- To view, visit http://gerrit.cloudera.org:8080/5905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7d4d5f14fb302196b1797c712b21cfce81f157c1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Sailesh Mukil