[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16189 ) Change subject: client: allocate InFlightOp from a Batcher Arena .. client: allocate InFlightOp from a Batcher Arena InFlightOps are always associated with a single Batcher, and Batchers are always associated with a single "flush". Given that, we can allocate them from a Batcher-local Arena instead of from the heap. This improves CPU consumption and throughput of the client by about 10%. Before: INSERT report rows total: 4000 time total: 10302.3 ms time per row: 0.000257559 ms Dropping auto-created table 'default.loadgen_auto_0334edaa55fe4fdba242b0b4fcc5d360' Performance counter stats for './build/latest/bin/kudu perf loadgen localhost -num_rows_per_thread=500 -num_threads=8': 100857.18 msec task-clock#9.738 CPUs utilized 124584 context-switches #0.001 M/sec 6047 cpu-migrations#0.060 K/sec 31199 page-faults #0.309 K/sec 412192743252 cycles#4.087 GHz (83.24%) 95984615757 stalled-cycles-frontend # 23.29% frontend cycles idle (83.35%) 80498019077 stalled-cycles-backend# 19.53% backend cycles idle (83.22%) 225757843097 instructions #0.55 insn per cycle #0.43 stalled cycles per insn (83.38%) 43262879676 branches # 428.952 M/sec (83.37%) 276103070 branch-misses #0.64% of all branches (83.43%) 10.356736781 seconds time elapsed 98.263563000 seconds user 2.646264000 seconds sys After: INSERT report rows total: 4000 time total: 9730.09 ms time per row: 0.000243252 ms Dropping auto-created table 'default.loadgen_auto_ffc0ea72dd064db7b3fa12b037454eb2' Performance counter stats for './build/latest/bin/kudu.client-opt perf loadgen localhost -num_rows_per_thread=500 -num_threads=8': 91317.10 msec task-clock#9.327 CPUs utilized 57090 context-switches #0.625 K/sec 6661 cpu-migrations#0.073 K/sec 34568 page-faults #0.379 K/sec 374170608484 cycles#4.097 GHz (83.31%) 82165215635 stalled-cycles-frontend # 21.96% frontend cycles idle (83.35%) 57601295335 stalled-cycles-backend# 15.39% backend cycles idle (83.25%) 218776486065 instructions #0.58 insn per cycle #0.38 stalled cycles per insn (83.35%) 41762336591 branches # 457.333 M/sec (83.34%) 140528380 branch-misses #0.34% of all branches (83.40%) 9.790174460 seconds time elapsed 89.079083000 seconds user 2.276474000 seconds sys Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Reviewed-on: http://gerrit.cloudera.org:8080/16189 Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong --- M src/kudu/client/batcher.cc M src/kudu/client/batcher.h 2 files changed, 10 insertions(+), 9 deletions(-) Approvals: Kudu Jenkins: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16189 ) Change subject: client: allocate InFlightOp from a Batcher Arena .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16189/1/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/16189/1/src/kudu/client/batcher.cc@379 PS1, Line 379: int op_prefetch = i + kPrefetchDistance * 2; : int row_prefetch = i + kPrefetchDistance; : if (ifo_prefetch >= 0 && ifo_prefetch < size) { : __builtin_prefetch(ops_[ifo_prefetch], 0, PREFETCH_HINT_T0); : } : if (op_prefetch >= 0 && op_prefetch < size) { : const auto* op = ops_[op_prefetch]->write_op.get(); : if (op) { : __builtin_prefetch(>row().isset_bitmap_, 0, PREFETCH_HINT_T0); : } : } : if (row_prefetch >= 0 && row_prefetch < size) { : const auto* op = ops_[row_prefetch]->write_op.get(); : if (op) { : __builtin_prefetch(op->row().isset_bitmap_, 0, PREFETCH_HINT_T0); : } : } > yea, and even within the same arena, they're allocated in the order the ops Makes sense. Thanks for clarifying! -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 14 Jul 2020 06:12:13 + Gerrit-HasComments: Yes
[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Hello Alexey Serbin, Andrew Wong, Kudu Jenkins, Andrew Wong, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16189 to look at the new patch set (#2). Change subject: client: allocate InFlightOp from a Batcher Arena .. client: allocate InFlightOp from a Batcher Arena InFlightOps are always associated with a single Batcher, and Batchers are always associated with a single "flush". Given that, we can allocate them from a Batcher-local Arena instead of from the heap. This improves CPU consumption and throughput of the client by about 10%. Before: INSERT report rows total: 4000 time total: 10302.3 ms time per row: 0.000257559 ms Dropping auto-created table 'default.loadgen_auto_0334edaa55fe4fdba242b0b4fcc5d360' Performance counter stats for './build/latest/bin/kudu perf loadgen localhost -num_rows_per_thread=500 -num_threads=8': 100857.18 msec task-clock#9.738 CPUs utilized 124584 context-switches #0.001 M/sec 6047 cpu-migrations#0.060 K/sec 31199 page-faults #0.309 K/sec 412192743252 cycles#4.087 GHz (83.24%) 95984615757 stalled-cycles-frontend # 23.29% frontend cycles idle (83.35%) 80498019077 stalled-cycles-backend# 19.53% backend cycles idle (83.22%) 225757843097 instructions #0.55 insn per cycle #0.43 stalled cycles per insn (83.38%) 43262879676 branches # 428.952 M/sec (83.37%) 276103070 branch-misses #0.64% of all branches (83.43%) 10.356736781 seconds time elapsed 98.263563000 seconds user 2.646264000 seconds sys After: INSERT report rows total: 4000 time total: 9730.09 ms time per row: 0.000243252 ms Dropping auto-created table 'default.loadgen_auto_ffc0ea72dd064db7b3fa12b037454eb2' Performance counter stats for './build/latest/bin/kudu.client-opt perf loadgen localhost -num_rows_per_thread=500 -num_threads=8': 91317.10 msec task-clock#9.327 CPUs utilized 57090 context-switches #0.625 K/sec 6661 cpu-migrations#0.073 K/sec 34568 page-faults #0.379 K/sec 374170608484 cycles#4.097 GHz (83.31%) 82165215635 stalled-cycles-frontend # 21.96% frontend cycles idle (83.35%) 57601295335 stalled-cycles-backend# 15.39% backend cycles idle (83.25%) 218776486065 instructions #0.58 insn per cycle #0.38 stalled cycles per insn (83.35%) 41762336591 branches # 457.333 M/sec (83.34%) 140528380 branch-misses #0.34% of all branches (83.40%) 9.790174460 seconds time elapsed 89.079083000 seconds user 2.276474000 seconds sys Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 --- M src/kudu/client/batcher.cc M src/kudu/client/batcher.h 2 files changed, 10 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/16189/2 -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/16189 ) Change subject: client: allocate InFlightOp from a Batcher Arena .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16189/1/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/16189/1/src/kudu/client/batcher.cc@379 PS1, Line 379: int ifo_prefetch = i + kPrefetchDistance * 3; : int op_prefetch = i + kPrefetchDistance * 2; : int row_prefetch = i + kPrefetchDistance; : if (ifo_prefetch >= 0 && ifo_prefetch < size) { : __builtin_prefetch(ops_[ifo_prefetch], 0, PREFETCH_HINT_T0); : } : if (op_prefetch >= 0 && op_prefetch < size) { : const auto* op = ops_[op_prefetch]->write_op.get(); : if (op) { : __builtin_prefetch(>row().isset_bitmap_, 0, PREFETCH_HINT_T0); : } : } : if (row_prefetch >= 0 && row_prefetch < size) { : const auto* op = ops_[row_prefetch]->write_op.get(); : if (op) { : __builtin_prefetch(op->row().isset_bitmap_, 0, PREFETCH_HINT_T0); : } > Just making sure I understand this bit: while ops_ are all allocated in the yea, and even within the same arena, they're allocated in the order the ops are submitted, but in the case that the ops are strewn about more than one tablet due to partitioning, the ops associated with a single WriteRpc will not necessarily be entirely contiguous/sequential in this loop, so the prefetching is probably still effective/useful. -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 14 Jul 2020 03:10:09 + Gerrit-HasComments: Yes
[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16189 ) Change subject: client: allocate InFlightOp from a Batcher Arena .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16189/1/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: http://gerrit.cloudera.org:8080/#/c/16189/1/src/kudu/client/batcher.cc@379 PS1, Line 379: int ifo_prefetch = i + kPrefetchDistance * 3; : int op_prefetch = i + kPrefetchDistance * 2; : int row_prefetch = i + kPrefetchDistance; : if (ifo_prefetch >= 0 && ifo_prefetch < size) { : __builtin_prefetch(ops_[ifo_prefetch], 0, PREFETCH_HINT_T0); : } : if (op_prefetch >= 0 && op_prefetch < size) { : const auto* op = ops_[op_prefetch]->write_op.get(); : if (op) { : __builtin_prefetch(>row().isset_bitmap_, 0, PREFETCH_HINT_T0); : } : } : if (row_prefetch >= 0 && row_prefetch < size) { : const auto* op = ops_[row_prefetch]->write_op.get(); : if (op) { : __builtin_prefetch(op->row().isset_bitmap_, 0, PREFETCH_HINT_T0); : } Just making sure I understand this bit: while ops_ are all allocated in the arena, the underlying KuduWriteOps may still strewn strewn about since they're allocated by clients, which makes this optimization still valuable, right? -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 14 Jul 2020 01:15:08 + Gerrit-HasComments: Yes
[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16189 ) Change subject: client: allocate InFlightOp from a Batcher Arena .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 14 Jul 2020 00:28:58 + Gerrit-HasComments: No
[kudu-CR] client: allocate InFlightOp from a Batcher Arena
Hello Alexey Serbin, Andrew Wong, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/16189 to review the following change. Change subject: client: allocate InFlightOp from a Batcher Arena .. client: allocate InFlightOp from a Batcher Arena InFlightOps are always associated with a single Batcher, and Batchers are always associated with a single "flush". Given that, we can allocate them from a Batcher-local Arena instead of from the heap. This improves CPU consumption and throughput of the client by about 10%. Before: INSERT report rows total: 4000 time total: 10302.3 ms time per row: 0.000257559 ms Dropping auto-created table 'default.loadgen_auto_0334edaa55fe4fdba242b0b4fcc5d360' Performance counter stats for './build/latest/bin/kudu perf loadgen localhost -num_rows_per_thread=500 -num_threads=8': 100857.18 msec task-clock#9.738 CPUs utilized 124584 context-switches #0.001 M/sec 6047 cpu-migrations#0.060 K/sec 31199 page-faults #0.309 K/sec 412192743252 cycles#4.087 GHz (83.24%) 95984615757 stalled-cycles-frontend # 23.29% frontend cycles idle (83.35%) 80498019077 stalled-cycles-backend# 19.53% backend cycles idle (83.22%) 225757843097 instructions #0.55 insn per cycle #0.43 stalled cycles per insn (83.38%) 43262879676 branches # 428.952 M/sec (83.37%) 276103070 branch-misses #0.64% of all branches (83.43%) 10.356736781 seconds time elapsed 98.263563000 seconds user 2.646264000 seconds sys After: INSERT report rows total: 4000 time total: 9730.09 ms time per row: 0.000243252 ms Dropping auto-created table 'default.loadgen_auto_ffc0ea72dd064db7b3fa12b037454eb2' Performance counter stats for './build/latest/bin/kudu.client-opt perf loadgen localhost -num_rows_per_thread=500 -num_threads=8': 91317.10 msec task-clock#9.327 CPUs utilized 57090 context-switches #0.625 K/sec 6661 cpu-migrations#0.073 K/sec 34568 page-faults #0.379 K/sec 374170608484 cycles#4.097 GHz (83.31%) 82165215635 stalled-cycles-frontend # 21.96% frontend cycles idle (83.35%) 57601295335 stalled-cycles-backend# 15.39% backend cycles idle (83.25%) 218776486065 instructions #0.58 insn per cycle #0.38 stalled cycles per insn (83.35%) 41762336591 branches # 457.333 M/sec (83.34%) 140528380 branch-misses #0.34% of all branches (83.40%) 9.790174460 seconds time elapsed 89.079083000 seconds user 2.276474000 seconds sys Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 --- M src/kudu/client/batcher.cc M src/kudu/client/batcher.h 2 files changed, 10 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/16189/1 -- To view, visit http://gerrit.cloudera.org:8080/16189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I446a8d21253b7a274872bff6d3e76705ac95d0d5 Gerrit-Change-Number: 16189 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong