[kudu-CR] KUDU-636. Use protobuf arenas for CommitMsgs

2020-07-09 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16147 )

Change subject: KUDU-636. Use protobuf arenas for CommitMsgs
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I78698d4cb4944bddd8dabd6cbbf1e3a064056199
Gerrit-Change-Number: 16147
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 09 Jul 2020 20:48:07 +
Gerrit-HasComments: No


[kudu-CR] KUDU-636. Use protobuf arenas for CommitMsgs

2020-07-09 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16147 )

Change subject: KUDU-636. Use protobuf arenas for CommitMsgs
..

KUDU-636. Use protobuf arenas for CommitMsgs

This commit optimizes the write path by allowing protobuf arenas to be
used to construct the operation results protobuf and the CommitMsg that
contains it. The operation result for a large batch of writes has one or
more embedded protobuf per inserted row, so using a protobuf arena for
allocation is much more efficient than calling into the system allocator
for each object.

In order to accomplish this, I had to simplify the Log interface a bit.
Previously, the Log code constructed a LogEntryPB and passed that
through to the log's appender thread, even though it had already
performed all of the serialization on the submitting thread. Doing that
required that the log entry retain references to all of the embedded
protobufs, which complicated lifetime quite a bit.

The new Log interface performs all of the serialization and analysis
(including extracting the OpIds of the replicate messages in the batch)
inline in the submission path instead of doing any such work on the
Append thread. With this, the interface now just takes a const protobuf
reference instead of a unique_ptr, which means that the
caller has a simpler model around its lifetime.

With the above accomplished, it was straightforward to add a protobuf
Arena to the OpState structure and allocate the CommitMsg and its
constituent sub-messages from that Arena.

The performance benefits are substantial. I benchmarked on a local
machine using:

  $ kudu perf loadgen localhost -num_rows_per_thread=1000 -num_threads=8

and ran the tserver under `perf stat` to collect counters:

Without patch:

  INSERT report
  rows total: 8000
  time total: 35860.9 ms
time per row: 0.000448261 ms

   Performance counter stats for './build/latest/bin/kudu tserver run 
-fs-wal-dir /tmp/ts':

   378784.92 msec task-clock#8.453 CPUs utilized
 1429039  context-switches  #0.004 M/sec
  132930  cpu-migrations#0.351 K/sec
 3128091  page-faults   #0.008 M/sec
   1553122880821  cycles#4.100 GHz  
(83.24%)
313764365792  stalled-cycles-frontend   #   20.20% frontend cycles 
idle (83.33%)
166769392663  stalled-cycles-backend#   10.74% backend cycles 
idle  (83.39%)
943534760864  instructions  #0.61  insn per cycle
#0.33  stalled cycles 
per insn  (83.34%)
170465210875  branches  #  450.032 M/sec
(83.39%)
   834101556  branch-misses #0.49% of all branches  
(83.32%)

   357.520042000 seconds user
21.770448000 seconds sys

With patch:

  INSERT report
  rows total: 8000
  time total: 32701 ms
time per row: 0.000408763 ms

   Performance counter stats for './build/latest/bin/kudu tserver run 
-fs-wal-dir /tmp/ts':

   272393.27 msec task-clock#4.915 CPUs utilized
  300768  context-switches  #0.001 M/sec
   44879  cpu-migrations#0.165 K/sec
 2861143  page-faults   #0.011 M/sec
   1126891932279  cycles#4.137 GHz  
(83.28%)
209167186469  stalled-cycles-frontend   #   18.56% frontend cycles 
idle (83.42%)
144156173079  stalled-cycles-backend#   12.79% backend cycles 
idle  (83.34%)
925439690437  instructions  #0.82  insn per cycle
#0.23  stalled cycles 
per insn  (83.28%)
163672508297  branches  #  600.868 M/sec
(83.33%)
   655509045  branch-misses #0.40% of all branches  
(83.34%)

   257.52199 seconds user
15.112482000 seconds sys

Summary:
* 9.6% throughput increase
* 39% reduction in tserver cycles

Change-Id: I78698d4cb4944bddd8dabd6cbbf1e3a064056199
Reviewed-on: http://gerrit.cloudera.org:8080/16147
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/ops/alter_schema_op.cc
M src/kudu/tablet/ops/alter_schema_op.h
M 

[kudu-CR] KUDU-636. Use protobuf arenas for CommitMsgs

2020-07-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16147 )

Change subject: KUDU-636. Use protobuf arenas for CommitMsgs
..


Patch Set 3:

also worth noting I rebased this away from the GetTableLocations patch on the 
master, since it's an orthogonal optimization


--
To view, visit http://gerrit.cloudera.org:8080/16147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I78698d4cb4944bddd8dabd6cbbf1e3a064056199
Gerrit-Change-Number: 16147
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 09 Jul 2020 19:42:26 +
Gerrit-HasComments: No


[kudu-CR] KUDU-636. Use protobuf arenas for CommitMsgs

2020-07-09 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16147 )

Change subject: KUDU-636. Use protobuf arenas for CommitMsgs
..


Patch Set 3:

(2 comments)

think I fixed the issues but we'll see what the tests have to say

http://gerrit.cloudera.org:8080/#/c/16147/2/src/kudu/consensus/log.h
File src/kudu/consensus/log.h:

http://gerrit.cloudera.org:8080/#/c/16147/2/src/kudu/consensus/log.h@591
PS2, Line 591:
> What if replicate_op_ids_ is empty?  Maybe, add a CHECK() on this pre-condi
turns out this function was never even called, so i removed it


http://gerrit.cloudera.org:8080/#/c/16147/2/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/16147/2/src/kudu/tablet/ops/write_op.cc@243
PS2, Line 243:
> nit: trailing space
Done



--
To view, visit http://gerrit.cloudera.org:8080/16147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I78698d4cb4944bddd8dabd6cbbf1e3a064056199
Gerrit-Change-Number: 16147
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 09 Jul 2020 19:42:04 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-636. Use protobuf arenas for CommitMsgs

2020-07-09 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Grant Henke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16147

to look at the new patch set (#3).

Change subject: KUDU-636. Use protobuf arenas for CommitMsgs
..

KUDU-636. Use protobuf arenas for CommitMsgs

This commit optimizes the write path by allowing protobuf arenas to be
used to construct the operation results protobuf and the CommitMsg that
contains it. The operation result for a large batch of writes has one or
more embedded protobuf per inserted row, so using a protobuf arena for
allocation is much more efficient than calling into the system allocator
for each object.

In order to accomplish this, I had to simplify the Log interface a bit.
Previously, the Log code constructed a LogEntryPB and passed that
through to the log's appender thread, even though it had already
performed all of the serialization on the submitting thread. Doing that
required that the log entry retain references to all of the embedded
protobufs, which complicated lifetime quite a bit.

The new Log interface performs all of the serialization and analysis
(including extracting the OpIds of the replicate messages in the batch)
inline in the submission path instead of doing any such work on the
Append thread. With this, the interface now just takes a const protobuf
reference instead of a unique_ptr, which means that the
caller has a simpler model around its lifetime.

With the above accomplished, it was straightforward to add a protobuf
Arena to the OpState structure and allocate the CommitMsg and its
constituent sub-messages from that Arena.

The performance benefits are substantial. I benchmarked on a local
machine using:

  $ kudu perf loadgen localhost -num_rows_per_thread=1000 -num_threads=8

and ran the tserver under `perf stat` to collect counters:

Without patch:

  INSERT report
  rows total: 8000
  time total: 35860.9 ms
time per row: 0.000448261 ms

   Performance counter stats for './build/latest/bin/kudu tserver run 
-fs-wal-dir /tmp/ts':

   378784.92 msec task-clock#8.453 CPUs utilized
 1429039  context-switches  #0.004 M/sec
  132930  cpu-migrations#0.351 K/sec
 3128091  page-faults   #0.008 M/sec
   1553122880821  cycles#4.100 GHz  
(83.24%)
313764365792  stalled-cycles-frontend   #   20.20% frontend cycles 
idle (83.33%)
166769392663  stalled-cycles-backend#   10.74% backend cycles 
idle  (83.39%)
943534760864  instructions  #0.61  insn per cycle
#0.33  stalled cycles 
per insn  (83.34%)
170465210875  branches  #  450.032 M/sec
(83.39%)
   834101556  branch-misses #0.49% of all branches  
(83.32%)

   357.520042000 seconds user
21.770448000 seconds sys

With patch:

  INSERT report
  rows total: 8000
  time total: 32701 ms
time per row: 0.000408763 ms

   Performance counter stats for './build/latest/bin/kudu tserver run 
-fs-wal-dir /tmp/ts':

   272393.27 msec task-clock#4.915 CPUs utilized
  300768  context-switches  #0.001 M/sec
   44879  cpu-migrations#0.165 K/sec
 2861143  page-faults   #0.011 M/sec
   1126891932279  cycles#4.137 GHz  
(83.28%)
209167186469  stalled-cycles-frontend   #   18.56% frontend cycles 
idle (83.42%)
144156173079  stalled-cycles-backend#   12.79% backend cycles 
idle  (83.34%)
925439690437  instructions  #0.82  insn per cycle
#0.23  stalled cycles 
per insn  (83.28%)
163672508297  branches  #  600.868 M/sec
(83.33%)
   655509045  branch-misses #0.40% of all branches  
(83.34%)

   257.52199 seconds user
15.112482000 seconds sys

Summary:
* 9.6% throughput increase
* 39% reduction in tserver cycles

Change-Id: I78698d4cb4944bddd8dabd6cbbf1e3a064056199
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/ops/alter_schema_op.cc
M src/kudu/tablet/ops/alter_schema_op.h
M src/kudu/tablet/ops/op.h
M