Hello Tidy Bot, Andrew Wong, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15677

to look at the new patch set (#2).

Change subject: tserver: add locking around an RPC's usage of a Scanner
......................................................................

tserver: add locking around an RPC's usage of a Scanner

Prior to this patch, there was little to prevent multiple RPCs from
trying to access and work with a single Scanner object other than the
call sequence ID checks. In practice, those checks were fairly likely to
prevent races -- there was only a very small window of time in between
looking up a scanner and checking its sequence ID -- but it was buggy
nonetheless.

In addition, the lack of reasonable locking meant that some variables
like scanner::num_rows_returned_ were needlessly atomic. In certain
workloads this actually took a measurable amount of CPU.

This patch adds a new test that would reliably crash the tserver and
trigger TSAN warnings prior to the fix. It adds some explicit locking
around the Scanner object and asserts that most methods on the scanner
require the lock to be held.

I benchmarked this by running perf-stat on a tserver and 'kudu perf
table_scan' to count the rows in a table 200 times (ie scan with no
columns projected). This is a best case for seeing the value of the
optimization.

Before:
 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir 
/tmp/ts':

         11,753.59 msec task-clock                #    1.166 CPUs utilized
             7,174      context-switches          #    0.610 K/sec
               807      cpu-migrations            #    0.069 K/sec
            14,909      page-faults               #    0.001 M/sec
    32,447,480,644      cycles                    #    2.761 GHz
    49,275,463,523      instructions              #    1.52  insn per cycle
     7,570,941,653      branches                  #  644.139 M/sec
        14,271,190      branch-misses             #    0.19% of all branches

      10.082020598 seconds time elapsed

      11.561886000 seconds user
       0.340585000 seconds sys

After:
 Performance counter stats for './build/latest/bin/kudu.opt tserver run 
-fs-wal-dir /tmp/ts':

          9,426.45 msec task-clock                #    1.010 CPUs utilized
             6,906      context-switches          #    0.733 K/sec
               892      cpu-migrations            #    0.095 K/sec
            15,078      page-faults               #    0.002 M/sec
    26,127,343,920      cycles                    #    2.772 GHz
    48,101,748,066      instructions              #    1.84  insn per cycle
     7,402,811,470      branches                  #  785.323 M/sec
        13,857,599      branch-misses             #    0.19% of all branches

       9.335786317 seconds time elapsed

       9.258446000 seconds user
       0.315547000 seconds sys

(1.24x fewer cycles for the same work)

When I actually scan one column of data, the improvement is only about
1% since other CPU consumption is dominant.

Change-Id: I3591e85a07aefadaf7fb05768109c8a261a8828e
---
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_service.cc
5 files changed, 271 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/15677/2
--
To view, visit http://gerrit.cloudera.org:8080/15677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3591e85a07aefadaf7fb05768109c8a261a8828e
Gerrit-Change-Number: 15677
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Andrew Wong <andrew.w...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

Reply via email to