[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863758#comment-15863758 ]

Jason Brown edited comment on CASSANDRA-8457 at 2/13/17 9:02 PM:
-----------------------------------------------------------------

OK, so I've been performance load testing the snot out of this code for the last 
several weeks, with help from netty committers, flight recorder, and flame 
graphs. As a result, I've made some major and some minor tweaks, and the branch is 
now slightly faster than trunk, with slightly better throughput. I have some 
optimizations in my back pocket that would increase performance even more, but as 
[~slebresne] and I have agreed before, we'll save those for follow-up tickets.

trunk
{code}
             id, type       total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
  4 threadCount, total,        233344,    3889,    3889,    3889,     1.0,     1.0,     1.2,     1.3,     1.5,    68.2,   60.0,  0.01549,      0,      9,     538,     538,       4,    5381
  8 threadCount, total,        544637,    9076,    9076,    9076,     0.8,     0.8,     1.0,     1.1,     1.4,    73.8,   60.0,  0.00978,      0,     20,    1267,    1267,       5,   11848
 16 threadCount, total,       1126627,   18774,   18774,   18774,     0.8,     0.8,     0.9,     1.0,     5.5,    78.2,   60.0,  0.01882,      0,     40,    2665,    2665,       6,   23666
 24 threadCount, total,       1562460,   26036,   26036,   26036,     0.9,     0.8,     1.0,     1.1,     9.1,    81.3,   60.0,  0.00837,      0,     55,    3543,    3543,       9,   32619
 36 threadCount, total,       2098097,   34962,   34962,   34962,     1.0,     0.9,     1.1,     1.3,    60.9,    83.0,   60.0,  0.00793,      0,     73,    4665,    4665,       7,   43144
 54 threadCount, total,       2741814,   45686,   45686,   45686,     1.1,     1.0,     1.4,     1.7,    62.2,   131.7,   60.0,  0.01321,      0,     93,    5748,    5748,       7,   55097
 81 threadCount, total,       3851131,   64166,   64166,   64166,     1.2,     1.0,     1.6,     2.6,    62.3,   151.7,   60.0,  0.01152,      0,    159,    8190,    8521,      14,  106805
121 threadCount, total,       4798169,   79947,   79947,   79947,     1.5,     1.1,     2.0,     3.0,    63.5,   117.8,   60.0,  0.05689,      0,    165,    9323,    9439,       5,   97536
181 threadCount, total,       5647043,   94088,   94088,   94088,     1.9,     1.4,     2.6,     4.9,    68.5,   169.2,   60.0,  0.01639,      0,    195,   10106,   11011,      11,  126422
271 threadCount, total,       6450510,  107461,  107461,  107461,     2.5,     1.8,     3.7,    12.0,    75.4,   155.8,   60.0,  0.01542,      0,    228,   10304,   12789,       9,  143857
406 threadCount, total,       6700764,  111635,  111635,  111635,     3.6,     2.5,     5.3,    55.8,    75.6,   196.5,   60.0,  0.01800,      0,    243,    9995,   13170,       7,  144166
609 threadCount, total,       7119535,  118477,  118477,  118477,     5.1,     3.5,     7.9,    62.8,    85.1,   170.0,   60.1,  0.01775,      0,    250,   10149,   13781,       7,  148118
913 threadCount, total,       7093347,  117981,  117981,  117981,     7.7,     4.9,    15.7,    71.3,   101.1,   173.4,   60.1,  0.02780,      0,    246,   10327,   13859,       8,  155896
{code}

8457
{code}
             id, type       total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
  4 threadCount, total,        161668,    2694,    2694,    2694,     1.4,     1.4,     1.6,     1.7,     3.2,    68.2,   60.0,  0.01264,      0,      6,     363,     363,       4,    3631
  8 threadCount, total,        498139,    8301,    8301,    8301,     0.9,     0.9,     1.1,     1.3,     1.8,    73.5,   60.0,  0.00446,      0,     19,    1164,    1164,       6,   11266
 16 threadCount, total,        765437,   12756,   12756,   12756,     1.2,     1.2,     1.4,     1.5,     5.7,    74.8,   60.0,  0.01251,      0,     29,    1819,    1819,       5,   17238
 24 threadCount, total,       1122768,   18710,   18710,   18710,     1.2,     1.2,     1.4,     1.5,     8.5,   127.7,   60.0,  0.00871,      0,     42,    2538,    2538,       5,   25054
 36 threadCount, total,       1649658,   27489,   27489,   27489,     1.3,     1.2,     1.4,     1.6,    60.1,    77.7,   60.0,  0.00627,      0,     57,    3652,    3652,       7,   33743
 54 threadCount, total,       2258999,   37641,   37641,   37641,     1.4,     1.3,     1.6,     1.8,    62.5,    81.7,   60.0,  0.00771,      0,     79,    4908,    4908,       6,   46789
 81 threadCount, total,       3255005,   54220,   54220,   54220,     1.5,     1.2,     1.7,     2.2,    63.8,   133.4,   60.0,  0.02030,      0,    117,    6953,    7008,       9,   72208
121 threadCount, total,       4643184,   77293,   77293,   77293,     1.5,     1.2,     1.8,     2.9,    62.6,   112.7,   60.1,  0.02449,      0,    171,    8976,    9135,       9,  101583
181 threadCount, total,       5625693,   93731,   93731,   93731,     1.9,     1.4,     2.4,     4.8,    67.2,   208.1,   60.0,  0.02373,      0,    217,    9675,   11585,      11,  138725
271 threadCount, total,       6213997,  103523,  103523,  103523,     2.6,     1.8,     3.5,    27.2,    69.7,   183.1,   60.0,  0.01456,      0,    227,    9977,   12392,       7,  137334
406 threadCount, total,       6832341,  113808,  113808,  113808,     3.5,     2.4,     5.1,    57.4,    73.2,   179.0,   60.0,  0.01437,      0,    242,   10100,   13373,       8,  146086
609 threadCount, total,       7272610,  121130,  121130,  121130,     5.0,     3.4,     7.7,    62.8,    78.3,   134.9,   60.0,  0.02995,      0,    254,   10177,   14088,       8,  152827
913 threadCount, total,       7437538,  123715,  123715,  123715,     7.3,     4.7,    15.0,    69.9,    86.1,   252.8,   60.1,  0.01407,      0,    264,   10316,   14669,      11,  164130
{code}
Also, [~aweisberg] has been reviewing on the side and has made some nice 
comments.

Overview of changes:

- less reliance on pipeline
I've reduced the number of handlers in the netty pipeline to a bare minimum 
(that is, just one), as I've found in my testing that there is a slight cost to 
operating the netty pipeline: each handler looks up the next handler, checks 
the promise's status, and so on. While this change makes the code look less like a 
pipeline/chain of commands, it is still easily understandable and will perform 
better. (As an aside, I have a colleague who runs a massively scalable service, 
and they don't use any handlers in the netty pipeline whatsoever - they just 
send ByteBufs to the channel.)
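
To make the idea concrete, here is a minimal sketch (not the actual patch code; 
{{SingleHandlerOutboundInitializer}} and its placeholder encoding are made up for 
illustration) of a pipeline with exactly one outbound handler, which serializes a 
message and hands the resulting ByteBuf straight down to the channel:
{code:java}
import java.nio.charset.StandardCharsets;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;
import io.netty.channel.socket.SocketChannel;

public class SingleHandlerOutboundInitializer extends ChannelInitializer<SocketChannel>
{
    @Override
    protected void initChannel(SocketChannel ch)
    {
        // Exactly one outbound handler. Every extra handler in the pipeline costs
        // a next-handler lookup plus promise bookkeeping on each write.
        ch.pipeline().addLast("messageOut", new ChannelOutboundHandlerAdapter()
        {
            @Override
            public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
            {
                // Serialize the application-level message here and pass the raw
                // ByteBuf toward the channel - no further handlers to traverse.
                ByteBuf buf = ctx.alloc().buffer();
                // Placeholder encoding purely for illustration.
                buf.writeBytes(msg.toString().getBytes(StandardCharsets.UTF_8));
                ctx.write(buf, promise);
            }
        });
    }
}
{code}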

- fixing flush strategy
I've had to dive into the internals of netty to understand all the subtleties 
of how flushing and thread scheduling work, and then map that against our 
needs. I've documented this quite thoroughly in the class-level documentation 
in {{OutboundMessageConnection}} and {{MessageOutHandler}}, and the code 
implements those details.
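
For flavor, here is one minimal flush-coalescing sketch (a hypothetical class, not 
necessarily the exact strategy in the branch): writes are enqueued without flushing, 
and at most one flush task is scheduled on the channel's event loop, so many queued 
messages share a single flush (and thus a single syscall):
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

import io.netty.channel.Channel;

public class CoalescingFlusher
{
    private final Channel channel;
    private final AtomicBoolean flushScheduled = new AtomicBoolean(false);

    public CoalescingFlusher(Channel channel)
    {
        this.channel = channel;
    }

    public void enqueue(Object message)
    {
        // write() only appends to netty's outbound buffer; nothing hits the socket yet.
        channel.write(message);

        // Schedule at most one pending flush task. When the event loop runs it,
        // everything written since the previous flush goes out together.
        if (flushScheduled.compareAndSet(false, true))
        {
            channel.eventLoop().execute(() ->
            {
                flushScheduled.set(false);
                channel.flush();
            });
        }
    }
}
{code}
The subtle part, as noted above, is when that flush task actually runs relative to 
the event loop's other work - that is the thread-scheduling detail the class-level 
documentation spells out.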



> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each, incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.


