[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863758#comment-15863758 ]
Jason Brown commented on CASSANDRA-8457:
----------------------------------------

OK, so I've been load testing the snot out of this code for the last several weeks, with help from netty committers, flight recorder, and flame graphs. As a result, I've made some major and some minor tweaks, and this branch is now slightly faster than trunk, with slightly better throughput. I have some optimizations in my back pocket that should increase throughput even more, but as Sylvain has stated before, we'll save those for follow-up tickets.

trunk
{code}
id, type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
4 threadCount, total, 233344, 3889, 3889, 3889, 1.0, 1.0, 1.2, 1.3, 1.5, 68.2, 60.0, 0.01549, 0, 9, 538, 538, 4, 5381
8 threadCount, total, 544637, 9076, 9076, 9076, 0.8, 0.8, 1.0, 1.1, 1.4, 73.8, 60.0, 0.00978, 0, 20, 1267, 1267, 5, 11848
16 threadCount, total, 1126627, 18774, 18774, 18774, 0.8, 0.8, 0.9, 1.0, 5.5, 78.2, 60.0, 0.01882, 0, 40, 2665, 2665, 6, 23666
24 threadCount, total, 1562460, 26036, 26036, 26036, 0.9, 0.8, 1.0, 1.1, 9.1, 81.3, 60.0, 0.00837, 0, 55, 3543, 3543, 9, 32619
36 threadCount, total, 2098097, 34962, 34962, 34962, 1.0, 0.9, 1.1, 1.3, 60.9, 83.0, 60.0, 0.00793, 0, 73, 4665, 4665, 7, 43144
54 threadCount, total, 2741814, 45686, 45686, 45686, 1.1, 1.0, 1.4, 1.7, 62.2, 131.7, 60.0, 0.01321, 0, 93, 5748, 5748, 7, 55097
81 threadCount, total, 3851131, 64166, 64166, 64166, 1.2, 1.0, 1.6, 2.6, 62.3, 151.7, 60.0, 0.01152, 0, 159, 8190, 8521, 14, 106805
121 threadCount, total, 4798169, 79947, 79947, 79947, 1.5, 1.1, 2.0, 3.0, 63.5, 117.8, 60.0, 0.05689, 0, 165, 9323, 9439, 5, 97536
181 threadCount, total, 5647043, 94088, 94088, 94088, 1.9, 1.4, 2.6, 4.9, 68.5, 169.2, 60.0, 0.01639, 0, 195, 10106, 11011, 11, 126422
271 threadCount, total, 6450510, 107461, 107461, 107461, 2.5, 1.8, 3.7, 12.0, 75.4, 155.8, 60.0, 0.01542, 0, 228, 10304, 12789, 9, 143857
406 threadCount, total, 6700764, 111635, 111635, 111635, 3.6, 2.5, 5.3, 55.8, 75.6, 196.5, 60.0, 0.01800, 0, 243, 9995, 13170, 7, 144166
609 threadCount, total, 7119535, 118477, 118477, 118477, 5.1, 3.5, 7.9, 62.8, 85.1, 170.0, 60.1, 0.01775, 0, 250, 10149, 13781, 7, 148118
913 threadCount, total, 7093347, 117981, 117981, 117981, 7.7, 4.9, 15.7, 71.3, 101.1, 173.4, 60.1, 0.02780, 0, 246, 10327, 13859, 8, 155896
{code}

8457
{code}
id, type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
4 threadCount, total, 161668, 2694, 2694, 2694, 1.4, 1.4, 1.6, 1.7, 3.2, 68.2, 60.0, 0.01264, 0, 6, 363, 363, 4, 3631
8 threadCount, total, 498139, 8301, 8301, 8301, 0.9, 0.9, 1.1, 1.3, 1.8, 73.5, 60.0, 0.00446, 0, 19, 1164, 1164, 6, 11266
16 threadCount, total, 765437, 12756, 12756, 12756, 1.2, 1.2, 1.4, 1.5, 5.7, 74.8, 60.0, 0.01251, 0, 29, 1819, 1819, 5, 17238
24 threadCount, total, 1122768, 18710, 18710, 18710, 1.2, 1.2, 1.4, 1.5, 8.5, 127.7, 60.0, 0.00871, 0, 42, 2538, 2538, 5, 25054
36 threadCount, total, 1649658, 27489, 27489, 27489, 1.3, 1.2, 1.4, 1.6, 60.1, 77.7, 60.0, 0.00627, 0, 57, 3652, 3652, 7, 33743
54 threadCount, total, 2258999, 37641, 37641, 37641, 1.4, 1.3, 1.6, 1.8, 62.5, 81.7, 60.0, 0.00771, 0, 79, 4908, 4908, 6, 46789
81 threadCount, total, 3255005, 54220, 54220, 54220, 1.5, 1.2, 1.7, 2.2, 63.8, 133.4, 60.0, 0.02030, 0, 117, 6953, 7008, 9, 72208
121 threadCount, total, 4643184, 77293, 77293, 77293, 1.5, 1.2, 1.8, 2.9, 62.6, 112.7, 60.1, 0.02449, 0, 171, 8976, 9135, 9, 101583
181 threadCount, total, 5625693, 93731, 93731, 93731, 1.9, 1.4, 2.4, 4.8, 67.2, 208.1, 60.0, 0.02373, 0, 217, 9675, 11585, 11, 138725
271 threadCount, total, 6213997, 103523, 103523, 103523, 2.6, 1.8, 3.5, 27.2, 69.7, 183.1, 60.0, 0.01456, 0, 227, 9977, 12392, 7, 137334
406 threadCount, total, 6832341, 113808, 113808, 113808, 3.5, 2.4, 5.1, 57.4, 73.2, 179.0, 60.0, 0.01437, 0, 242, 10100, 13373, 8, 146086
609 threadCount, total, 7272610, 121130, 121130, 121130, 5.0, 3.4, 7.7, 62.8, 78.3, 134.9, 60.0, 0.02995, 0, 254, 10177, 14088, 8, 152827
913 threadCount, total, 7437538, 123715, 123715, 123715, 7.3, 4.7, 15.0, 69.9, 86.1, 252.8, 60.1, 0.01407, 0, 264, 10316, 14669, 11, 164130
{code}

Also, [~aweisberg] has been reviewing on the side and has made some nice comments, as well.

Overview of changes:

- less reliance on the pipeline

I've reduced the number of handlers in the netty pipeline to a bare minimum (that is, just one), as I've found in my testing that there is a slight cost to operating the netty pipeline: each handler looks up the next handler, checks the promise's status, and so on. While this change makes the code less like a pipeline/chain of command, it is still easily understandable and will perform better. (As an aside, I have a colleague who runs a massively scalable service, and they don't use any handlers in the netty pipeline whatsoever - they just send ByteBufs to the channel.)

- fixing the flush strategy

I've had to dive into the internals of netty to understand all the subtleties of how flushing and thread scheduling work, and then matrix that against our needs. Thus, I've documented it quite thoroughly in the class-level documentation in {{OutboundMessageConnection}} and {{MessageOutHandler}}, and the code implements those details.

> nio MessagingService
> --------------------
>
>         Key: CASSANDRA-8457
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>     Project: Cassandra
>  Issue Type: New Feature
>    Reporter: Jonathan Ellis
>    Assignee: Jason Brown
>    Priority: Minor
>      Labels: netty, performance
>     Fix For: 4.x
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.
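To make the "less reliance on the pipeline" point concrete, here is a minimal plain-Java sketch (not the patch's code, and not Netty itself; the {{Handler}}/{{Context}} names are hypothetical) of the per-hop bookkeeping a handler chain does on every write: look up the next handler, check the promise, delegate. Collapsing the chain to a single handler eliminates all of the intermediate hops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model of a handler chain: every write pays a next-handler lookup and a
// promise check per hop, which is the overhead removed by using one handler.
public class PipelineSketch {
    interface Handler {
        void write(Context ctx, String msg, CompletableFuture<Void> promise);
    }

    // Context knows the handler's position in the chain; writeNext() is the
    // per-hop traversal work.
    static final class Context {
        final List<Handler> chain;
        final int index;
        Context(List<Handler> chain, int index) { this.chain = chain; this.index = index; }

        void writeNext(String msg, CompletableFuture<Void> promise) {
            if (promise.isDone())
                return;                                    // promise status check on every hop
            if (index + 1 < chain.size())
                chain.get(index + 1).write(new Context(chain, index + 1), msg, promise); // next-handler lookup
            else
                promise.complete(null);                    // reached the channel tail
        }
    }

    static final StringBuilder wire = new StringBuilder(); // stand-in for the socket

    public static void main(String[] args) {
        // Three pass-through handlers plus one that actually writes: four hops
        // of lookup/check work for a single message.
        List<Handler> pipeline = new ArrayList<>();
        for (int i = 0; i < 3; i++)
            pipeline.add((ctx, msg, p) -> ctx.writeNext(msg, p));
        pipeline.add((ctx, msg, p) -> { wire.append(msg); ctx.writeNext(msg, p); });

        CompletableFuture<Void> promise = new CompletableFuture<>();
        pipeline.get(0).write(new Context(pipeline, 0), "hello", promise);
        System.out.println(wire + " done=" + promise.isDone());
    }
}
```

With a single handler the message goes straight from the channel to the write logic, which is why the comment's colleague can skip handlers entirely and hand ByteBufs to the channel directly.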
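The flush-strategy change is about when to push queued writes to the socket. As a toy illustration only (hypothetical names, not the code in {{OutboundMessageConnection}} or {{MessageOutHandler}}), the idea of coalesced flushing is that many writes are queued on the event loop and a single flush pays the expensive syscall-like cost once for the whole batch:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of flush coalescing: write() only queues; flush() drains the whole
// batch in one pass, so N messages cost one flush instead of N.
public class FlushSketch {
    static final Queue<String> pending = new ArrayDeque<>(); // writes buffered on the event loop
    static final StringBuilder wire = new StringBuilder();   // stand-in for the socket
    static int flushes = 0;                                  // count of (expensive) flush operations

    static void write(String msg) {
        pending.add(msg);                                    // cheap: no I/O yet
    }

    // Invoked once the event loop has drained its queued writes (or a batch
    // threshold is hit), rather than after every individual message.
    static void flush() {
        flushes++;
        while (!pending.isEmpty())
            wire.append(pending.poll()).append('\n');
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++)
            write("msg" + i);                                // five writes...
        flush();                                             // ...one flush
        System.out.println("flushes=" + flushes + " bytes=" + wire.length());
    }
}
```

The subtlety the comment alludes to is deciding when that single flush happens relative to the event loop's thread scheduling, which is what the class-level documentation in the patch covers.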