[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271701#comment-14271701 ]
Ariel Weisberg commented on CASSANDRA-8457:
-------------------------------------------

Took a stab at writing an adaptive approach to coalescing based on a moving average. Numbers look good for the workloads tested.

Code: https://github.com/aweisberg/cassandra/compare/6be33289f34782e12229a7621022bb5ce66b2f1b...e48133c4d5acbaa6563ea48a0ca118c278b2f6f7

Testing was done in AWS with 14 servers and 6 clients.

At low concurrency, a fixed coalescing window drops performance from 6746 to 3929. With adaptive coalescing I got 6758.

At medium concurrency (5 threads per client, 6 clients) I got 31097 with coalescing disabled and 31120 with coalescing enabled.

At high concurrency (500 threads per client, 6 clients) I got 479532 with coalescing and 166010 without.

This is with a maximum coalescing window of 200 milliseconds.

I added debug output to log when coalescing starts and stops, and the results are interesting. At the beginning of the benchmark things flap, but not madly, and after a few minutes they settle.

I also noticed something strange: CPU utilization is around 500% at the start of a benchmark and then climbs after a while, as if something somewhere is warming up or balancing. I recall seeing this in GCE as well.

I had one of the OutboundTcpConnections (the first to get the permit) log a trace of all outgoing message times and threw that into a histogram (values in microseconds) for informational purposes. 50% of messages are sent within 100 microseconds of each other and 92% are sent within one millisecond. This is without any coalescing.
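A minimal Java sketch of the general idea, not the code in the linked branch: keep a moving average of the gaps between outgoing messages, flush immediately when the average gap is at or above the maximum window, and otherwise hold the batch open for a whole number of average gaps that fits under that cap. The class and method names (`MovingAverageCoalescer`, `recordMessage`, `coalesceWindowNanos`), the ring-buffer size, the nanosecond units, the example cap and the round-down-to-a-multiple policy are all hypothetical, chosen only for illustration.

```java
/**
 * Hypothetical sketch of adaptive message coalescing driven by a moving
 * average of inter-message gaps. Names and policy are illustrative only.
 */
final class MovingAverageCoalescer {
    private final long[] gapsNanos;     // ring buffer of the most recent inter-message gaps
    private final long maxWindowNanos;  // never hold a batch open longer than this
    private int next = 0;               // ring slot to overwrite on the next sample
    private int filled = 0;             // how many slots hold real samples
    private long sumNanos = 0;          // running sum of the samples in the ring
    private long lastMessageNanos = -1; // arrival time of the previous message, -1 if none

    MovingAverageCoalescer(int sampleCount, long maxWindowNanos) {
        if (sampleCount <= 0 || maxWindowNanos <= 0) {
            throw new IllegalArgumentException("sampleCount and maxWindowNanos must be positive");
        }
        this.gapsNanos = new long[sampleCount];
        this.maxWindowNanos = maxWindowNanos;
    }

    /** Records that a message became ready to send at nowNanos (e.g. from System.nanoTime()). */
    void recordMessage(long nowNanos) {
        if (lastMessageNanos >= 0) {
            long gap = nowNanos - lastMessageNanos;
            sumNanos -= gapsNanos[next]; // evict the oldest sample (0 while the ring is filling)
            gapsNanos[next] = gap;
            sumNanos += gap;
            next = (next + 1) % gapsNanos.length;
            if (filled < gapsNanos.length) {
                filled++;
            }
        }
        lastMessageNanos = nowNanos;
    }

    /** Average of the recorded gaps, or Long.MAX_VALUE before any gap is known. */
    long averageGapNanos() {
        return filled == 0 ? Long.MAX_VALUE : sumNanos / filled;
    }

    /**
     * How long to hold the outgoing batch open. Sparse traffic (average gap at or
     * above the cap) gets 0, i.e. flush immediately. Otherwise wait for the largest
     * whole number of average gaps that fits under the cap.
     */
    long coalesceWindowNanos() {
        long avg = averageGapNanos();
        if (avg >= maxWindowNanos) {
            return 0;
        }
        if (avg <= 0) {
            return maxWindowNanos; // back-to-back messages: use the full window
        }
        return (maxWindowNanos / avg) * avg;
    }

    public static void main(String[] args) {
        // Arbitrary 1 ms cap, averaging over the last 4 gaps.
        MovingAverageCoalescer c = new MovingAverageCoalescer(4, 1_000_000L);
        for (long t = 0; t <= 120_000; t += 30_000) {
            c.recordMessage(t); // one message every 30 microseconds
        }
        System.out.println(c.averageGapNanos() + " " + c.coalesceWindowNanos()); // prints "30000 990000"
    }
}
```

With this policy a lone request at low concurrency sees a zero window and goes out immediately, which is the property a fixed window lacks. The trade-off is that the decision lags changes in traffic by roughly the length of the sample ring.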
{noformat}
        Value     Percentile TotalCount 1/(1-Percentile)

        0.000 0.000000000000       5554             1.00
        5.703 0.100000000000     124565             1.11
       13.263 0.200000000000     249128             1.25
       24.143 0.300000000000     373630             1.43
       40.607 0.400000000000     498108             1.67
       94.015 0.500000000000     622664             2.00
      158.463 0.550000000000     684867             2.22
      244.351 0.600000000000     747137             2.50
      305.407 0.650000000000     809631             2.86
      362.239 0.700000000000     871641             3.33
      428.031 0.750000000000     933978             4.00
      467.711 0.775000000000     965085             4.44
      520.703 0.800000000000     996254             5.00
      595.967 0.825000000000    1027359             5.71
      672.767 0.850000000000    1058457             6.67
      743.935 0.875000000000    1089573             8.00
      780.799 0.887500000000    1105290             8.89
      821.247 0.900000000000    1120774            10.00
      868.351 0.912500000000    1136261            11.43
      928.767 0.925000000000    1151889            13.33
     1006.079 0.937500000000    1167421            16.00
     1049.599 0.943750000000    1175260            17.78
     1095.679 0.950000000000    1183041            20.00
     1143.807 0.956250000000    1190779            22.86
     1198.079 0.962500000000    1198542            26.67
     1264.639 0.968750000000    1206301            32.00
     1305.599 0.971875000000    1210228            35.56
     1354.751 0.975000000000    1214090            40.00
     1407.999 0.978125000000    1217975            45.71
     1470.463 0.981250000000    1221854            53.33
     1542.143 0.984375000000    1225759            64.00
     1586.175 0.985937500000    1227720            71.11
     1634.303 0.987500000000    1229643            80.00
     1688.575 0.989062500000    1231596            91.43
     1756.159 0.990625000000    1233523           106.67
     1839.103 0.992187500000    1235464           128.00
     1887.231 0.992968750000    1236430           142.22
     1944.575 0.993750000000    1237409           160.00
     2007.039 0.994531250000    1238384           182.86
     2084.863 0.995312500000    1239358           213.33
     2174.975 0.996093750000    1240326           256.00
     2230.271 0.996484375000    1240818           284.44
     2293.759 0.996875000000    1241292           320.00
     2369.535 0.997265625000    1241785           365.71
     2455.551 0.997656250000    1242271           426.67
     2578.431 0.998046875000    1242752           512.00
     2656.255 0.998242187500    1242999           568.89
     2740.223 0.998437500000    1243244           640.00
     2834.431 0.998632812500    1243482           731.43
     2957.311 0.998828125000    1243725           853.33
     3131.391 0.999023437500    1243969          1024.00
     3235.839 0.999121093750    1244091          1137.78
     3336.191 0.999218750000    1244212          1280.00
     3471.359 0.999316406250    1244332          1462.86
     3641.343 0.999414062500    1244455          1706.67
     3837.951 0.999511718750    1244576          2048.00
     4001.791 0.999560546875    1244636          2275.56
     4136.959 0.999609375000    1244697          2560.00
     4399.103 0.999658203125    1244758          2925.71
     4628.479 0.999707031250    1244819          3413.33
     5119.999 0.999755859375    1244880          4096.00
     5439.487 0.999780273438    1244910          4551.11
     5791.743 0.999804687500    1244940          5120.00
     6582.271 0.999829101563    1244971          5851.43
     7917.567 0.999853515625    1245001          6826.67
    10027.007 0.999877929688    1245032          8192.00
    11321.343 0.999890136719    1245047          9102.22
    12607.487 0.999902343750    1245063         10240.00
    14524.415 0.999914550781    1245077         11702.86
    15785.983 0.999926757813    1245092         13653.33
    16416.767 0.999938964844    1245108         16384.00
    16793.599 0.999945068359    1245116         18204.44
    17072.127 0.999951171875    1245123         20480.00
    17465.343 0.999957275391    1245130         23405.71
    18563.071 0.999963378906    1245138         27306.67
    30883.839 0.999969482422    1245146         32768.00
    33030.143 0.999972534180    1245149         36408.89
    33587.199 0.999975585938    1245153         40960.00
    35061.759 0.999978637695    1245157         46811.43
    36241.407 0.999981689453    1245161         54613.33
    37257.215 0.999984741211    1245165         65536.00
    37322.751 0.999986267090    1245166         72817.78
    37978.111 0.999987792969    1245168         81920.00
    40534.015 0.999989318848    1245170         93622.86
    47382.527 0.999990844727    1245172        109226.67
    53510.143 0.999992370605    1245174        131072.00
    54558.719 0.999993133545    1245175        145635.56
    62586.879 0.999993896484    1245176        163840.00
    63700.991 0.999994659424    1245177        187245.71
    70320.127 0.999995422363    1245178        218453.33
   107806.719 0.999996185303    1245179        262144.00
   107806.719 0.999996566772    1245179        291271.11
  1882193.919 0.999996948242    1245180        327680.00
  1882193.919 0.999997329712    1245180        374491.43
  2202009.599 0.999997711182    1245181        436906.67
  2202009.599 0.999998092651    1245181        524288.00
  2202009.599 0.999998283386    1245181        582542.22
  2875195.391 0.999998474121    1245182        655360.00
  2875195.391 0.999998664856    1245182        748982.86
  2875195.391 0.999998855591    1245182        873813.33
  2875195.391 0.999999046326    1245182       1048576.00
  2875195.391 0.999999141693    1245182       1165084.44
148176371.711 0.999999237061    1245183       1310720.00
148176371.711 1.000000000000    1245183
#[Mean = 418.657, StdDeviation = 132779.859]
#[Max = 148176371.711, Total count = 1245183]
#[Buckets = 53, SubBuckets = 2048]
{noformat}

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)