[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]
Chris Lohfink edited comment on CASSANDRA-7247 at 5/16/14 5:54 AM:
-------------------------------------------------------------------

The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, but in this implementation I found it to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version did perform similarly when it was also wrapped in a synchronized block (numbers below), but serializing access throws away any benefit of a concurrent implementation, so I think the faster implementation is best.

Run on a 2013 Retina MBP with a 500 GB SSD, against trunk:

{code:title=No Changes}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450 , 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600 , 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050 , 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250 , 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550 , 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}

{code:title=With Patch}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900 , 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100 , 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400 , 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150 , 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600 , 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}

{code:title=ConcurrentStreamSummary with sync}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 494350 , 16643, 16643, 0.2, 0.2, 0.3, 0.3, 1.0, 943.6, 29.7, 0.01286
8 threadCount, 812950 , 26358, 26358, 0.3, 0.2, 0.3, 0.5, 1.4, 1488.9, 30.8, 0.01909
16 threadCount, 877500 , 27396, 27396, 0.6, 0.3, 1.0, 2.2, 12.1, 1299.2, 32.0, 0.01824
24 threadCount, 837550 , 25345, 25345, 0.9, 0.4, 1.2, 3.7, 84.2, 2123.6, 33.0, 0.02437
36 threadCount, 910200 , 28008, 28008, 1.3, 0.6, 2.8, 9.2, 32.2, 1212.8, 32.5, 0.01654
{code}

{code:title=ConcurrentStreamSummary no blocking}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 183600 , 6145, 6145, 0.6, 0.6, 0.8, 1.0, 2.6, 354.5, 29.9, 0.01063
8 threadCount, 197200 , 6593, 6593, 1.2, 1.1, 1.4, 1.8, 3.3, 413.5, 29.9, 0.00716
16 threadCount, 203200 , 6794, 6794, 2.3, 2.2, 2.6, 3.5, 12.1, 649.1, 29.9, 0.01096
24 threadCount, 198000 , 6615, 6615, 3.6, 3.3, 4.2, 4.9, 44.2, 570.4, 29.9, 0.00894
36 threadCount, 199800 , 6627, 6627, 5.4, 4.9, 6.5, 8.0, 110.8, 272.3, 30.1, 0.01452
{code}
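The comment's approach, serializing every call on a summary that is not thread safe behind one lock, can be sketched as below. This is a minimal illustration, not the patch: the class names `NaiveTopK` and `HeavyKeyTracker` are hypothetical, and `NaiveTopK` (an exact HashMap counter) stands in for stream-lib's `StreamSummary` so the example compiles without the library. The only point is that both writes (`offer`) and reads go through the same monitor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a summary that is NOT thread safe:
// unsynchronized HashMap updates can lose increments or corrupt the map.
class NaiveTopK<T> {
    private final Map<T, Long> counts = new HashMap<>();

    void offer(T item) {
        counts.merge(item, 1L, Long::sum);
    }

    // Returns up to k items, most frequent first, as a fresh list.
    List<T> peek(int k) {
        List<Map.Entry<T, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<T> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    long count(T item) {
        return counts.getOrDefault(item, 0L);
    }
}

// Hypothetical wrapper: a single lock serializes all access to the summary,
// mirroring "a synchronized block around the offer".
class HeavyKeyTracker<T> {
    private final NaiveTopK<T> summary = new NaiveTopK<>();

    void offer(T key) {
        synchronized (summary) {
            summary.offer(key);
        }
    }

    List<T> topK(int k) {
        synchronized (summary) {
            return summary.peek(k); // peek copies, so the result is safe to use after unlocking
        }
    }

    long count(T key) {
        synchronized (summary) {
            return summary.count(key);
        }
    }
}
```

Because every caller takes the same monitor, throughput is bounded by how fast one thread can run `offer`; the numbers above suggest that on that machine this lock costs less than using `ConcurrentStreamSummary`.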
> Provide top ten most frequent keys per column family
> ----------------------------------------------------
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Lohfink
> Priority: Minor
> Attachments: patch.diff
>
>
> Since we already have the nice addthis stream library, we can use it to keep track
> of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then provide a new metric to access them via JMX.
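StreamSummary implements the Space-Saving algorithm described in the linked post. The sketch below is a self-contained, simplified Space-Saving counter so the idea can be run without the library; the class `SpaceSaving` and its methods are hypothetical, not stream-lib's API. It also finds the minimum counter with a linear scan, whereas StreamSummary uses the linked-bucket Stream-Summary structure to avoid that cost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal Space-Saving sketch (hypothetical class, not stream-lib's API). Not thread safe.
// Finding the minimum counter is a linear scan, O(capacity) per unmonitored item.
class SpaceSaving<T> {
    private final int capacity;
    private final Map<T, long[]> counters = new HashMap<>(); // item -> {count, error}

    SpaceSaving(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void offer(T item) {
        long[] c = counters.get(item);
        if (c != null) {                  // already monitored: just increment
            c[0]++;
            return;
        }
        if (counters.size() < capacity) { // free slot: start counting exactly
            counters.put(item, new long[] {1, 0});
            return;
        }
        // Evict the item with the smallest count; the newcomer inherits that
        // count plus one and records it as its maximum overestimation.
        T minItem = null;
        long minCount = Long.MAX_VALUE;
        for (Map.Entry<T, long[]> e : counters.entrySet()) {
            if (e.getValue()[0] < minCount) {
                minCount = e.getValue()[0];
                minItem = e.getKey();
            }
        }
        counters.remove(minItem);
        counters.put(item, new long[] {minCount + 1, minCount});
    }

    // Estimated count (never below the true count), or 0 if not monitored.
    long estimate(T item) {
        long[] c = counters.get(item);
        return c == null ? 0 : c[0];
    }

    // Up to k monitored items, highest estimated count first.
    List<T> topK(int k) {
        List<T> items = new ArrayList<>(counters.keySet());
        items.sort((a, b) -> Long.compare(counters.get(b)[0], counters.get(a)[0]));
        return new ArrayList<>(items.subList(0, Math.min(k, items.size())));
    }
}
```

With `capacity` counters over a stream of N offers, every key seen more than N / capacity times is guaranteed to stay monitored, and each estimate exceeds the true count by at most the error recorded when that key took over a slot. This is why a small fixed capacity per column family can still surface genuinely hot keys.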