[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004005#comment-14004005 ]

Chris Lohfink edited comment on CASSANDRA-7247 at 5/20/14 10:15 PM:

Added a patch that uses the trace executor to track the partitions that are updated the most, the partitions that have the most columns inserted (useful for finding rows that are too wide), and the partitions with the slowest insertion times. Tracking only happens if the trace probability is > 0. Gives (key,count,error) tuples.

!jconsole.png!

was (Author: cnlwsu):
Added a patch that uses the trace executor to track the partitions that are updated the most, the partitions that have the most columns inserted (useful for finding rows that are too wide), and the partitions with the slowest insertion times. Tracking only happens if the trace probability is > 0. Gives (key,count,error) tuples.

> Provide top ten most frequent keys per column family
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Lohfink
> Assignee: Chris Lohfink
> Priority: Minor
> Attachments: jconsole.png, patch.txt
>
> Since we already have the nice addthis stream library, we can use it to keep track
> of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice
> explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then provide a new metric to access them via JMX.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
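For readers unfamiliar with where the (key,count,error) tuples come from, here is a minimal, self-contained sketch of the Space-Saving idea described in the linked blog post. It is not the stream-lib StreamSummary class and not the code in the attached patch; the class name SpaceSavingSketch, its Counter type, and the use of String keys are all assumptions made for illustration. It also finds the minimum counter with a linear scan, which a production implementation would avoid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of Space-Saving; not the stream-lib implementation.
public class SpaceSavingSketch {

    /** One monitored key with its estimated count and maximum overestimation. */
    public static final class Counter {
        public final String key;
        public long count;
        public final long error;

        Counter(String key, long count, long error) {
            this.key = key;
            this.count = count;
            this.error = error;
        }

        @Override
        public String toString() {
            return "(" + key + "," + count + "," + error + ")";
        }
    }

    private final int capacity;
    private final Map<String, Counter> counters = new HashMap<>();

    public SpaceSavingSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one occurrence of key. Not thread safe. */
    public void offer(String key) {
        Counter existing = counters.get(key);
        if (existing != null) {
            existing.count++;
            return;
        }
        if (counters.size() < capacity) {
            counters.put(key, new Counter(key, 1, 0));
            return;
        }
        // All slots are taken: replace the key with the smallest count.
        Counter min = null;
        for (Counter c : counters.values()) {
            if (min == null || c.count < min.count) {
                min = c;
            }
        }
        counters.remove(min.key);
        // The newcomer inherits the evicted count, which is also its error bound.
        counters.put(key, new Counter(key, min.count + 1, min.count));
    }

    /** Returns up to k counters, highest estimated count first. */
    public List<Counter> topK(int k) {
        List<Counter> sorted = new ArrayList<>(counters.values());
        sorted.sort(Comparator.comparingLong((Counter c) -> c.count).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        SpaceSavingSketch summary = new SpaceSavingSketch(2);
        for (String key : new String[] {"a", "a", "a", "b", "c", "a"}) {
            summary.offer(key);
        }
        System.out.println(summary.topK(2)); // prints [(a,4,0), (c,2,1)]
    }
}
```

In the example, "c" evicts "b" and starts at count 2 with error 1, meaning its true count lies between 1 and 2. That error field is what lets a consumer of the metric judge how far to trust each reported key.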
[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004005#comment-14004005 ]

Chris Lohfink edited comment on CASSANDRA-7247 at 5/20/14 9:32 PM:
---
Added a patch that uses the trace executor to track the partitions that are updated the most, the partitions that have the most columns inserted (useful for finding rows that are too wide), and the partitions with the slowest insertion times. Tracking only happens if the trace probability is > 0. Gives (key,count,error) tuples.

was (Author: cnlwsu):
Added a patch that uses the trace executor to track the partitions that are updated the most, the partitions that have the most columns inserted (useful for finding rows that are too wide), and the partitions with the slowest insertion times. Tracking only happens if the trace probability is > 0.
[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]

Chris Lohfink edited comment on CASSANDRA-7247 at 5/16/14 5:53 AM:
---
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900, 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100, 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400, 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600, 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}
{code:title=ConcurrentStreamSummary with sync}
4 threadCount, 494350, 16643, 16643, 0.2, 0.2, 0.3, 0.3, 1.0, 943.6, 29.7, 0.01286
8 threadCount, 812950, 26358, 26358, 0.3, 0.2, 0.3, 0.5, 1.4, 1488.9, 30.8, 0.01909
16 threadCount, 877500, 27396, 27396, 0.6, 0.3, 1.0, 2.2, 12.1, 1299.2, 32.0, 0.01824
24 threadCount, 837550, 25345, 25345, 0.9, 0.4, 1.2, 3.7, 84.2, 2123.6, 33.0, 0.02437
36 threadCount, 910200, 28008, 28008, 1.3, 0.6, 2.8, 9.2, 32.2, 1212.8, 32.5, 0.01654
{code}

was (Author: cnlwsu):
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900, 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100, 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400, 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600, 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}

> Provide top ten most frequent keys per column family
>
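The approach described above, a synchronized block around the offer of a non-thread-safe summary, can be sketched as follows. This is an assumption-laden illustration and not the code from the attached patch: UnsafeCounts is a hypothetical stand-in for StreamSummary (a plain HashMap counter), and SynchronizedOfferSketch and hammer are names invented here. It shows only that serializing access keeps the counts correct under concurrent writers; it says nothing about relative speed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; UnsafeCounts stands in for a non-thread-safe summary.
public class SynchronizedOfferSketch {

    /** A deliberately non-thread-safe counter backed by a plain HashMap. */
    static final class UnsafeCounts {
        private final Map<String, Long> counts = new HashMap<>();

        void offer(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        long get(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    private final UnsafeCounts summary = new UnsafeCounts();

    /** All writers funnel through one lock, so updates are serialized. */
    public void offer(String key) {
        synchronized (summary) {
            summary.offer(key);
        }
    }

    /** Readers take the same lock so they never see a half-applied update. */
    public long count(String key) {
        synchronized (summary) {
            return summary.get(key);
        }
    }

    /** Starts several writer threads that each offer the same key, then returns the total. */
    public static long hammer(int threads, int perThread, String key) {
        SynchronizedOfferSketch shared = new SynchronizedOfferSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    shared.offer(key);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return shared.count(key);
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000, "partition-1")); // prints 40000
    }
}
```

Locking on the wrapped object keeps the critical section as small as the single offer call, which matches the description of synchronizing only around the offer.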
[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]

Chris Lohfink edited comment on CASSANDRA-7247 at 5/16/14 5:51 AM:
---
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900, 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100, 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400, 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600, 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}

was (Author: cnlwsu):
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900, 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100, 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400, 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600, 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}

> Provide top ten most frequent keys per column family
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Lohfink
> Priority: Minor
> Attachments: patch.diff
>
> Since we already have the nice addthis stream library, we can use it to keep track
> of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice
> explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then provide a new metric to ac
[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]

Chris Lohfink edited comment on CASSANDRA-7247 at 5/16/14 5:55 AM:
---
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~4x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900, 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100, 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400, 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600, 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}
{code:title=ConcurrentStreamSummary with sync}
4 threadCount, 494350, 16643, 16643, 0.2, 0.2, 0.3, 0.3, 1.0, 943.6, 29.7, 0.01286
8 threadCount, 812950, 26358, 26358, 0.3, 0.2, 0.3, 0.5, 1.4, 1488.9, 30.8, 0.01909
16 threadCount, 877500, 27396, 27396, 0.6, 0.3, 1.0, 2.2, 12.1, 1299.2, 32.0, 0.01824
24 threadCount, 837550, 25345, 25345, 0.9, 0.4, 1.2, 3.7, 84.2, 2123.6, 33.0, 0.02437
36 threadCount, 910200, 28008, 28008, 1.3, 0.6, 2.8, 9.2, 32.2, 1212.8, 32.5, 0.01654
{code}
{code:title=ConcurrentStreamSummary no blocking}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 183600, 6145, 6145, 0.6, 0.6, 0.8, 1.0, 2.6, 354.5, 29.9, 0.01063
8 threadCount, 197200, 6593, 6593, 1.2, 1.1, 1.4, 1.8, 3.3, 413.5, 29.9, 0.00716
16 threadCount, 203200, 6794, 6794, 2.3, 2.2, 2.6, 3.5, 12.1, 649.1, 29.9, 0.01096
24 threadCount, 198000, 6615, 6615, 3.6, 3.3, 4.2, 4.9, 44.2, 570.4, 29.9, 0.00894
36 threadCount, 199800, 6627, 6627, 5.4, 4.9, 6.5, 8.0, 110.8, 272.3, 30.1, 0.01452
{code}

was (Author: cnlwsu):
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]

Chris Lohfink edited comment on CASSANDRA-7247 at 5/16/14 5:54 AM:
---
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900, 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100, 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400, 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600, 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}
{code:title=ConcurrentStreamSummary with sync}
4 threadCount, 494350, 16643, 16643, 0.2, 0.2, 0.3, 0.3, 1.0, 943.6, 29.7, 0.01286
8 threadCount, 812950, 26358, 26358, 0.3, 0.2, 0.3, 0.5, 1.4, 1488.9, 30.8, 0.01909
16 threadCount, 877500, 27396, 27396, 0.6, 0.3, 1.0, 2.2, 12.1, 1299.2, 32.0, 0.01824
24 threadCount, 837550, 25345, 25345, 0.9, 0.4, 1.2, 3.7, 84.2, 2123.6, 33.0, 0.02437
36 threadCount, 910200, 28008, 28008, 1.3, 0.6, 2.8, 9.2, 32.2, 1212.8, 32.5, 0.01654
{code}
{code:title=ConcurrentStreamSummary no blocking}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 183600, 6145, 6145, 0.6, 0.6, 0.8, 1.0, 2.6, 354.5, 29.9, 0.01063
8 threadCount, 197200, 6593, 6593, 1.2, 1.1, 1.4, 1.8, 3.3, 413.5, 29.9, 0.00716
16 threadCount, 203200, 6794, 6794, 2.3, 2.2, 2.6, 3.5, 12.1, 649.1, 29.9, 0.01096
24 threadCount, 198000, 6615, 6615, 3.6, 3.3, 4.2, 4.9, 44.2, 570.4, 29.9, 0.00894
36 threadCount, 199800, 6627, 6627, 5.4, 4.9, 6.5, 8.0, 110.8, 272.3, 30.1, 0.01452
{code}

was (Author: cnlwsu):
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, which I found in this implementation to be ~5x slower than a synchronized block around the offer of the non-thread-safe one. The concurrent version performed similarly when it was also wrapped in a synchronized block (shown below), but since serializing access removes any benefit of a concurrent implementation, I think the faster implementation is best.

Done on a 2013 Retina MBP with a 500 GB SSD against trunk:
{code:title=No Changes}
id, ops, op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450, 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600, 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050, 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550, 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}