[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-09-21 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142509#comment-14142509
 ] 

Brandon Williams commented on CASSANDRA-7247:
-

Totally missed this when I made CASSANDRA-7974, but I'm not surprised Chris and 
I think alike. :)

> Provide top ten most frequent keys per column family
> 
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: cassandra-2.1-7247.txt, jconsole.png, patch.txt
>
>
> Since we already have the nice addthis stream library, we can use it to keep
> track of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then we can provide a new metric to access them via JMX.
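The Space-Saving algorithm that StreamSummary is based on (see the linked explanation) fits in a few lines. The following is a minimal, self-contained sketch, not stream-lib's code: the class name `SpaceSaving` is hypothetical, the method names only loosely mirror StreamSummary's `offer`/`topK`, and eviction scans for the minimum counter in O(capacity) rather than using a constant-time bucket structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal Space-Saving sketch: keeps at most `capacity` counters. */
final class SpaceSaving<T> {
    private final int capacity;
    private final Map<T, long[]> counters = new HashMap<>(); // item -> {count, error}

    SpaceSaving(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void offer(T item) {
        offer(item, 1);
    }

    /** Adds `increment` to the item's estimated count, evicting the minimum counter if full. */
    void offer(T item, long increment) {
        long[] c = counters.get(item);
        if (c != null) {
            c[0] += increment;
            return;
        }
        if (counters.size() < capacity) {
            counters.put(item, new long[] {increment, 0});
            return;
        }
        // Full: evict the item with the smallest count. The newcomer inherits that
        // count, which is also the most its estimate can overstate (its error).
        T minItem = null;
        long minCount = Long.MAX_VALUE;
        for (Map.Entry<T, long[]> e : counters.entrySet()) {
            if (e.getValue()[0] < minCount) {
                minCount = e.getValue()[0];
                minItem = e.getKey();
            }
        }
        counters.remove(minItem);
        counters.put(item, new long[] {minCount + increment, minCount});
    }

    /** Returns up to k items ordered by estimated count, highest first. */
    List<T> topK(int k) {
        List<Map.Entry<T, long[]>> entries = new ArrayList<>(counters.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue()[0], a.getValue()[0]));
        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    /** Estimated count; 0 if the item is not currently tracked. */
    long estimatedCount(T item) {
        long[] c = counters.get(item);
        return c == null ? 0 : c[0];
    }
}
```

The property that makes this suitable for "top ten keys": with N total offers, the smallest counter never exceeds N/capacity, so any key whose true count is above N/capacity is guaranteed to still be tracked, and each retained estimate overstates the true count by at most its recorded error.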



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-09-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142375#comment-14142375
 ] 

Benedict commented on CASSANDRA-7247:
-

It's probably better to construct a lightweight wrapper around the data you're 
using for equality (key bytes / token), with knowledge of _how_ to turn it into 
a string, and to perform that conversion only when we're asked for the TopK. It 
could well be worth enabling this on a per-CF / per-KS basis, though, or making 
the sample size configurable in the yaml. If you have large keys (64KB), the 
structure as it stands will take up > 128MB per keyspace, or > 64MB with the 
adjustment I've just suggested. Either way that's non-trivial, especially since 
we have two of them. Admittedly, such large keys are not likely to be common.
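A minimal sketch of the wrapper suggested above, assuming the key arrives as a `ByteBuffer`: equality and hashing use only the raw bytes, and the human-readable string is produced by a formatter only when someone asks for it. The class name `SampledKey` and the `Function<ByteBuffer, String>` formatter (standing in for whatever renders the partition key) are hypothetical, not Cassandra APIs.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

/**
 * Hypothetical sampled-key wrapper: holds only the key bytes plus a way to
 * render them, so no String is built on the write path.
 */
final class SampledKey {
    private final ByteBuffer key;
    private final Function<ByteBuffer, String> formatter;

    SampledKey(ByteBuffer key, Function<ByteBuffer, String> formatter) {
        // duplicate(): independent position/limit, shares the bytes (no copy).
        // Assumes the caller does not mutate the key bytes after sampling.
        this.key = key.duplicate();
        this.formatter = formatter;
    }

    /** Content-based equality over the remaining bytes. */
    @Override
    public boolean equals(Object o) {
        return o instanceof SampledKey && key.equals(((SampledKey) o).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    /** Rendering happens only here, e.g. when the TopK is requested via JMX. */
    @Override
    public String toString() {
        return formatter.apply(key.duplicate());
    }
}
```

Because the summary only ever stores `SampledKey` instances, its memory cost tracks the raw key size rather than the size of a rendered string.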



[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-09-20 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142343#comment-14142343
 ] 

Chris Lohfink commented on CASSANDRA-7247:
--

Updated to always do it, but I think options 2 or 3 are equally viable. It's 
still using an executor to single-thread access to the StreamSummary (which 
performs better that way) and to provide a 1k backlog cap, especially since I'm 
not sure about the performance impact of now using the AbstractType. Instead of 
using DecoratedKey.toString, I changed it to use the human-readable format from 
the partition's type, which makes it more useful for debugging. If we keep this 
as an always-on option, I can add a nodetool command to list them in a nice 
format.
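A single-threaded executor with a capped backlog, as described above, can be sketched with the JDK's `ThreadPoolExecutor`: one worker thread owns the non-thread-safe summary, the queue holds at most `backlog` pending samples, and `DiscardPolicy` silently drops new samples when the queue is full so the caller never blocks or sees an exception. The helper name `boundedSingleThread` is hypothetical; this is not the patch's code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class Samplers {
    /**
     * Hypothetical helper: exactly one worker thread and a bounded queue.
     * Once `backlog` tasks are waiting, further tasks are dropped (DiscardPolicy)
     * instead of blocking or throwing on the submitting (write-path) thread.
     */
    static ThreadPoolExecutor boundedSingleThread(int backlog) {
        return new ThreadPoolExecutor(
                1, 1,                                     // core = max = one thread
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(backlog),        // e.g. 1000 for a 1k cap
                new ThreadPoolExecutor.DiscardPolicy());  // drop samples when full
    }
}
```

On the write path each sample would then be handed over as `executor.execute(() -> summary.offer(key))`, where `summary` is only touched by the worker thread. Dropping samples under load skews the counts slightly, which is usually acceptable for an approximate top-k.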



[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-06-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036113#comment-14036113
 ] 

Jonathan Ellis commented on CASSANDRA-7247:
---

Using tracingProbability as a coarse on/off feels wrong to me.  I'd prefer one 
of these options:

# If it's cheap, just do it always
# If it makes sense to do it along with other tracing ops, trace if we've 
enabled a trace state already
# Otherwise, introduce a new per-table setting

> Provide top ten most frequent keys per column family
> 
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: jconsole.png, patch.txt
>
>
> Since we already have the nice addthis stream library, we can use it to keep
> track of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then we can provide a new metric to access them via JMX.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-05-20 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004005#comment-14004005
 ] 

Chris Lohfink commented on CASSANDRA-7247:
--

Added a patch that uses the trace executor to track the partitions that are 
updated the most, the partitions with the most columns inserted (useful for 
finding rows that are too wide), and the partitions with the slowest insertion 
times. It will only track these if the trace probability is > 0.
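For the "slowest insertion times" part, one common approach (an assumption here; the patch may do it differently) is a bounded min-heap holding the N largest latencies seen so far: each new sample only has to beat the fastest of the retained ones. The class `SlowestSamples` and its microsecond units are hypothetical. Like StreamSummary, it is not thread-safe, so it would live on a single thread such as the trace executor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical tracker of the N slowest (key, latency) samples seen so far. */
final class SlowestSamples {
    static final class Sample {
        final String key;
        final long latencyMicros;

        Sample(String key, long latencyMicros) {
            this.key = key;
            this.latencyMicros = latencyMicros;
        }
    }

    private final int capacity;
    // Min-heap on latency: the head is the fastest of the retained slow samples.
    private final PriorityQueue<Sample> heap =
            new PriorityQueue<>(Comparator.comparingLong((Sample s) -> s.latencyMicros));

    SlowestSamples(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void record(String key, long latencyMicros) {
        if (heap.size() < capacity) {
            heap.add(new Sample(key, latencyMicros));
        } else if (latencyMicros > heap.peek().latencyMicros) {
            heap.poll();                                 // drop the fastest retained sample
            heap.add(new Sample(key, latencyMicros));
        }
    }

    /** Retained samples, slowest first. */
    List<Sample> slowest() {
        List<Sample> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingLong((Sample s) -> s.latencyMicros).reversed());
        return out;
    }
}
```

Each `record` costs O(log N), and the heap never grows beyond N entries regardless of write volume.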



[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-05-19 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001741#comment-14001741
 ] 

Chris Lohfink commented on CASSANDRA-7247:
--

I think I'd like to rework this to use the trace executor.

> Provide top ten most frequent keys per column family
> 
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Attachments: patch.diff
>
>
> Since we already have the nice addthis stream library, we can use it to keep
> track of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then we can provide a new metric to access them via JMX.





[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-05-16 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621
 ] 

Chris Lohfink commented on CASSANDRA-7247:
--

The problem is that StreamSummary is not thread-safe. There is a 
ConcurrentStreamSummary, which I found in this implementation to be ~5x slower 
than a synchronized block around the offer of the non-thread-safe one. The 
concurrent one did perform similarly when also wrapped in a synchronized block, 
but since it loses any benefit of being a concurrent implementation once access 
is serialized, I think the faster implementation is best. Results are below.

Run on a 2013 Retina MBP with a 500 GB SSD:

{code:title=No Changes}
id            ,     ops,  op/s, key/s, mean,  med,  .95,  .99, .999,    max, time,  stderr
 4 threadCount,  634450, 21692, 21692,  0.2,  0.2,  0.2,  0.2,  0.4,  740.1, 29.2, 0.01188
 8 threadCount,  886600, 29762, 29762,  0.3,  0.2,  0.3,  0.4,  1.3, 1007.3, 29.8, 0.01220
16 threadCount,  912050, 29035, 29035,  0.5,  0.3,  0.9,  2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250, 32681, 32681,  0.7,  0.5,  1.0,  2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount,  946550, 30900, 30900,  1.2,  0.8,  1.4,  3.0, 22.5, 1369.2, 30.6, 0.01089
{code}

{code:title=With Patch}
id            ,     ops,  op/s, key/s, mean,  med,  .95,  .99, .999,    max, time,  stderr
 4 threadCount,  643900, 21700, 21700,  0.2,  0.2,  0.2,  0.2,  0.9,  941.1, 29.7, 0.01079
 8 threadCount,  942100, 32300, 32300,  0.2,  0.2,  0.3,  0.3,  1.2,  849.5, 29.2, 0.01519
16 threadCount,  907400, 30650, 30650,  0.5,  0.3,  0.8,  1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150, 31753, 31753,  0.7,  0.5,  0.9,  3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount,  980600, 30077, 30077,  1.2,  0.8,  1.3,  2.7, 24.9, 1394.3, 32.6, 0.01747
{code}
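To make the op/s comparison easier to read, the relative change at each thread count can be computed directly from the two tables above. The numbers are copied from them; the class name `ThroughputDelta` is just for illustration.

```java
/** Relative op/s change between the "No Changes" and "With Patch" runs above. */
final class ThroughputDelta {
    static final int[] THREADS  = {4, 8, 16, 24, 36};
    static final int[] BASELINE = {21692, 29762, 29035, 32681, 30900}; // No Changes, op/s
    static final int[] PATCHED  = {21700, 32300, 30650, 31753, 30077}; // With Patch, op/s

    /** Percentage change from baseline to patched throughput. */
    static double percentChange(int baseline, int patched) {
        return 100.0 * (patched - baseline) / baseline;
    }

    public static void main(String[] args) {
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("%2d threads: %+.2f%%%n",
                    THREADS[i], percentChange(BASELINE[i], PATCHED[i]));
        }
    }
}
```

This prints +0.04%, +8.53%, +5.56%, -2.84% and -2.66% for 4, 8, 16, 24 and 36 threads respectively. The sign flips across thread counts, so on these single runs there is no consistent throughput cost attributable to the patch.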

> Provide top ten most frequent keys per column family
> 
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Priority: Minor
> Attachments: patch.diff
>
>
> Since we already have the nice addthis stream library, we can use it to keep
> track of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then we can provide a new metric to access them via JMX.





[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

2014-05-16 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999849#comment-13999849
 ] 

Chris Lohfink commented on CASSANDRA-7247:
--

Another option might be to spin this and other metrics off into the MiscStage. 
It only has a single thread, so no synchronization is required, and it wouldn't 
be as bad to put additional metrics in there as well for extra visibility, like 
topK size in bytes, worst latencies, and such. I wouldn't expect much difference 
performance-wise with just the one stream summary above, since enqueuing onto 
the LinkedBlockingQueue should have similar locking performance (it locks on 
putLock), but reading the metric would then never cause contention (albeit very 
small) on the write path. If there's any interest, I can give it a shot and 
maybe throw in some additional metrics.
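The single-threaded-stage idea can be sketched with `Executors.newSingleThreadExecutor()`, which is backed by a `LinkedBlockingQueue`: writers only enqueue, and a metric read is itself submitted as a task, so the summary is touched by exactly one thread and needs no lock of its own. The class `SingleThreadedMetrics` is hypothetical, and a `HashMap` stands in for StreamSummary; this is not MiscStage's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical single-threaded "stage" that owns a non-thread-safe counter. */
final class SingleThreadedMetrics {
    private final ExecutorService stage = Executors.newSingleThreadExecutor();
    private final Map<String, Long> counts = new HashMap<>(); // only touched on `stage`

    /** Write path: enqueue and return immediately. */
    void offer(String key) {
        stage.execute(() -> counts.merge(key, 1L, Long::sum));
    }

    /** Read path: runs on the stage thread, after all previously enqueued offers. */
    Map<String, Long> snapshot() throws InterruptedException, ExecutionException {
        Callable<Map<String, Long>> read = () -> new HashMap<>(counts);
        return stage.submit(read).get();
    }

    void shutdown() {
        stage.shutdown();
    }
}
```

The reader pays for the queue hop and waits behind pending writes, but writers never wait on a reader. Note that this queue is unbounded; a backlog cap would need an explicitly bounded queue.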
