[jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions

2015-01-21 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-7974:
-
Attachment: CASSANDRA-7974v5.txt

v5 fixes an issue: the byte array used as the key was not compared for equality 
in the StreamSummary, so the same key could show up more than once.
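
The message does not say how v5 resolves this, so the following is only a sketch of the general Java pitfall behind duplicate keys: arrays inherit identity-based equals/hashCode, so two byte[] instances with the same content are treated as different keys by hash-based structures. Wrapping them in a ByteBuffer is one common way to get content-based comparison; the class and method names below are hypothetical and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Cassandra or stream-lib.
class ByteKeyEquality {
    // Raw byte[] keys: HashMap relies on Object identity for arrays,
    // so equal content in different array instances yields separate entries.
    static int distinctWithRawArrays(byte[]... keys) {
        Map<byte[], Integer> counts = new HashMap<>();
        for (byte[] k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts.size();
    }

    // ByteBuffer keys: equals/hashCode are computed from the remaining bytes,
    // so equal content collapses into a single entry.
    static int distinctWithByteBuffers(byte[]... keys) {
        Map<ByteBuffer, Integer> counts = new HashMap<>();
        for (byte[] k : keys) {
            counts.merge(ByteBuffer.wrap(k), 1, Integer::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        byte[] a = "key1".getBytes(StandardCharsets.UTF_8);
        byte[] b = "key1".getBytes(StandardCharsets.UTF_8);
        System.out.println("raw arrays:   " + distinctWithRawArrays(a, b));
        System.out.println("byte buffers: " + distinctWithByteBuffers(a, b));
    }
}
```

Note that a ByteBuffer used as a key must not have its position or limit changed afterwards, since both feed into its hash code.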

 Enable tooling to detect hot partitions
 ---

 Key: CASSANDRA-7974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Chris Lohfink
 Fix For: 2.1.3

 Attachments: 7974.txt, CASSANDRA-7974v3.txt, CASSANDRA-7974v4.txt, 
 CASSANDRA-7974v5.txt, cassandra-2.1-7974v2.txt


 Sometimes you know you have a hot partition by the load on a replica set, but 
 have no way of determining which partition it is.  Tracing is inadequate for 
 this without a lot of post-tracing analysis that might not yield results.  
 Since we already include stream-lib for HLL in compaction metadata, it 
 shouldn't be too hard to wire up topK for X seconds via jmx/nodetool and then 
 return the top partitions hit.
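
A rough sketch of the "sample for X seconds, then return the results" idea described above, assuming a begin/finish pair that a JMX operation or nodetool command could call around a sleep. All names here (WindowedSampler, beginSampling, finishSampling) are hypothetical, and exact per-key counts in a map stand in for the topK summary that the ticket proposes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; exact counts replace the bounded topK summary.
class WindowedSampler {
    private final AtomicBoolean enabled = new AtomicBoolean(false);
    private volatile ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Start a fresh sampling window.
    public synchronized void beginSampling() {
        counts = new ConcurrentHashMap<>();
        enabled.set(true);
    }

    // Called on the read/write path. When sampling is off this is a single
    // volatile read, so the hot path never waits on a global lock.
    public void addSample(String partitionKey) {
        if (!enabled.get()) {
            return;
        }
        counts.computeIfAbsent(partitionKey, k -> new LongAdder()).increment();
    }

    // Stop the window and return a snapshot of what was observed.
    public synchronized Map<String, Long> finishSampling() {
        enabled.set(false);
        Map<String, Long> snapshot = new HashMap<>();
        counts.forEach((k, v) -> snapshot.put(k, v.sum()));
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        WindowedSampler sampler = new WindowedSampler();
        sampler.beginSampling();
        sampler.addSample("a");
        sampler.addSample("a");
        sampler.addSample("b");
        Thread.sleep(100); // stands in for the "X seconds" window
        System.out.println(sampler.finishSampling());
    }
}
```

An exact map grows with the number of distinct keys, which is the reason a bounded summary is attractive for real workloads.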



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions

2015-01-21 Thread Chris Lohfink (JIRA)


Chris Lohfink updated CASSANDRA-7974:
-
Attachment: CASSANDRA-7974v4.txt

 Enable tooling to detect hot partitions
 ---

 Key: CASSANDRA-7974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Chris Lohfink
 Fix For: 2.1.3

 Attachments: 7974.txt, CASSANDRA-7974v3.txt, CASSANDRA-7974v4.txt, 
 cassandra-2.1-7974v2.txt




[jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions

2015-01-19 Thread Chris Lohfink (JIRA)


Chris Lohfink updated CASSANDRA-7974:
-
Attachment: CASSANDRA-7974v3.txt

Attached a new version with additional changes:
* Uses open type datatypes (CompositeData with TabularData) for safe MBean 
deserialization
* No longer exposes the key validator; instead converts the key into a 
human-readable string and sends it along with the raw bytes in the CompositeData
* Uses its own executor instead of TRACE
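
As an illustration of the open-type approach in the first two bullets, here is a minimal sketch that packs sampler rows into a TabularData of CompositeData using only the standard javax.management.openmbean classes. The type names and item names ("string", "raw", "count", "error") are assumptions made for this example and are not taken from the patch.

```java
import javax.management.openmbean.ArrayType;
import javax.management.openmbean.CompositeDataSupport;
import javax.management.openmbean.CompositeType;
import javax.management.openmbean.OpenDataException;
import javax.management.openmbean.OpenType;
import javax.management.openmbean.SimpleType;
import javax.management.openmbean.TabularData;
import javax.management.openmbean.TabularDataSupport;
import javax.management.openmbean.TabularType;

// Hypothetical helper; names are illustrative only.
class SamplerOpenTypes {
    static final String[] ITEM_NAMES = {"string", "raw", "count", "error"};

    // Builds a table with one row per sampled partition. Each row carries the
    // human-readable key, the raw key bytes, the count and the error bound.
    static TabularData toTabular(String[] keys, byte[][] raw, long[] counts, long[] errors) {
        try {
            OpenType<?>[] itemTypes = {
                SimpleType.STRING,
                new ArrayType<byte[]>(SimpleType.BYTE, true), // primitive byte[]
                SimpleType.LONG,
                SimpleType.LONG
            };
            CompositeType rowType = new CompositeType(
                "SampleRow", "One sampled partition", ITEM_NAMES, ITEM_NAMES, itemTypes);
            // Rows are indexed by the readable key string.
            TabularType tableType = new TabularType(
                "Samples", "Sampled partitions", rowType, new String[] {"string"});
            TabularData table = new TabularDataSupport(tableType);
            for (int i = 0; i < keys.length; i++) {
                Object[] values = {keys[i], raw[i], counts[i], errors[i]};
                table.put(new CompositeDataSupport(rowType, ITEM_NAMES, values));
            }
            return table;
        } catch (OpenDataException e) {
            throw new IllegalStateException("invalid open type definition", e);
        }
    }

    public static void main(String[] args) {
        TabularData t = toTabular(
            new String[] {"k1", "k2"},
            new byte[][] {{1}, {2}},
            new long[] {5, 3},
            new long[] {0, 1});
        System.out.println(t.size() + " rows");
    }
}
```

Because only open types cross the wire, a JMX client can read the result without having any server-side classes on its classpath.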

 Enable tooling to detect hot partitions
 ---

 Key: CASSANDRA-7974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Chris Lohfink
 Fix For: 2.1.3

 Attachments: 7974.txt, CASSANDRA-7974v3.txt, cassandra-2.1-7974v2.txt




[jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions

2014-11-10 Thread T Jake Luciani (JIRA)


T Jake Luciani updated CASSANDRA-7974:
--
Fix Version/s: 2.1.3 (was: 2.1.2)

 Enable tooling to detect hot partitions
 ---

 Key: CASSANDRA-7974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Chris Lohfink
 Fix For: 2.1.3

 Attachments: 7974.txt, cassandra-2.1-7974v2.txt




[jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions

2014-10-24 Thread Chris Lohfink (JIRA)


Chris Lohfink updated CASSANDRA-7974:
-
Attachment: cassandra-2.1-7974v2.txt

I attached a version with a few extras:
* Includes sampling of writes
* Exposes the partition type in JMX so that nodetool can serialize the blobs as 
strings
* Includes the margin of error from the summary
* Adds defaults for capacity and topK count to make it simpler to use; either 
can be overridden with options
** Capacity is not set to the topK count, since the summary becomes very 
inaccurate if cardinality vastly exceeds capacity (with capacity=10, a 
cardinality of just 100 would be very inaccurate under a lot of loads)
** Prints the estimated cardinality (using HyperLogLog) so that it's easier to 
identify an appropriate capacity if the margin of error is unacceptable
* If sampling is disabled there's no blocking (as opposed to synchronizing 
addSample)
** The case where sampling is enabled is also non-blocking
* Made it easy to add additional samplers; I would like to add a columns count 
or size sampler as well

output looks like:
{code}
READ Sampler:
  Cardinality: ~235 (256 capacity used)
  Top 10 partitions:
Partition             Count   +/-
4BpaP7j05i:true       1       0
jSvq6b62uXwfQb:true   1       0
BvkRbLI1rKO:true      1       0
...

WRITE Sampler:
  Cardinality: ~4681 (256 capacity used)
  Top 10 partitions:
Partition  Count   +/-
jXyI4PpocdtXAkvxG8geS1bkY:true4910
bid3tbjRKzDZ4l5Wu:true2912
cWti3ryllghSxOGEuG:true   1918
...
{code}
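
To show where the "+/-" column and the capacity setting come from, here is a simplified counter in the style of the Space-Saving algorithm, which is the approach stream-lib's StreamSummary is based on. This is a sketch under that assumption and not the library's code: the class name is hypothetical, and eviction scans all counters linearly for clarity where a real implementation would use a more efficient structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified Space-Saving style summary.
class SpaceSavingSketch {
    static final class Counter {
        final String key;
        long count; // estimated count (may overestimate)
        long error; // maximum possible overestimation, shown as "+/-"

        Counter(String key, long count, long error) {
            this.key = key;
            this.count = count;
            this.error = error;
        }
    }

    private final int capacity;
    private final Map<String, Counter> counters = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        this.capacity = capacity;
    }

    void offer(String key) {
        Counter existing = counters.get(key);
        if (existing != null) {
            existing.count++;
            return;
        }
        if (counters.size() < capacity) {
            counters.put(key, new Counter(key, 1, 0));
            return;
        }
        // Summary is full: replace the smallest counter. The newcomer inherits
        // that count as its error bound, so error grows once the number of
        // distinct keys exceeds capacity.
        Counter min = Collections.min(
            counters.values(), Comparator.comparingLong((Counter c) -> c.count));
        counters.remove(min.key);
        counters.put(key, new Counter(key, min.count + 1, min.count));
    }

    List<Counter> topK(int k) {
        List<Counter> sorted = new ArrayList<>(counters.values());
        sorted.sort(Comparator.comparingLong((Counter c) -> c.count).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        SpaceSavingSketch s = new SpaceSavingSketch(2);
        for (String key : new String[] {"a", "a", "a", "b", "c"}) {
            s.offer(key);
        }
        for (Counter c : s.topK(2)) {
            System.out.println(c.key + " " + c.count + " +/- " + c.error);
        }
    }
}
```

With capacity well above the number of distinct keys every error stays at 0, which matches the READ sampler output above; once cardinality far exceeds capacity, evictions inflate both counts and errors.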

 Enable tooling to detect hot partitions
 ---

 Key: CASSANDRA-7974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Brandon Williams
 Attachments: 7974.txt, cassandra-2.1-7974v2.txt




[jira] [Updated] (CASSANDRA-7974) Enable tooling to detect hot partitions

2014-09-19 Thread Brandon Williams (JIRA)


Brandon Williams updated CASSANDRA-7974:

Attachment: 7974.txt

Here's a patch that illustrates what I had in mind.  Unfortunately, it seems 
to expose a bug in stream-lib (or I've done something wrong; upgrading to 
2.7.0 did not help):

{noformat}
Caused by: java.lang.NullPointerException: null
    at com.clearspring.analytics.util.DoublyLinkedList.remove(DoublyLinkedList.java:97) ~[stream-2.7.0.jar:na]
    at com.clearspring.analytics.stream.StreamSummary.incrementCounter(StreamSummary.java:137) ~[stream-2.7.0.jar:na]
    at com.clearspring.analytics.stream.StreamSummary.offerReturnAll(StreamSummary.java:128) ~[stream-2.7.0.jar:na]
    at com.clearspring.analytics.stream.StreamSummary.offer(StreamSummary.java:93) ~[stream-2.7.0.jar:na]
    at com.clearspring.analytics.stream.StreamSummary.offer(StreamSummary.java:82) ~[stream-2.7.0.jar:na]
    at org.apache.cassandra.utils.TopKSampler.addSample(TopKSampler.java:58) ~[main/:na]
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1877) ~[main/:na]
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1682) ~[main/:na]
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:345) ~[main/:na]
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:55) ~[main/:na]
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1360) ~[main/:na]
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2044) ~[main/:na]
    ... 4 common frames omitted
{noformat}

 Enable tooling to detect hot partitions
 ---

 Key: CASSANDRA-7974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brandon Williams
Assignee: Brandon Williams
 Attachments: 7974.txt

