[ https://issues.apache.org/jira/browse/CASSANDRA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Lohfink updated CASSANDRA-7974:
-------------------------------------
    Attachment: cassandra-2.1-7974v2.txt

I attached a version with a few extras:
* Includes sampling of writes
* Exposes the partition type in JMX so that nodetool can serialize the blobs as strings
* Includes the margin of error from the summary
* Adds defaults for capacity and topK count to make it simpler to use; either can be overridden with options
** capacity is not set to the topK count, since the summary becomes very inaccurate if cardinality vastly exceeds capacity (with capacity=10, a cardinality of just 100 would be very inaccurate under a lot of loads)
** prints out the estimated cardinality (using HyperLogLog) so that it's easier to identify an appropriate capacity if the margin of error is unacceptable
* Makes it so that there's no blocking when sampling is disabled (as opposed to synchronizing addSample)
** also makes the case where sampling is being enabled non-blocking
* Makes it easy to add additional samplers; I would like to add a "columns count" or "size" sampler as well

Output looks like:
{code}
READ Sampler:
  Cardinality: ~235 (256 capacity used)
  Top 10 partitions:
    Partition                       Count  +/-
    4BpaP7j05i:true                 1      0
    jSvq6b62uXwfQb:true             1      0
    BvkRbLI1rKO:true                1      0
    ...

WRITE Sampler:
  Cardinality: ~4681 (256 capacity used)
  Top 10 partitions:
    Partition                       Count  +/-
    jXyI4PpocdtXAkvxG8geS1bkY:true  49     10
    bid3tbjRKzDZ4l5Wu:true          29     12
    cWti3ryllghSxOGEuG:true         19     18
    ...
{code}

> Enable tooling to detect hot partitions
> ---------------------------------------
>
>                 Key: CASSANDRA-7974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7974
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>         Attachments: 7974.txt, cassandra-2.1-7974v2.txt
>
>
> Sometimes you know you have a hot partition by the load on a replica set, but
> have no way of determining which partition it is.  Tracing is inadequate for
> this without a lot of post-tracing analysis that might not yield results.
> Since we already include stream-lib for HLL in compaction metadata, it
> shouldn't be too hard to wire up topK for X seconds via jmx/nodetool and then
> return the top partitions hit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
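
For readers wondering where the Count and +/- columns in the sampler output come from, here is a minimal sketch of a bounded top-K counter using the Space-Saving algorithm. It is not the attached patch and does not use stream-lib; the class name `TopKSampler`, the nested `Counter` type, and the `topK` method are hypothetical names chosen for illustration. It also simply synchronizes `addSample`, so it does not show the non-blocking behaviour described in the comment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Space-Saving counter; hypothetical, not the code from the patch. */
class TopKSampler {
    /** One monitored key: estimated count and the maximum possible overestimate. */
    static final class Counter {
        final String key;
        long count;
        final long error;

        Counter(String key, long count, long error) {
            this.key = key;
            this.count = count;
            this.error = error;
        }
    }

    private final int capacity;
    private final Map<String, Counter> counters = new HashMap<>();

    TopKSampler(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one hit. Synchronized for brevity only (assumption, see lead-in). */
    synchronized void addSample(String key) {
        Counter existing = counters.get(key);
        if (existing != null) {
            existing.count++;
            return;
        }
        if (counters.size() < capacity) {
            counters.put(key, new Counter(key, 1, 0));
            return;
        }
        // Table is full: evict the key with the smallest count. The newcomer
        // inherits that count as its error bound, which is what a "+/-" column reports.
        Counter min = null;
        for (Counter candidate : counters.values()) {
            if (min == null || candidate.count < min.count) {
                min = candidate;
            }
        }
        counters.remove(min.key);
        counters.put(key, new Counter(key, min.count + 1, min.count));
    }

    /** Returns snapshot copies of up to k counters, highest estimated count first. */
    synchronized List<Counter> topK(int k) {
        List<Counter> sorted = new ArrayList<>(counters.values());
        sorted.sort(Comparator.comparingLong((Counter c) -> c.count).reversed());
        List<Counter> result = new ArrayList<>();
        for (Counter c : sorted.subList(0, Math.min(k, sorted.size()))) {
            result.add(new Counter(c.key, c.count, c.error));
        }
        return result;
    }
}
```

With a capacity much smaller than the number of distinct partitions, evictions happen constantly and the inherited error grows toward the count itself, which is consistent with the remark above that the summary becomes inaccurate when cardinality vastly exceeds capacity.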