[ https://issues.apache.org/jira/browse/CASSANDRA-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571162#comment-16571162 ]
Chris Lohfink commented on CASSANDRA-14436:
-------------------------------------------

While I think we could do something like create a concurrent set of Samplers for each SamplerType, tie them to a Sampler session, and flag them to start at the same time, I don't think it's necessary. The current use of top partitions has never had a reported issue with people trying to run profiling sessions concurrently, so it can be a new feature added in another ticket at some point, but I don't think it's needed enough here. In the meantime I added a strict restriction of a single session at a time, raising an exception if someone tries to kick off a second one. The sampling also times out at the end of the duration, so if finish is never called it won't spin forever.

I did write some basic JMH benchmarks, but I didn't want to make insert() accessible, and the {{.*microbench.*}} in build.xml makes default visibility not an option, so... yeah. Ultimately (when on) it's just a ThreadPoolExecutor.submit() of the addSample in the read/write path, so the cost is essentially limited to contention on the queue; I saw roughly 100-300 nanoseconds.

Going into the actual guts, the frequency sampler is a wrapper around the addthis StreamSummary. There might be something better out there now, but it has seemed to do fine so far. In some worst-case JMH benchmarks I was able to see this hit 3us or so, which could conceivably be slower than the writes themselves and cause a backup. The MaxSampler uses MinMaxPriorityQueue, which can be replaced with something more performant once PriorityQueue(comparator) becomes available (post Java 8), but that rarely breaks a microsecond even with top 1024.

Just in case, as a catchall I added the same thing the trace executor has: throwaway load shedding in case the sampler executor does get backed up. This includes some plumbing so it's reported appropriately in metrics.
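A minimal sketch of the behaviour described above, for readers who want to see how the pieces fit together. It is not the code from the patch: the class and method names (MaxSamplerSketch, begin, addSample, finish, droppedSamples) and the queue size of 1024 are hypothetical, and java.util.PriorityQueue stands in for Guava's MinMaxPriorityQueue. It shows a single session at a time (a second begin() throws), a deadline after which samples are ignored, a bounded "top N by max value" heap, and an executor that sheds samples and counts them when its queue is full.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names come from the CASSANDRA-14436 patch.
class MaxSamplerSketch {
    static final class Sample {
        final String item;
        final long value;
        Sample(String item, long value) { this.item = item; this.value = value; }
    }

    // One profiling session: a deadline plus a bounded min-heap of the largest values seen.
    private static final class Session {
        final long deadlineNanos;
        final int capacity;
        final PriorityQueue<Sample> heap =
            new PriorityQueue<>(Comparator.comparingLong((Sample s) -> s.value));

        Session(long deadlineNanos, int capacity) {
            this.deadlineNanos = deadlineNanos;
            this.capacity = capacity;
        }

        boolean expired() { return System.nanoTime() - deadlineNanos >= 0; }

        synchronized void insert(String item, long value) {
            if (heap.size() < capacity) {
                heap.add(new Sample(item, value));
            } else if (heap.peek().value < value) {
                heap.poll(); // evict the smallest retained sample
                heap.add(new Sample(item, value));
            }
        }

        synchronized List<Sample> snapshot() {
            List<Sample> out = new ArrayList<>(heap);
            out.sort(Comparator.comparingLong((Sample s) -> s.value).reversed());
            return out; // largest value first
        }
    }

    private final int capacity;
    private final AtomicReference<Session> session = new AtomicReference<>();
    private final AtomicLong pending = new AtomicLong();
    private final AtomicLong dropped = new AtomicLong();
    private final ThreadPoolExecutor executor;

    MaxSamplerSketch(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        // One daemon worker and a bounded queue; a full queue sheds the sample and counts it.
        this.executor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1024),
            r -> { Thread t = new Thread(r, "sampler-sketch"); t.setDaemon(true); return t; },
            (task, pool) -> { dropped.incrementAndGet(); pending.decrementAndGet(); });
    }

    // Starts a session; throws if an unexpired session is already running.
    void begin(long durationMillis) {
        Session current = session.get();
        if (current != null && !current.expired())
            throw new IllegalStateException("a sampling session is already running");
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        if (!session.compareAndSet(current, new Session(deadline, capacity)))
            throw new IllegalStateException("a sampling session is already running");
    }

    // Called from the hot path: hands the sample to the executor and never blocks the caller.
    void addSample(String item, long value) {
        Session s = session.get();
        if (s == null || s.expired())
            return; // no session, or the session has timed out
        pending.incrementAndGet();
        executor.execute(() -> {
            try { s.insert(item, value); } finally { pending.decrementAndGet(); }
        });
    }

    // Ends the session and returns the retained samples, largest value first.
    List<Sample> finish() {
        Session s = session.getAndSet(null);
        if (s == null) throw new IllegalStateException("no sampling session is running");
        while (pending.get() > 0) Thread.yield(); // let already-queued samples land
        return s.snapshot();
    }

    long droppedSamples() { return dropped.get(); }
}
```

In this sketch an expired session no longer blocks a new begin(), which is one way to get the "won't spin forever" property; the dropped counter is the kind of value that would be wired into metrics.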
> Add sampler for query time and expose with nodetool
> ---------------------------------------------------
>
> Key: CASSANDRA-14436
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14436
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Lohfink
> Assignee: Chris Lohfink
> Priority: Major
>
> Create a new {{nodetool profileload}} that functions just like toppartitions
> but with more data, returning the slowest local reads and writes on the host
> during a given duration and highest frequency touched partitions (same as
> {{nodetool toppartitions}}). Refactor included to extend use of the sampler
> for uses outside of top frequency (max instead of total sample values).
> Future work to this is to include top cpu and allocations by query and
> possibly tasks/cpu/allocations by stage during time window.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)