[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020746#comment-17020746 ]

Michael Stack commented on HBASE-20234:
---

Unscheduling a subtask since gone stale/unaddressed.

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Bug
> Reporter: Michael Stack
> Assignee: Anastasia Braginsky
> Priority: Major
>
> Hard to glean insight from how well in-memory compaction is doing currently.
> It dumps stats into the logs but better if they were available to a
> dashboard. This issue is about exposing a couple of helpful counts. There are
> already by-region metrics. We can add a few for in-memory compaction (Help me
> out [~anastas]... what counts would be best to expose).
> Flush related metrics include
> {code}
> Namespace_default_table_tsdb-tree_region_cfbf23e7330a1a2bbde031f9583d3415_metric_flushesQueuedCount:
> {
> description: "Number flushes requested/queued for this region",
> value: 0
> {
> description: "The number of cells flushed to disk",
> value: 0
> },
> {
> description: "The total amount of data flushed to disk, in bytes",
> value: 0
> },
> ...
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428871#comment-16428871 ]

Anastasia Braginsky commented on HBASE-20234:
---

{quote}Do we have access to Fetch-And-Increment from java?{quote}

I looked around and found these links:

[http://ashkrit.blogspot.co.il/2014/02/atomicinteger-java-7-vs-java-8.html]

It appears that since Java 8 there are intrinsic functions that change CAS to F&I:

[http://ashkrit.blogspot.co.il/2017/07/java-intrinsic-magic.html]

I cannot find F&I in the Unsafe class, although it would be reasonable to support it.

{quote}The counters may be local to Store but they are updated by multiple threads so they'll be contended, no? I suppose if the counter is at region-level, there'll be more contention{quote}

The per-store in-memory-compaction counters should have no contention, as there can be only one in-memory compaction running in a store at any given point in time. Of course, per-region counters of this kind will experience contention.

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Anastasia Braginsky
> Priority: Major
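On the Fetch-And-Increment question in the comment above: the standard library does offer fetch-and-add semantics at the API level through `AtomicLong.getAndIncrement()` and `AtomicLong.getAndAdd()`. Whether the JIT lowers these to a single hardware instruction rather than a CAS loop depends on the JVM version and platform, so the following is only a sketch of the API, not a performance claim. The class name is hypothetical and not part of HBase.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; this class is not part of HBase.
final class FetchAndIncrementSketch {
    private final AtomicLong counter = new AtomicLong();

    /** Atomically adds one and returns the value held before the increment. */
    long fetchAndIncrement() {
        return counter.getAndIncrement();
    }

    /** Atomically adds delta and returns the value held before the addition. */
    long fetchAndAdd(long delta) {
        return counter.getAndAdd(delta);
    }

    /** Reads the current value. */
    long current() {
        return counter.get();
    }
}
```

Callers never write a retry loop themselves; any retrying, if it happens at all, is inside the JDK or the hardware.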
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427585#comment-16427585 ]

stack commented on HBASE-20234:
---

[~anastas] Do we have access to Fetch-And-Increment from Java?

bq. What do you think?

The counters may be local to Store but they are updated by multiple threads so they'll be contended, no? I suppose if the counter is at region-level, there'll be more contention.

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Anastasia Braginsky
> Priority: Major
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427506#comment-16427506 ]

Anastasia Braginsky commented on HBASE-20234:
---

{quote}Be careful on per-store counters. HBase is already crippled keeping counts (one of the main users of CPU). The per-region counters are already costly.{quote}

I see. If we plan not to add any counter, it will be quite difficult to keep the metrics. Let's suppose we only want to keep the number of in-memory flushes that happened for a region. I can, for example, add a single per-region counter that is updated upon every in-memory flush of any related store. However, that will not save us much CPU. The CPU will be busy with contention on a single "global" counter, where otherwise it would be busy updating "local" counters. With a "global" counter you save some memory. The only CPU that is saved is what would go to collecting the "local" counters into the "global" one. What do you think?

On a separate note, if HBase has so many counters, probably a big part of them are AtomicInteger (for concurrency). The incrementAndGet() method of AtomicInteger is based on the Compare-And-Swap (CAS) atomic hardware instruction. This is not efficient, as CAS repeats in a while-loop until it succeeds, while the alternative Fetch-And-Increment (F&I) atomic hardware instruction always succeeds on its first invocation. Bottom line, if you change this implementation you save at least half of the CPU power that goes to those increments all over HBase... What do you think about that?

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Anastasia Braginsky
> Priority: Major
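The "global" versus "local" counter trade-off in the comment above can be illustrated with two JDK classes: a single `AtomicLong` that every writer updates, and a `LongAdder`, which keeps several internal cells under contention and adds them up only when read. This is a minimal sketch under that assumption; it does not reflect how HBase metrics are implemented, and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; this class is not part of HBase.
final class CounterContentionSketch {
    // One shared cell: every writer updates the same memory location.
    final AtomicLong global = new AtomicLong();
    // Striped cells: contended writers spread out; sum() combines the cells.
    final LongAdder striped = new LongAdder();

    /** Runs the given number of writer threads and returns {global, striped} totals. */
    static long[] run(int threads, int incrementsPerThread) {
        CounterContentionSketch s = new CounterContentionSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    s.global.incrementAndGet();
                    s.striped.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        // All writers have finished, so both reads are exact here.
        return new long[] { s.global.get(), s.striped.sum() };
    }
}
```

Both counters end at the same total; the difference is where the cost lands. The striped form pays extra memory and a summation on read, which mirrors the memory-versus-collection cost described above.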
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427136#comment-16427136 ]

stack commented on HBASE-20234:
---

bq. There are currently no per-store counters that I have described (number of in-memory-flushes, number of saved bytes), nor that they make it up to region-level. No problem to add them.

[~anastas] Thanks. Be careful on per-store counters. HBase is already crippled keeping counts (one of the main users of CPU). The per-region counters are already costly. If we were to do per-store, the number of counts would blossom.

bq. Alternatively we can coordinate the collection of the in-memory-flush counters to the flush-to-disk update, that already exists in the code.

If no flush to disk, no counters. If no flush to disk, then we are not doing heavy writes. Maybe it's OK to do this. It'd be better than a background thread doing counter updates. When HBase is not doing counters, it's context switching between all the various background threads that do caretaking; no CPU left over to do actual work (smile).

Thanks for looking at this A. Keep asking questions. The metrics system is cryptic. Let's save you getting lost in it if we can help it.

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Anastasia Braginsky
> Priority: Major
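The option favoured in the comment above, reporting in-memory-flush counts only when a flush to disk happens, could be sketched roughly as follows. Every class and method name here is hypothetical and stands in for the real store and region metrics code, which this thread does not show; the sketch only illustrates the "drain on flush-to-disk" idea with no background thread.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-store holder; not an HBase class.
final class StoreInMemoryFlushStats {
    private final AtomicLong inMemoryFlushes = new AtomicLong();

    /** Called whenever this store completes an in-memory flush. */
    void onInMemoryFlush() {
        inMemoryFlushes.incrementAndGet();
    }

    /** Returns the count accumulated since the last drain and resets it to zero. */
    long drain() {
        return inMemoryFlushes.getAndSet(0L);
    }
}

// Hypothetical region-level reporter; not an HBase class.
final class RegionFlushReporter {
    private long reportedInMemoryFlushes; // stand-in for a region-level metric

    /** Assumed to be called from the existing flush-to-disk path. */
    synchronized void onFlushToDisk(List<StoreInMemoryFlushStats> stores) {
        for (StoreInMemoryFlushStats store : stores) {
            reportedInMemoryFlushes += store.drain();
        }
    }

    synchronized long reportedInMemoryFlushes() {
        return reportedInMemoryFlushes;
    }
}
```

With this shape, a region that never flushes to disk never reports, which is the "if no flush to disk, no counters" behaviour discussed above.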
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426872#comment-16426872 ]

Anastasia Braginsky commented on HBASE-20234:
---

[~stack], I can take care of this issue.

There are currently none of the per-store counters that I described (number of in-memory-flushes, number of saved bytes), nor do any make it up to region-level. No problem to add them.

I have taken a look at the Region's Metrics design. It looks like we can make good use of the notion of a Histogram (MetricHistogram) for in-memory-flush history and saved-bytes history. However, MetricsRegionServer's method updateFlush() appears to be invoked once in a while, upon an organized flush-to-disk. As an in-memory flush is an asynchronous event (appearing from time to time on different stores), we will need to keep some new per-region periodic task to collect the per-store counters and transfer them to the Metrics.

Alternatively, we can coordinate the collection of the in-memory-flush counters with the flush-to-disk update that already exists in the code. I am not sure it is good to bind those events. Any alternative ideas?

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Anastasia Braginsky
> Priority: Major
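The first alternative in the comment above, a per-region periodic task that collects per-store counters, might look something like the sketch below. It uses the JDK's `ScheduledExecutorService`; the class, its methods, and the use of plain `AtomicLong` per-store counters are all assumptions made for illustration and do not correspond to HBase's metrics classes.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; this class is not part of HBase.
final class PeriodicCounterCollector {
    private final List<AtomicLong> perStoreCounters;
    private final LongAdder regionTotal = new LongAdder();

    PeriodicCounterCollector(List<AtomicLong> perStoreCounters) {
        this.perStoreCounters = perStoreCounters;
    }

    /** Moves whatever each store has accumulated into the region-level total. */
    void collectOnce() {
        for (AtomicLong counter : perStoreCounters) {
            regionTotal.add(counter.getAndSet(0L));
        }
    }

    /** Schedules collectOnce() at a fixed rate; the caller must shut the executor down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::collectOnce, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
        return executor;
    }

    long regionTotal() {
        return regionTotal.sum();
    }
}
```

The cost of this design is one more background thread per collector, which is the concern raised elsewhere in the thread about caretaking threads competing for CPU.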
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413180#comment-16413180 ]

stack commented on HBASE-20234:
---

Thanks [~anastas]

Simple incrementing counters work. Point at the location of what needs exposing and I can do it.

Per CompactingMemStore might be too many counts... We do per region counts at the moment. The per-Store counts don't make it up to Region-level?

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413001#comment-16413001 ]

Anastasia Braginsky commented on HBASE-20234:
---

As I see it, the current metrics in the region are all related to on-disk flushes. I am not sure how the collection of those metrics works. Are the counters summed all the time (just growing), or are they reset after some period of time?

In general, I would suggest the following metrics:
# Number of in-memory-flushes/rate of in-memory-flushes/average number of in-memory-flushes per CompactingMemStore
# Number of bytes saved by CompactingMemStore (whether by flattening, merging or compacting) per flush to disk. Meaning a per-CompactingMemStore counter that is zeroed upon flush to disk, and then starts collecting the saved bytes again.
# Average pipeline length, averaged over time and/or between stores
# Anything else?

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
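The three metrics proposed in the comment above could be tracked with a small holder like the one sketched here: a running count of in-memory flushes, a saved-bytes counter that is zeroed on each flush to disk, and a running average of pipeline length. The class, its method names, and the idea of passing before/after sizes and the pipeline length into one callback are assumptions for illustration; none of it is taken from CompactingMemStore itself.

```java
// Hypothetical illustration only; this class is not part of HBase.
final class InMemoryCompactionStatsSketch {
    private long inMemoryFlushes;
    private long savedBytesSinceDiskFlush;
    private long pipelineLengthSum;
    private long pipelineSamples;

    /** Records one in-memory flush, the bytes it saved, and the pipeline length seen. */
    synchronized void onInMemoryFlush(long bytesBefore, long bytesAfter, int pipelineLength) {
        inMemoryFlushes++;
        // Count only reductions; a flush that saves nothing adds zero.
        savedBytesSinceDiskFlush += Math.max(0L, bytesBefore - bytesAfter);
        pipelineLengthSum += pipelineLength;
        pipelineSamples++;
    }

    /** Returns the bytes saved since the previous flush to disk and zeroes the counter. */
    synchronized long onFlushToDisk() {
        long saved = savedBytesSinceDiskFlush;
        savedBytesSinceDiskFlush = 0L;
        return saved;
    }

    synchronized long inMemoryFlushes() {
        return inMemoryFlushes;
    }

    /** Average pipeline length over all recorded in-memory flushes, or 0.0 if none. */
    synchronized double averagePipelineLength() {
        return pipelineSamples == 0 ? 0.0 : (double) pipelineLengthSum / pipelineSamples;
    }
}
```

Synchronized methods keep the sketch simple; since only one in-memory compaction is said to run per store at a time, the lock would rarely be contended in that setting.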
[jira] [Commented] (HBASE-20234) Expose in-memory compaction metrics
[ https://issues.apache.org/jira/browse/HBASE-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407478#comment-16407478 ]

Anoop Sam John commented on HBASE-20234:
---

It will be good to see info about the CompactingMemStore, like the pipeline length, the frequency of in-memory flushes/compactions, and also how these ops help in reducing the memstore size. When an in-memory flush or compaction happens, we expect the resulting heap size to be smaller.

> Expose in-memory compaction metrics
> ---
>
> Key: HBASE-20234
> URL: https://issues.apache.org/jira/browse/HBASE-20234
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major