[ https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647849#comment-16647849 ]
Archana Katiyar edited comment on HBASE-21301 at 10/12/18 12:20 PM:
--------------------------------------------------------------------
*Summary of the work done so far*:
* Store data in an HBase table (new system table)
** We will store stats for all the regions belonging to a given table in this table.
** TODO: Decide upon the schema; [~andrew.purt...@gmail.com] suggested taking a reference from the [OpenTSDB schema|http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html].
* Add a ScheduledChore in the HRegionServer class; this chore will wake up every x minutes (configurable) and store the read/write counts for the last x minutes in the table. In the future, the same chore can be used to store other stats as well. There were two options for recording read and write stats for the last x minutes in the HRegion class:
** Introduce new read and write counters that increment on each operation performed by the user. The ScheduledChore would reset these counters once it has recorded the current values.
** Use the existing read and write counters of HRegion. The ScheduledChore must then compute the stats for the last x minutes itself, because the existing counters accumulate from the time the region went live.
I am implementing the second option (existing counters) so that the per-read/write performance impact of this change is minimal.
* Add a new jsp file which reads data from the table and displays it as a heatmap. The logic of this jsp file is simple: use the table name and an epoch time to query stats for all the regions which were live at that time.

Also, [~apurtell] suggested eventually storing information per store file (maybe not in v1 of this feature, but it's a good goal to have). In his own words - _"Regarding what granularity to use for statistics collection, you are definitely on the right track to start with the region as the smallest unit to consider.
I believe Google's design of Key Visualizer can drill down to narrower, sub-region, scopes, so I have been thinking about how to achieve that, if we want. I would not recommend doing it for the first cut because we already have support for region level metrics that you can build on. However, imagine during compaction we collect statistics over all K-Vs in every HFile, then write the statistics into the hfile file trailer, then retrieve those statistics later using a new API. This will let us do things like alter compaction strategy decisions with greater awareness of the characteristics of the data in the store (see W-5473921 Enhance compaction upgrade decision to consider file statistics) or potentially generate heatmaps of key access rates at a store file granularity. Each store file will give you a key-range and a read and write access count that you can aggregate. The start and end keys of those ranges will be different from region start and end keys because store files only have a subset of all keys in the region. This lets us find hot regions that are narrower in scope than the region, which will be more precise information on how to, potentially, split the keyspace to better distribute the load, or to narrow down what aspect of application data model or implementation is responsible for the hotspot. I don't know _how_ to track key access stats with sub region granularity, though. We would need this information on hand to write into the hfile during compaction. Maybe we could sample reads and writes at the HRegion level and keep the derived stats in an in-memory data structure in the region. (Much lower overhead to keep it in-memory and local than attempt to persist to a real table.) 
We would persist relevant stats from this datastructure into the store files written during flushes and compactions."_
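The second option above — keep HRegion's existing cumulative counters and let the chore derive per-interval deltas — can be sketched in plain Java. This is only a sketch: {{RegionStatsChore}} and {{recordInterval}} are hypothetical names, and a HashMap stands in for the real HRegion counter and system-table APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of the periodic stats chore's bookkeeping. The key
 * point: the chore never resets the region's cumulative read/write
 * counters; it remembers the value it saw on the previous run and
 * records only the delta for the last interval.
 */
public class RegionStatsChore {
    // Cumulative count seen per region on the previous chore run.
    private final Map<String, Long> lastSeenReads = new HashMap<>();

    /**
     * Given the region's current cumulative read count, return the number
     * of reads since the previous run (the value that would be persisted
     * to the stats table), and remember the new snapshot.
     */
    public long recordInterval(String regionName, long cumulativeReads) {
        long previous = lastSeenReads.getOrDefault(regionName, 0L);
        lastSeenReads.put(regionName, cumulativeReads);
        return cumulativeReads - previous;
    }
}
```

Because the chore only reads the counters that HRegion already maintains, the per-operation read/write path is untouched, which is the rationale for preferring this option.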
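The jsp page's lookup (table name plus epoch time, returning stats for all regions live at that time) might be sketched like this. The {{table,timeBucket,region}} row-key layout is only an assumption for illustration — the actual schema is still a TODO above — and a sorted map stands in for an HBase Scan over the stats table.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Minimal sketch of the heatmap page's query logic. Row keys in the
 * hypothetical stats table are modeled as "table,timeBucket,region"
 * strings in a sorted map, so fetching one heatmap column is a sub-map
 * scan over the "table,timeBucket," prefix -- analogous to an HBase
 * Scan with start/stop rows. (A real schema would use fixed-width or
 * binary time buckets so keys sort correctly.)
 */
public class HeatmapQuery {
    private final TreeMap<String, Long> statsTable = new TreeMap<>();

    public void put(String table, long bucket, String region, long count) {
        statsTable.put(table + "," + bucket + "," + region, count);
    }

    /** All per-region counts for one table at one time bucket. */
    public Map<String, Long> regionCountsAt(String table, long bucket) {
        String prefix = table + "," + bucket + ",";
        // Range scan over [prefix, prefix + max char), i.e. all rows
        // for this table and time bucket, one per live region.
        return statsTable.subMap(prefix, prefix + "\uffff");
    }
}
```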
> Heatmap for key access patterns
> -------------------------------
>
> Key: HBASE-21301
> URL: https://issues.apache.org/jira/browse/HBASE-21301
> Project: HBase
> Issue Type: Improvement
> Reporter: Archana Katiyar
> Assignee: Archana Katiyar
> Priority: Major
>
> Google recently released a beta feature for Cloud Bigtable which presents a heat map of the keyspace. *Given how hotspotting comes up now and again here, this is a good idea for giving HBase ops a tool to be proactive about it.*
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a visualization tool for Cloud Bigtable key access patterns. Key Visualizer helps debug performance issues due to unbalanced access patterns across the key space, or single rows that are too large or receiving too much read or write activity. With Key Visualizer, you get a heat map visualization of access patterns over time, along with the ability to zoom into specific key or time ranges, or select a specific row to find the full row key ID that's responsible for a hotspot. Key Visualizer is automatically enabled for Cloud Bigtable clusters with sufficient data or activity, and does not affect Cloud Bigtable cluster performance.
> <<<
> From [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)