[ https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653869#comment-16653869 ]

Andrew Purtell edited comment on HBASE-21301 at 10/17/18 5:05 PM:
------------------------------------------------------------------

Me
{quote}
Maybe we could sample reads and writes at the HRegion level and keep the 
derived stats in an in-memory data structure in the region. (Much lower 
overhead to keep it in-memory and local than attempt to persist to a real 
table.) We would persist relevant stats from this datastructure into the store 
files written during flushes and compactions.
{quote}

[~allan163] 
bq.  For example, we can record the hit count for a certain data block and keep 
the data in a memory structure. So that we can generate a heatmap for data 
block. I think it can narrow down the hot key in a smaller granularity than 
hfile range, which is too big.

I agree it can be done at the block granularity. We could store hit counts per 
block in meta blocks. 
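
To make the block-granularity idea a bit more concrete, here is a minimal sketch 
(the class and names are hypothetical, not tied to actual HFile internals) of how 
a region could keep per-block hit counters in memory and snapshot them for 
serialization into a meta block at flush or compaction time:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical in-memory accumulator of per-block read hits for one region.
 * A real implementation would hook the block read path and serialize the
 * counters into an HFile meta block during flush or compaction.
 */
public class BlockHitStats {
  // Keyed by block offset within the hfile (or any other stable block id).
  private final Map<Long, LongAdder> hitsByBlockOffset = new ConcurrentHashMap<>();

  /** Record one read hit against the block starting at the given offset. */
  public void recordHit(long blockOffset) {
    hitsByBlockOffset.computeIfAbsent(blockOffset, k -> new LongAdder()).increment();
  }

  /** Snapshot the counters, e.g. for writing into a meta block. */
  public Map<Long, Long> snapshot() {
    Map<Long, Long> out = new HashMap<>();
    hitsByBlockOffset.forEach((offset, adder) -> out.put(offset, adder.sum()));
    return out;
  }
}
{code}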

Overall, with an approach that records low-level, fine-grained statistics into 
hfiles, it's easy to see how reads could be tracked this way, but it's less clear 
what to do about writes. 

I advised [~archana.katiyar] to start with region granularity, building on the 
region-level read and write metrics that are already available, to lower the 
implementation effort for the first version of this. I also advised using the 
OpenTSDB schema as inspiration for efficient storage and extensibility. At this 
time the table would only store region read and write metrics to support this 
use case, but going forward the stats table would be available and potentially 
very useful for other use cases. I think that is another point in favor of using 
a table here. The suggestions above are great, especially enabling the date 
tiered compaction policy on the table by default. 
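
For illustration only, here is a rough sketch of what an OpenTSDB-inspired row 
key for such a stats table might look like; the metric ids and layout are 
assumptions for discussion, not a proposed schema:

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical OpenTSDB-style row key: [metric id][hour-aligned base time][tag bytes]. */
public class StatsRowKey {
  public static final byte METRIC_REGION_READ_COUNT = 1;   // assumed metric ids
  public static final byte METRIC_REGION_WRITE_COUNT = 2;

  /**
   * Build a row key for one metric sample. Samples within the same hour share a
   * row; the column qualifier would carry the offset in seconds, as in OpenTSDB.
   */
  public static byte[] rowKey(byte metricId, long timestampMillis, String encodedRegionName) {
    long baseSeconds = (timestampMillis / 1000) / 3600 * 3600;  // align to the hour
    byte[] tag = encodedRegionName.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(1 + 8 + tag.length);
    buf.put(metricId);
    buf.putLong(baseSeconds);
    buf.put(tag);
    return buf.array();
  }
}
{code}

Keeping the metric id first and the hour-aligned base timestamp next clusters 
each metric's samples into time-ordered rows, which is also part of why the date 
tiered compaction policy is a good fit for this table.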

Also, we don't need to auto-create the table if it doesn't exist, if that is 
going to be a problem. This is expected to be a one-time operation over the 
lifetime of a cluster, so an admin can do it when setting up the cluster. We can 
document a small hbase shell script that creates the table in the same place 
where we document how to enable the feature. 
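
As a sketch of the one-time setup an admin could run (equivalently a couple of 
lines in the hbase shell), assuming the 2.x client API; the table and column 
family names below are placeholders:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

/** One-time creation of a (placeholder) stats table with date tiered compaction enabled. */
public class CreateStatsTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName name = TableName.valueOf("stats");  // placeholder table name
      TableDescriptorBuilder table = TableDescriptorBuilder.newBuilder(name)
          // Switch this table's stores to the date tiered compaction engine.
          .setValue("hbase.hstore.engine.class",
              "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine")
          .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("t"))
              .setMaxVersions(1)
              .build());
      if (!admin.tableExists(name)) {
        admin.createTable(table.build());
      }
    }
  }
}
{code}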


> Heatmap for key access patterns
> -------------------------------
>
>                 Key: HBASE-21301
>                 URL: https://issues.apache.org/jira/browse/HBASE-21301
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Archana Katiyar
>            Assignee: Archana Katiyar
>            Priority: Major
>
> Google recently released a beta feature for Cloud Bigtable which presents a 
> heat map of the keyspace. *Given how hotspotting comes up now and again here, 
> this is a good idea for giving HBase ops a tool to be proactive about it.* 
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a 
> visualization tool for Cloud Bigtable key access patterns. Key Visualizer 
> helps debug performance issues due to unbalanced access patterns across the 
> key space, or single rows that are too large or receiving too much read or 
> write activity. With Key Visualizer, you get a heat map visualization of 
> access patterns over time, along with the ability to zoom into specific key 
> or time ranges, or select a specific row to find the full row key ID that's 
> responsible for a hotspot. Key Visualizer is automatically enabled for Cloud 
> Bigtable clusters with sufficient data or activity, and does not affect Cloud 
> Bigtable cluster performance. 
> <<<
> From 
> [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
