[jira] [Comment Edited] (HBASE-25329) Dump region hashes in logs for the regions that are stuck in transition for more than a configured amount of time

Andrew Kyle Purtell (Jira) Wed, 06 Jan 2021 10:23:14 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259955#comment-17259955
 ]


Andrew Kyle Purtell edited comment on HBASE-25329 at 1/6/21, 6:22 PM:
----------------------------------------------------------------------

bq.  Patch seems fine but was curious if we considered adding something like a 
/rit end point (servlet) on master? 

I am just getting back up to speed after vacation, but in my opinion the 
current patch is not OK. Please fix the issues below:

Description is "Dump region hashes in logs for the regions that are stuck in 
transition for more than a configured amount of time", but
- Only the first N hashes are logged.
- Patch introduces new metrics. (I share Bharath's view that the usefulness of 
the new metrics is questionable.)

In my opinion RITs are too transient to have a metric for each RIT instance. 
Aggregate metrics about RITs, such as the total count of RITs in the current 
metrics reporting period, are reasonable, and we already have them. Per region 
hash RIT metrics represent objects and data with a limited lifespan and 
usefulness, and are best exported and processed by some other means, such as 
the debug dump servlet (functionality we already have). This is just my 
opinion and may not be shared by others. 

If this patch proposes to "dump region hashes in logs for the regions that are 
stuck in transition for more than a configured amount of time", then it should 
actually dump all of the found RITs. Otherwise in my opinion it fails to 
achieve the basic aim. While there does need to be a limit on the total length 
of an emitted log line, it is certainly possible to emit as many log lines as 
needed to dump all RITs in progress. 
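
To illustrate the point about emitting multiple log lines, here is a minimal 
sketch in Java. It is not taken from the patch: the class and method names 
(RitLogChunker, chunk), the sample hashes, and the 20 character limit are all 
hypothetical, and System.out stands in for whatever logger the master uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase or of the attached patches.
public class RitLogChunker {

  /**
   * Packs region hashes into comma-separated lines, each at most
   * maxLineLength characters. A single hash longer than the limit
   * is still emitted, alone on its own line, so nothing is dropped.
   */
  public static List<String> chunk(List<String> regionHashes, int maxLineLength) {
    List<String> lines = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String hash : regionHashes) {
      // The +1 accounts for the comma separator on a non-empty line.
      int needed = current.length() == 0
          ? hash.length()
          : current.length() + 1 + hash.length();
      if (current.length() > 0 && needed > maxLineLength) {
        // Adding this hash would exceed the limit: flush and start a new line.
        lines.add(current.toString());
        current.setLength(0);
      }
      if (current.length() > 0) {
        current.append(',');
      }
      current.append(hash);
    }
    if (current.length() > 0) {
      lines.add(current.toString());
    }
    return lines;
  }

  public static void main(String[] args) {
    // Sample values only; real input would be the encoded names of stuck RITs.
    List<String> stuck = Arrays.asList("1588230740", "a1b2c3d4", "deadbeef", "cafebabe");
    List<String> lines = chunk(stuck, 20);
    for (int i = 0; i < lines.size(); i++) {
      // System.out stands in for the real logger in this sketch.
      System.out.println("Stuck RITs (" + (i + 1) + "/" + lines.size() + "): " + lines.get(i));
    }
  }
}
```

With this shape every stuck region appears in the log regardless of how many 
there are, and the per-line limit only affects how many lines are written. 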

If you still want to introduce new metrics, let's decouple the desired logging 
as stated here from the metrics, and propose the metrics in a new issue. 


> Dump region hashes in logs for the regions that are stuck in transition for 
> more than a configured amount of time
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25329
>                 URL: https://issues.apache.org/jira/browse/HBASE-25329
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Caroline
>            Assignee: Caroline
>            Priority: Minor
>         Attachments: HBASE-25329.branch-1.000.patch, 
> HBASE-25329.branch-2.000.patch, HBASE-25329.master.000.patch
>
>
> We have metrics for number of RITs as well as number of RITs above a certain 
> threshold, but we don't have any way of keeping track of the region hashes of 
> those RITs. It would be beneficial to emit those region hashes as a metric, 
> as well as log them, so that we don't accidentally lose this information for 
> debugging the RIT at a later time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
