[ https://issues.apache.org/jira/browse/HBASE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259955#comment-17259955 ]
Andrew Kyle Purtell edited comment on HBASE-25329 at 1/6/21, 6:22 PM:
----------------------------------------------------------------------

bq. Patch seems fine but was curious if we considered adding something like a /rit end point (servlet) on master?

I am just getting back up to speed after vacation but let me say the current patch is not ok in my opinion. Please fix the below issues (IMHO):

Description is "Dump region hashes in logs for the regions that are stuck in transition for more than a configured amount of time", but
- Only the first N hashes are logged.
- Patch introduces new metrics. (I share Bharath's view that the usefulness of the new metrics is questionable.)

In my opinion RITs are too transient to have a metric for each RIT instance. Metrics about RITs, like total count of RITs in the current metrics reporting period, these are reasonable metrics and we already have them. Per region hash RIT metrics represent objects and data with a limited lifespan and usefulness and are best exported and processed by some other means, such as the debug dump servlet (functionality we already have). This is just my opinion and may not be shared by others.

If this patch proposes to "dump region hashes in logs for the regions that are stuck in transition for more than a configured amount of time", then it should actually dump all of the found RITs. Otherwise in my opinion it fails to achieve the basic aim. While there does need to be a limit on total line length of an emitted log line, it is certainly possible to emit as many log lines as needed to dump all RIT in progress.

If you still want to introduce new metrics, let's decouple the desired logging as stated here from the metrics, and propose them with a new issue.

> Dump region hashes in logs for the regions that are stuck in transition for
> more than a configured amount of time
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25329
>                 URL: https://issues.apache.org/jira/browse/HBASE-25329
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Caroline
>            Assignee: Caroline
>            Priority: Minor
>         Attachments: HBASE-25329.branch-1.000.patch,
>                      HBASE-25329.branch-2.000.patch, HBASE-25329.master.000.patch
>
>
> We have metrics for number of RITs as well as number of RITs above a certain
> threshold, but we don't have any way of keeping track of the region hashes of
> those RITs.
> It would be beneficial to emit those region hashes as a metric,
> as well as log them, so that we don't accidentally lose this information for
> debugging the RIT at a later time.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
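
The comment's point that a line-length limit need not truncate the dump can be illustrated with a small sketch. This is not code from the attached patches: the class name `RitLogChunker`, the method `chunk`, the length limit, and the sample hashes are all hypothetical, and a real change would presumably write through the master's logger rather than `System.out`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of HBase: spreads region hashes over several log lines. */
final class RitLogChunker {

  private RitLogChunker() {
  }

  /**
   * Packs every hash into comma-separated lines of at most maxLineLength characters.
   * A single hash longer than the limit still gets a line of its own, so nothing is dropped.
   */
  static List<String> chunk(List<String> regionHashes, int maxLineLength) {
    if (maxLineLength <= 0) {
      throw new IllegalArgumentException("maxLineLength must be positive");
    }
    List<String> lines = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String hash : regionHashes) {
      // Flush the current line if appending ", " plus this hash would exceed the limit.
      if (current.length() > 0 && current.length() + 2 + hash.length() > maxLineLength) {
        lines.add(current.toString());
        current.setLength(0);
      }
      if (current.length() > 0) {
        current.append(", ");
      }
      current.append(hash);
    }
    if (current.length() > 0) {
      lines.add(current.toString());
    }
    return lines;
  }

  public static void main(String[] args) {
    // Made-up hashes, for illustration only.
    List<String> stuck = Arrays.asList("1588230740", "a1b2c3d4e5f6", "0f9e8d7c6b5a");
    List<String> lines = chunk(stuck, 30);
    for (int i = 0; i < lines.size(); i++) {
      System.out.println("Regions stuck in transition (" + (i + 1) + "/" + lines.size()
          + "): " + lines.get(i));
    }
  }
}
```

Numbering each line as part i of n lets a reader of the log confirm that the whole set of RITs was emitted, which is the property the comment asks for.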