[jira] [Commented] (HBASE-16577) On reducing the log level of *TooSlow log lines

Heng Chen (JIRA) Wed, 07 Sep 2016 22:45:03 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472862#comment-15472862
 ]


Heng Chen commented on HBASE-16577:
-----------------------------------

I gathered statistics on one region of our real cluster (most requests are Get 
or Scan).  There were 1451 "responseTooSlow" requests in total, of which 1446 
were slow scans.  For the slow scans, I can't tell whether they are normal or 
not (I don't know whether the request simply had too many items to scan), so I 
have to ignore them.  
But a slow Get is worth logging, because I can dig deeper to figure out the 
reason.  The Canary is not enough, because slow Get requests are unusual and 
we can't keep it turned on all the time.

So IMO we should keep only the genuinely slow requests (Get, mutation) at 
'WARN' level and ignore the uncertain ones (if we can't distinguish them, just 
remove them from 'WARN' level).
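
To make the idea concrete, here is a minimal sketch in Python (not HBase 
code). The names BOUNDED_OPS, slow_log_level and tally_by_level are 
hypothetical, and the choice of which operations count as bounded is an 
assumption taken from this thread, not an existing HBase setting.

```python
import logging
from collections import Counter

# Assumption from this discussion: Get and Mutate are roughly constant,
# request-independent units of work; Scan and Multi are not.
BOUNDED_OPS = {"Get", "Mutate"}


def slow_log_level(method: str) -> int:
    """Pick a log level for a slow response of the given operation type.

    Bounded operations stay at WARN; everything else drops to INFO
    because we cannot tell whether the slowness is abnormal.
    """
    return logging.WARNING if method in BOUNDED_OPS else logging.INFO


def tally_by_level(methods):
    """Count how many slow responses would land at each log level."""
    counts = Counter()
    for method in methods:
        counts[logging.getLevelName(slow_log_level(method))] += 1
    return counts
```

Fed with the operation names taken from a region's slow-response lines, 
tally_by_level shows how many of them would still be logged at WARN under 
this policy.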

> On reducing the log level of *TooSlow log lines
> -----------------------------------------------
>
>                 Key: HBASE-16577
>                 URL: https://issues.apache.org/jira/browse/HBASE-16577
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Purtell
>
> I was looking at the nature and distribution of responseTooSlow messages on 
> our clusters. The majority of responseTooSlow warnings are for Scan or Multi. 
> Scans may take a long time to return. It will totally depend on how much data 
> is in the table, how the data is distributed, and the range and selectivity 
> of the query. We are not measuring response time in a way to know what is 
> proportionate to the work requested. Another problematic example is Multi. We 
> don't get valid results considering a multi with 1 op and a multi with 100 
> ops to be equivalent as far as being "too slow". 
> If we aggressively filter responseTooSlow messages to just include the ops we 
> can expect to be small, constant, request-independent units of work, this 
> leaves us with Get and Mutate. This gives us no more information than we get 
> from the Canary with read and write checks turned on. The Canary issues Get 
> and Mutate ops and, better, measures availability and latency from the client 
> perspective. 
>  
> Where I end up is that I think my shop should ignore responseTooSlow as a 
> signal because it is far too noisy. The trouble then is that it is logged at 
> WARN level. This implies there is something wrong that needs to be fixed. 
> That may not be the case. It's going to require some analysis of the 
> application and the request particulars extracted from the log line. WARN 
> seems inappropriate for this type of indication. There's nothing 
> (necessarily) wrong with HBase, or the app. It should be logged at something 
> more like INFO. 
> Furthermore the response might not be too slow, so calling it 
> "responseTooSlow" isn't quite right. More like "responseMaybeSlow" :-)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
