[jira] [Commented] (HDFS-14084) Need for more stats in DFSClient

Wei-Chiu Chuang (JIRA) Thu, 06 Dec 2018 14:08:56 -0800


    [ https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712068#comment-16712068 ]


Wei-Chiu Chuang commented on HDFS-14084:
----------------------------------------

(deleted my previous comment because I was talking to Pranay offline and then 
realized I didn't understand what I was talking about)

For the most part, I am interested in the distribution of latency numbers: for 
example, the 50th, 90th, and 99th percentiles of OP_DELETE latency over some 
period of time, say the past 1 or 5 minutes.
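
To make the percentile idea concrete, here is a minimal sketch using the 
nearest-rank method over raw samples. The class LatencyPercentiles and its 
methods are hypothetical names for illustration, not anything that exists in 
DFSClient or Hadoop today, and a real implementation would more likely use a 
streaming quantile estimator than keep every sample in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: nearest-rank percentiles over raw samples. */
public class LatencyPercentiles {
    private final List<Long> samplesMicros = new ArrayList<>();

    /** Record one observed latency, e.g. for a single OP_DELETE call. */
    public synchronized void add(long latencyMicros) {
        samplesMicros.add(latencyMicros);
    }

    /** Nearest-rank percentile: the smallest sample such that at least p percent of samples are <= it. */
    public synchronized long percentile(int p) {
        if (p < 1 || p > 100) {
            throw new IllegalArgumentException("p must be in [1, 100]");
        }
        if (samplesMicros.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = samplesMicros.stream().mapToLong(Long::longValue).toArray();
        Arrays.sort(sorted);
        // 1-based rank, rounded up; subtract 1 for the 0-based array index.
        int rank = (int) Math.ceil(p * (double) sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Drop all samples; intended to be called at the end of each interval (say every 60 s). */
    public synchronized void reset() {
        samplesMicros.clear();
    }
}
```

A caller would add() the latency of each operation, read percentile(50), 
percentile(90) and percentile(99) at the end of the interval, and then reset().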

We already have something similar at the RPC server level (via the config key 
dfs.metrics.percentiles.intervals); we just don't have it on the client 
side.

Perhaps those metrics can be exported periodically, say once a minute, to the 
debug log.
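
One way the periodic export could look is sketched below, under several 
assumptions: the class PeriodicLatencyLogger and its method names are made up 
for illustration, java.util.logging at FINE level stands in for whatever debug 
logger the client actually uses, and percentiles are computed by simple 
nearest-rank over the samples seen in each interval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/** Hypothetical sketch, not Hadoop code: log client-side latency percentiles once per interval. */
public class PeriodicLatencyLogger {
    private static final Logger LOG = Logger.getLogger(PeriodicLatencyLogger.class.getName());
    private final String opName;
    private final ConcurrentLinkedQueue<Long> samples = new ConcurrentLinkedQueue<>();

    public PeriodicLatencyLogger(String opName) {
        this.opName = opName;
    }

    /** Called by the client after each operation completes. */
    public void record(long latencyMicros) {
        samples.add(latencyMicros);
    }

    /** Drain the samples seen since the last call and summarize them in one log line. */
    public String snapshotAndReset() {
        List<Long> drained = new ArrayList<>();
        Long v;
        while ((v = samples.poll()) != null) {
            drained.add(v);
        }
        if (drained.isEmpty()) {
            return opName + " count=0";
        }
        long[] sorted = drained.stream().mapToLong(Long::longValue).toArray();
        Arrays.sort(sorted);
        return String.format(Locale.ROOT, "%s count=%d p50=%dus p90=%dus p99=%dus",
            opName, sorted.length,
            nearestRank(sorted, 50), nearestRank(sorted, 90), nearestRank(sorted, 99));
    }

    private static long nearestRank(long[] sorted, int p) {
        int rank = (int) Math.ceil(p * (double) sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Schedule the export on a daemon thread; the caller shuts the returned executor down. */
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "latency-logger");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> LOG.fine(snapshotAndReset()),
            periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Calling start(60) would emit one summary line per minute at debug level. 
Draining the queue on each snapshot keeps every interval independent, at the 
cost of not giving a rolling 5-minute view.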

Reading through the thread, one point of debate is whether this should be done 
at the RPC client level or at the file system level. Each approach has its own 
advantage. HBase sometimes uses DFSClient directly instead of the file system, 
so if it is done only at the file system level, HBase won't be able to 
troubleshoot performance issues. Doing it at the file system level makes it 
generic and applicable to HDFS as well as WebHDFS clients.

> Need for more stats in DFSClient
> --------------------------------
>
>                 Key: HDFS-14084
>                 URL: https://issues.apache.org/jira/browse/HDFS-14084
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Pranay Singh
>            Assignee: Pranay Singh
>            Priority: Minor
>         Attachments: HDFS-14084.001.patch
>
>
> The usage of HDFS has changed: once used mainly as a map-reduce filesystem, 
> it is now becoming more of a general-purpose filesystem. In most cases the 
> issues are with the Namenode, so we have metrics to know the workload or 
> stress on the Namenode.
> However, there is a need to collect more statistics for different 
> operations/RPCs in DFSClient, to know which RPC operations are taking longer 
> or how frequently an operation occurs. These statistics can be exposed to 
> the users of DFSClient, who can periodically log them or do some sort of 
> flow control if the response is slow. This will also help to isolate HDFS 
> issues in a mixed environment where, say, Spark, HBase and Impala run 
> together on a node. We can check the throughput of different operations 
> across clients and isolate problems caused by a noisy neighbor, network 
> congestion, or a shared JVM.
> We have dealt with several problems from the field for which there is no 
> conclusive evidence as to what caused the problem. If we had metrics or stats 
> in DFSClient we would be better equipped to solve such complex problems.
> List of jiras for reference:
> -------------------------
>  HADOOP-15538 HADOOP-15530 (client side deadlock)



