[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894893#comment-16894893 ]

Lisheng Sun edited comment on HDFS-14313 at 7/29/19 3:11 AM:
-------------------------------------------------------------

Thanks [~linyiqun] for your review.

*FSCachingGetSpaceUsed*
{quote}Line 53: Add the final keyword for the FsVolumeImpl variable.
 Line 54: Add the final keyword too.
{quote}
The values of volume and bpid are assigned through the set* methods, so I did not add the final
keyword to these two variables.
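A minimal sketch of the reasoning above, assuming volume and bpid are builder fields populated through setters. SpaceUsedBuilder and SpaceUsed are hypothetical, simplified stand-ins (String instead of FsVolumeImpl), not the actual Hadoop classes. A field assigned by a setter after construction cannot be final, while the object built from the builder can still copy the values into its own final fields.

```java
import java.util.Objects;

// Hypothetical, simplified builder; the real FSCachingGetSpaceUsed.Builder may differ.
class SpaceUsedBuilder {
    // Not final: assigned after construction by the setters below, whereas a
    // final field must be assigned exactly once during construction.
    private String volume;
    private String bpid;

    SpaceUsedBuilder setVolume(String volume) {
        this.volume = volume;
        return this; // returning this allows call chaining
    }

    SpaceUsedBuilder setBpid(String bpid) {
        this.bpid = bpid;
        return this;
    }

    String getVolume() { return volume; }
    String getBpid() { return bpid; }
}

// The built object can hold final copies, because its fields are
// assigned exactly once, in its constructor.
class SpaceUsed {
    private final String volume;
    private final String bpid;

    SpaceUsed(SpaceUsedBuilder builder) {
        this.volume = Objects.requireNonNull(builder.getVolume(), "volume");
        this.bpid = Objects.requireNonNull(builder.getBpid(), "bpid");
    }

    String describe() { return volume + "/" + bpid; }
}
```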
{quote}Line 75: We no longer pass the config to use the threshold time, so do we still
need to override this method? If not, the change made in
GetSpaceUsed can also be reverted.
{quote}
FSCachingGetSpaceUsed adds the new variables volume and bpid for the HDFS module, so
the constructor parameter of ReplicaCachingGetSpaceUsed must be
FSCachingGetSpaceUsed#Builder, and FSCachingGetSpaceUsed#build() cannot be removed.
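The constructor constraint can be illustrated with a similar hypothetical sketch (BaseBuilder, HdfsBuilder and ReplicaSpaceUsed are illustrative names, not the Hadoop classes). Only the HDFS-specific builder carries volume and bpid, so a constructor that needs them has to accept that builder type, and that builder's build() is what creates the object from those fields.

```java
// Hypothetical sketch; names and types are simplified, not the actual Hadoop API.
class BaseBuilder {
    private long intervalMs = 600_000L; // a generic option every implementation understands

    BaseBuilder setInterval(long ms) { this.intervalMs = ms; return this; }
    long getInterval() { return intervalMs; }
}

// The HDFS-specific builder adds volume and bpid, which the generic builder lacks.
class HdfsBuilder extends BaseBuilder {
    private String volume;
    private String bpid;

    // Covariant override so chaining keeps the HdfsBuilder type.
    @Override
    HdfsBuilder setInterval(long ms) { super.setInterval(ms); return this; }

    HdfsBuilder setVolume(String volume) { this.volume = volume; return this; }
    HdfsBuilder setBpid(String bpid) { this.bpid = bpid; return this; }
    String getVolume() { return volume; }
    String getBpid() { return bpid; }

    // Kept so callers obtain an object built from the HDFS-specific fields.
    ReplicaSpaceUsed build() { return new ReplicaSpaceUsed(this); }
}

class ReplicaSpaceUsed {
    private final long intervalMs;
    private final String volume;
    private final String bpid;

    // Accepting HdfsBuilder (not BaseBuilder) is what gives access to volume and bpid.
    ReplicaSpaceUsed(HdfsBuilder builder) {
        this.intervalMs = builder.getInterval();
        this.volume = builder.getVolume();
        this.bpid = builder.getBpid();
    }

    String describe() { return volume + "/" + bpid + "@" + intervalMs; }
}
```

Overriding setInterval with a covariant return type keeps call chaining working when generic and HDFS-specific setters are mixed.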

*TestReplicaCachingGetSpaceUsed*
{quote}Line 69: As I have mentioned before, can we have an additional
comparison for the DU impl class? Most of the lines can be reused for these two
GetSpaceUsed impl classes. Just pass a different key value, restart the mini
cluster, and compare the used space.
{quote}

The space used reported by the DU impl includes everything under the volume:

├── current
│   ├── BP-1876464514-10.239.56.179-1564369203299
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir0
│   │   │   │           ├── blk_1073741825
│   │   │   │           └── blk_1073741825_1001.meta
│   │   │   └── rbw
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock

The space used reported by the ReplicaCachingGetSpaceUsed impl includes only the block and meta files:

├── blk_1073741825
└── blk_1073741825_1001.meta

In other words, the DU impl counts the size of every directory plus other files such as
VERSION, in_use.lock, and so on.

The space used reported by the DU impl will therefore always be greater than that reported by the
ReplicaCachingGetSpaceUsed impl, and the ReplicaCachingGetSpaceUsed value is the more accurate
measure of replica usage. Is it still necessary to add a comparison with the DU impl class?
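To illustrate why the two numbers cannot match, here is a rough, hypothetical comparison (UsageComparison is not Hadoop code): a du-like walk that sums every regular file under the volume, versus a sum over only the blk_* block and meta files shown in the second tree. Real du typically also counts directory entries and rounds sizes up to disk blocks, so in practice its number is even larger than this sketch's duLike.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative comparison only, not the Hadoop DU or ReplicaCachingGetSpaceUsed code.
class UsageComparison {
    // du-like: sums every regular file under root (VERSION, in_use.lock,
    // scanner.cursor, block and meta files alike).
    static long duLike(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(UsageComparison::size)
                        .sum();
        }
    }

    // Replica-only: sums just block data files (blk_<id>) and their .meta files.
    static long replicaOnly(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().startsWith("blk_"))
                        .mapToLong(UsageComparison::size)
                        .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use in a stream.
    private static long size(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a layout like the one above, duLike also picks up VERSION and in_use.lock, so it can only be greater than or equal to replicaOnly.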

Please correct me if I am wrong. Thanks again, [~linyiqun].


> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two ways of getting used space, DU and DF, and both are insufficient:
>  #  Running DU across lots of disks is very expensive, and running all of the
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or
> other servers.
>  Getting HDFS used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory
> is very cheap and accurate.
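The in-memory approach described in the issue can be sketched as follows, assuming each replica record exposes the storage ID of its volume, its block file length and its meta file length. Replica and InMemoryUsage are hypothetical classes; the real ReplicaInfo API is different and much larger. Because the sum comes only from memory, no du/df process is forked and no directory tree is walked.

```java
import java.util.List;

// Hypothetical, simplified replica record; the real HDFS ReplicaInfo API differs.
final class Replica {
    final String storageId; // the volume this replica lives on
    final long bytesOnDisk; // length of the block data file
    final long metaLength;  // length of the .meta checksum file

    Replica(String storageId, long bytesOnDisk, long metaLength) {
        this.storageId = storageId;
        this.bytesOnDisk = bytesOnDisk;
        this.metaLength = metaLength;
    }
}

final class InMemoryUsage {
    private InMemoryUsage() {}

    // Sums data + meta sizes of the replicas on one volume using only
    // in-memory records; replicas on other volumes are skipped.
    static long usedOnVolume(List<Replica> replicas, String storageId) {
        long used = 0L;
        for (Replica r : replicas) {
            if (r.storageId.equals(storageId)) {
                used += r.bytesOnDisk + r.metaLength;
            }
        }
        return used;
    }
}
```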



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
