[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897804#comment-16897804 ]
Yiqun Lin commented on HDFS-14313:
----------------------------------

Almost looks good now, some minor comments from me:

{noformat}
The deepCopyReplica call does't use the datasetock s
{noformat}
One typo: does't --> doesn't

{noformat}
setting set fs.getspaceused.classname
{noformat}
Please remove the redundant "set" and update "setting" to "Setting".

{noformat}
"blockPoolId: {}, replicas size: {}, copy replicas duration: {}ms"
{noformat}
Can you update this to:
{noformat}
"Copy replica infos, blockPoolId: {}, replicas size: {}, duration: {}ms"
{noformat}
Also update "refresh" to "Refresh".

{noformat}
fs.getClient().delete("/testReplicaCachingGetSpaceUsed", true);
{noformat}
We can directly call the FileSystem API to delete the file:
{noformat}
fs.delete(new Path("/testReplicaCachingGetSpaceUsed"), true);
{noformat}

{quote}Get space used by DU impl must be greater than by ReplicaCachingGetSpaceUsed impl. get space used by ReplicaCachingGetSpaceUsed impl is more accurate. so is it necessary to add comparison for the DU impl class?{quote}
You have raised a good point: the ReplicaCachingGetSpaceUsed way will only calculate the finalized blocks, while the du command way includes more files. Can you document this important difference in the javadoc of the ReplicaCachingGetSpaceUsed class? We should let others know which files this class calculates. Yes, the calculation is different now. Can you add an additional test for the case where some block files are not finalized, for example in the rbw state? Then we can check that dfsUsed is correctly updated.
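To make the finalized-vs-rbw difference concrete, here is a small standalone sketch (plain Java, not Hadoop code; the class, record, and method names are all hypothetical) of why a replica-map based calculation reports less space than a du-style scan when some replicas are still being written:

```java
import java.util.List;

// Conceptual model only: a real DataNode tracks replicas in
// FsDatasetImpl#volumeMap; this just illustrates the accounting difference.
public class SpaceUsedSketch {
    enum State { FINALIZED, RBW } // RBW = replica being written

    record Replica(long blockId, long numBytes, State state) {}

    // Mimics the ReplicaCachingGetSpaceUsed idea: only finalized
    // replicas from the in-memory map are counted.
    static long replicaCachedSpaceUsed(List<Replica> replicas) {
        return replicas.stream()
                .filter(r -> r.state() == State.FINALIZED)
                .mapToLong(Replica::numBytes)
                .sum();
    }

    // Mimics a du-style scan: every block file on disk counts,
    // including rbw files (and, in reality, meta files too).
    static long duStyleSpaceUsed(List<Replica> replicas) {
        return replicas.stream().mapToLong(Replica::numBytes).sum();
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica(1, 1024, State.FINALIZED),
                new Replica(2, 2048, State.FINALIZED),
                new Replica(3, 512, State.RBW)); // still being written

        System.out.println(replicaCachedSpaceUsed(replicas)); // 3072
        System.out.println(duStyleSpaceUsed(replicas));       // 3584
    }
}
```

This is why the suggested unit test matters: with an rbw replica present, the two methods disagree until the block is finalized, and the test should assert that dfsUsed grows accordingly once finalization happens.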
> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch,
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch,
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch,
> HDFS-14313.008.patch, HDFS-14313.009.patch
>
> There are two existing ways of getting used space, du and df, and both are insufficient:
> # Running du across lots of disks is very expensive, and running all of the processes at the same time creates a noticeable IO spike.
> # Running df is inaccurate when the disk is shared by multiple datanodes or other servers.
> Getting hdfs used space from the FsDatasetImpl#volumeMap ReplicaInfos in memory is very cheap and accurate.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org