[ https://issues.apache.org/jira/browse/HDFS-15597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215816#comment-17215816 ]
Ayush Saxena commented on HDFS-15597:
-------------------------------------

{quote}However it will be incorrect for HDFS-EC. (replication=0)
{quote}
replication 0 or 1? IIRC it's 1.
The present approach multiplies the length with the replication factor, so the result would not be correct if the file isn't replicated up to its set replication factor. Try creating a file with replication factor 3 with only 2 datanodes: {{DistributedFileSystem#getContentSummary}} will give only {{2*length}} for a regular replicated file.
EC has another issue here as well, I think: if the status is a file and erasure coded, the erasure coding policy isn't set.

> ContentSummary.getSpaceConsumed does not consider replication
> -------------------------------------------------------------
>
>                 Key: HDFS-15597
>                 URL: https://issues.apache.org/jira/browse/HDFS-15597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 2.6.0
>            Reporter: Ajmal Ahammed
>            Assignee: Aihua Xu
>            Priority: Minor
>         Attachments: HDFS-15597.patch
>
>
> I am trying to get the disk space consumed by an HDFS directory using the
> {{ContentSummary.getSpaceConsumed}} method. I can't get the space consumption
> correctly considering the replication factor. The replication factor is 2,
> and I was expecting twice the size of the actual file from the above
> method.
> {code}
> ubuntu@ubuntu:~/ht$ sudo -u hdfs hdfs dfs -ls /var/lib/ubuntu
> Found 2 items
> -rw-r--r--   2 ubuntu ubuntu    3145728 2020-09-08 09:55 /var/lib/ubuntu/size-test
> drwxrwxr-x   - ubuntu ubuntu          0 2020-09-07 06:37 /var/lib/ubuntu/test
> {code}
> But when I run the following code,
> {code}
> String path = "/etc/hadoop/conf/";
> conf.addResource(new Path(path + "core-site.xml"));
> conf.addResource(new Path(path + "hdfs-site.xml"));
> long size = FileContext.getFileContext(conf).util().getContentSummary(fileStatus).getSpaceConsumed();
> System.out.println("Replication : " + fileStatus.getReplication());
> System.out.println("File size : " + size);
> {code}
> The output is
> {code}
> Replication : 0
> File size : 3145728
> {code}
> Both the file size and the replication factor seem to be incorrect.
> /etc/hadoop/conf/hdfs-site.xml contains the following config:
> {code}
> <property>
>   <name>dfs.replication</name>
>   <value>2</value>
> </property>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
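To illustrate the over-counting the comment describes, here is a minimal self-contained Java sketch (not Hadoop code; the class and method names are hypothetical). It contrasts the length-times-replication-factor computation with an accounting based on how many replicas of each block actually exist, using the 3 MB file and the 3-replicas-on-2-datanodes scenario from this issue:

```java
// Hypothetical sketch, not part of Hadoop: contrasts the two ways of
// computing consumed space discussed in this issue.
public class SpaceConsumedSketch {

    // The questioned approach: assume every block reached its
    // configured replication factor, i.e. length * replication.
    static long byReplicationFactor(long length, short replication) {
        return length * replication;
    }

    // Accounting by actually stored bytes: each block's length times
    // the number of live replicas that block really has.
    static long byActualReplicas(long[] blockLengths, int[] liveReplicas) {
        long total = 0;
        for (int i = 0; i < blockLengths.length; i++) {
            total += blockLengths[i] * liveReplicas[i];
        }
        return total;
    }

    public static void main(String[] args) {
        long length = 3145728L;   // one 3 MB block, as in the report
        short replication = 3;    // requested replication factor
        int[] live = { 2 };       // only 2 datanodes hold a replica

        // Over-reports: counts a third replica that was never written.
        System.out.println(byReplicationFactor(length, replication)); // 9437184

        // Matches the bytes actually on disk across the cluster.
        System.out.println(byActualReplicas(new long[] { length }, live)); // 6291456
    }
}
```

The gap between the two numbers (one block's worth, 3145728 bytes) is exactly the replica that was requested but never placed, which is why multiplying by the replication factor can only be trusted when the file is fully replicated.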