Re: Why block sizes shown by 'fsck' and '-stat' are inconsistent?
fsck is showing you an average block size, not the block size metadata attribute of the file that stat shows. In this specific case, the average is just the length of your file, which is less than one whole block.

On Sat, Apr 5, 2014 at 8:21 AM, sam liu samliuhad...@gmail.com wrote:
> Hi Experts,
>
> First, I believe there is no doubt that HDFS uses only what it needs on the local file system. For example, if we store a 12 KB file in HDFS, HDFS uses only 12 KB on the local file system for that file, not 64 MB (the block size).
>
> However, I found the block sizes shown by 'fsck' and '-stat' are inconsistent:
>
> 1) hadoop fsck /user/user1/filesize/derby.jar -files -blocks -locations
> output:
> ... BP-1600629425-9.30.122.112-1395627917492:blk_1073743264_2443 len=2673375 ...
> Total blocks (validated): 1 (avg. block size 2673375 B) ...
> conclusion: the block size shown by fsck is 2673375 B.
>
> 2) hadoop dfs -stat %b %n %o %r %Y /user/user1/filesize/derby.jar
> output: 2673375 derby.jar 134217728 2 1396662626191
> conclusion: the block size shown by stat is 134217728 B.
>
> Also, if I browse this file from http://namenode:50070, the file size of /user/user1/filesize/derby.jar is 2.5 MB (2673375 B), yet the block size is 128 MB (134217728 B).
>
> Why are the block sizes shown by 'fsck' and '-stat' inconsistent?

--
Harsh J
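A minimal sketch of the distinction Harsh describes, using the numbers from the thread (plain Python as a toy model, not Hadoop code): fsck averages the actual on-disk lengths of a file's blocks, while stat's %o prints the per-file block-size attribute stored as metadata.

```python
# Toy model of the two views of "block size", using the thread's numbers.

CONFIGURED_BLOCK_SIZE = 134217728  # 128 MB: the metadata attribute stat's %o prints

# Actual on-disk lengths of the file's blocks; derby.jar fits in one block,
# so its single block is only as long as the file itself.
block_lengths = [2673375]

# What 'hadoop fsck ... -blocks' reports: an average of the real block lengths.
avg_block_size = sum(block_lengths) // len(block_lengths)

# What 'hadoop dfs -stat %o' reports: the stored per-file metadata attribute.
stat_block_size = CONFIGURED_BLOCK_SIZE

print(avg_block_size)   # equals the file length for a one-block file
print(stat_block_size)  # the configured 128 MB attribute
```

For a file smaller than one block, the two numbers necessarily differ: the average collapses to the file length, while the metadata attribute stays at the configured value.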
Re: Why block sizes shown by 'fsck' and '-stat' are inconsistent?
Thanks for your comments! As I mentioned, HDFS uses only what it needs on the local file system. For example, a 16 KB HDFS file uses only 16 KB of local file system storage, not 64 MB (its HDFS block size). In that case, *what is the use of the 64 MB block size of the 16 KB file?*

2014-04-05 17:12 GMT+08:00 Harsh J ha...@cloudera.com:
> fsck is showing you an average block size, not the block size metadata attribute of the file that stat shows. In this specific case, the average is just the length of your file, which is less than one whole block.
> [...]
Re: Why block sizes shown by 'fsck' and '-stat' are inconsistent?
The block size is a metadata attribute. If you append to the file later, HDFS still needs to know when to split into a further block, so it keeps that value as metadata it can use to decide write boundaries.

On Sat, Apr 5, 2014 at 7:35 PM, sam liu samliuhad...@gmail.com wrote:
> Thanks for your comments! As I mentioned, HDFS uses only what it needs on the local file system. For example, a 16 KB HDFS file uses only 16 KB of local file system storage, not 64 MB (its HDFS block size). In that case, what is the use of the 64 MB block size of the 16 KB file?
> [...]

--
Harsh J
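Harsh's point about write boundaries can be sketched as follows (a toy illustration in plain Python, not HDFS client code): the per-file block-size attribute tells the writer where to cut the stream into blocks, even though the last block occupies only its actual length on local disk.

```python
def split_into_blocks(file_length, block_size=134217728):
    """Return the lengths of the blocks a writer would produce.

    Toy model: every block except possibly the last is exactly
    block_size bytes; the last block holds the remainder and
    occupies only that much local storage.
    """
    blocks = []
    remaining = file_length
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

# A 16 KB file: a single block, using only 16 KB of local storage,
# but the 128 MB attribute is still kept as metadata for later appends.
print(split_into_blocks(16 * 1024))        # [16384]

# Appending until the file exceeds 128 MB forces a second block,
# which is where the stored block size comes into play.
print(split_into_blocks(134217728 + 100))  # [134217728, 100]
```

This is why the attribute is retained even for tiny files: without it, a later append would have no recorded boundary at which to start a new block.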