Re: Why block sizes shown by 'fsck' and '-stat' are inconsistent?

2014-04-05 Thread Harsh J
The fsck is showing you an average block size, not the block size
metadata attribute of the file that stat shows. In this specific case,
the average is just the length of your file, which is less than one
whole block.
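
To make the distinction concrete, here is a minimal sketch in plain Python (the numbers are taken from the fsck and stat outputs quoted below in this thread): fsck divides the total validated block length by the number of blocks, while stat's %o simply prints the file's stored block-size attribute.

```python
# fsck's "avg. block size" = total length of all blocks / number of blocks.
# stat's %o = the file's configured block-size metadata attribute.
block_lengths = [2673375]          # actual block lengths from fsck (bytes)
configured_block_size = 134217728  # 128 MB attribute reported by stat %o

avg_block_size = sum(block_lengths) // len(block_lengths)

print(avg_block_size)          # 2673375 -- matches fsck's "avg. block size"
print(configured_block_size)   # 134217728 -- matches stat's %o
```

For a single-block file the "average" is therefore just the file length, which is why the two commands show different numbers.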

On Sat, Apr 5, 2014 at 8:21 AM, sam liu samliuhad...@gmail.com wrote:
 Hi Experts,

 First, I believe there is no doubt that HDFS uses only what it needs on the local
 file system. For example, if we store a 12 KB file to HDFS, HDFS uses only
 12 KB on the local file system; it does not use a full 64 MB (its block size) on the
 local file system for that file.

 However, I found that the block sizes shown by 'fsck' and '-stat' are
 inconsistent:

 1) hadoop fsck /user/user1/filesize/derby.jar -files -blocks -locations:
 output:
 ...
 BP-1600629425-9.30.122.112-1395627917492:blk_1073743264_2443 len=2673375
 ...
 Total blocks (validated):  1 (avg. block size 2673375 B)
 ...
 conclusion:
 The block size shown by fsck is 2673375 B.

 2) hadoop dfs -stat %b %n %o %r %Y /user/user1/filesize/derby.jar:
 output:
 2673375 derby.jar 134217728 2 1396662626191
 conclusion:
 The block size shown by stat is 134217728 B.

 Also, if I browse this file at http://namenode:50070, the file size of
 /user/user1/filesize/derby.jar equals 2.5 MB (2673375 B), while the
 block size equals 128 MB (134217728 B).

 Why are the block sizes shown by 'fsck' and '-stat' inconsistent?






-- 
Harsh J


Re: Why block sizes shown by 'fsck' and '-stat' are inconsistent?

2014-04-05 Thread sam liu
Thanks for your comments!

As I mentioned, HDFS uses only what it needs on the local file system. For
example, a 16 KB HDFS file uses only 16 KB of local file system storage, not 64
MB (its HDFS block size). In that case, *what is the use of the block
size (64 MB) for the 16 KB file?*


2014-04-05 17:12 GMT+08:00 Harsh J ha...@cloudera.com:

 The fsck is showing you an average block size, not the block size
 metadata attribute of the file that stat shows. In this specific case,
 the average is just the length of your file, which is less than one
 whole block.




Re: Why block sizes shown by 'fsck' and '-stat' are inconsistent?

2014-04-05 Thread Harsh J
The block size is a metadata attribute. If you append to the file later,
HDFS still needs to know when to split off a further block, so it keeps
that value as metadata it can use to decide write boundaries.
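
As a rough illustration (plain Python; `split_on_append` is a hypothetical helper, not a Hadoop API), this is how a writer can use the stored block-size attribute to decide where one block ends and the next begins during an append:

```python
# Sketch: the block size is pure metadata; on append, the writer consults
# it to decide where the current block ends and a new one must start.
BLOCK_SIZE = 134217728  # the file's block-size attribute (128 MB)

def split_on_append(current_length, append_length, block_size=BLOCK_SIZE):
    """Return the byte count written into each block for an append that
    starts at offset `current_length` (hypothetical helper)."""
    chunks = []
    offset = current_length
    remaining = append_length
    while remaining > 0:
        room = block_size - (offset % block_size)  # space left in this block
        take = min(room, remaining)
        chunks.append(take)
        offset += take
        remaining -= take
    return chunks

# A 16 KB file appended with 256 MB of data first fills the rest of its
# partially used block, then spills into new blocks at 128 MB boundaries.
print(split_on_append(16 * 1024, 256 * 1024 * 1024))
```

So the attribute costs nothing on disk for a small file; it only records the boundaries the writer would use if the file ever grows.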

On Sat, Apr 5, 2014 at 7:35 PM, sam liu samliuhad...@gmail.com wrote:
 Thanks for your comments!

 As I mentioned, HDFS uses only what it needs on the local file system. For
 example, a 16 KB HDFS file uses only 16 KB of local file system storage, not 64
 MB (its HDFS block size). In that case, what is the use of the block
 size (64 MB) for the 16 KB file?







-- 
Harsh J