Re: HDFS file system size issue

2014-04-13 Thread Biswajit Nayak
What's the replication factor you have? I believe it should be 3.
hadoop dfs -dus shows the disk usage without replication, while the
name node UI page shows it with replication.

38 GB * 3 = 114 GB
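
A quick way to compare the numbers directly is something like the following
(the conf path is only the usual 1.x default, adjust for your install):

    # logical size, replication not counted -- what -dus reports
    hadoop dfs -dus /

    # replication actually in effect on the existing blocks
    hadoop fsck / | grep 'Average block replication'

    # raw bytes used on the datanodes -- what the name node UI reports
    hadoop dfsadmin -report | grep 'DFS Used:'

    # configured default replication factor
    grep -A1 'dfs.replication' $HADOOP_HOME/conf/hdfs-site.xml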

~Biswa
-oThe important thing is not to stop questioning o-



Re: HDFS file system size issue

2014-04-13 Thread Saumitra
Hi Biswajeet,

Non-DFS usage is ~100 GB over the cluster. But still the numbers are nowhere
near 1 TB.

Basically I wanted to point out the discrepancy between the name node status
page and hadoop dfs -dus. In my case, the former reports DFS usage as 1 TB
and the latter reports it to be 38 GB. What are the factors that can cause
this difference? And why is just 38 GB of data causing DFS to hit its limits?
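
For reference, the two numbers I am comparing, and the per-node non-DFS
figures, come from something like this (output trimmed):

    hadoop dfs -dus /                                    # ~38 GB
    hadoop dfsadmin -report | grep -E 'Name:|DFS Used'   # ~1 TB DFS used, ~100 GB non-DFS overall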




Re: HDFS file system size issue

2014-04-13 Thread Biswajit Nayak
Hi Saumitra,

Could you please check the non-DFS usage? It also contributes to filling
up the disk space.
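
A minimal sketch of what to look at on a slave, assuming a typical layout
(the paths below are only examples; check dfs.data.dir, mapred.local.dir
and your log settings for the real ones):

    df -h /data                   # total usage on the data disk
    du -sh /data/dfs/data         # HDFS block files (DFS usage)
    du -sh /data/mapred/local     # MapReduce intermediate data (non-DFS)
    du -sh /var/log/hadoop        # logs (non-DFS)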



~Biswa
-oThe important thing is not to stop questioning o-




HDFS file system size issue

2014-04-13 Thread Saumitra
Hello,

We are running HDFS on a 9-node Hadoop cluster, Hadoop version 1.2.1. We are
using the default HDFS block size.

We have noticed that the disks of the slaves are almost full. From the name
node's status page (namenode:50070), we could see that the disks of live nodes
are 90% full and DFS Used in the cluster summary page is ~1 TB.

However hadoop dfs -dus / shows that the file system size is merely 38 GB.
The 38 GB number looks to be correct because we keep only a few Hive tables
and Hadoop's /tmp (distributed cache and job outputs) in HDFS. All other data
is cleaned up. I cross-checked this with hadoop dfs -ls. Also I think there is
no internal fragmentation because the files in our Hive tables are well-chopped
into ~50 MB chunks. Here are the last few lines of hadoop fsck / -files -blocks:

Status: HEALTHY
 Total size:                    38086441332 B
 Total dirs:                    232
 Total files:                   802
 Total blocks (validated):      796 (avg. block size 47847288 B)
 Minimally replicated blocks:   796 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       6 (0.75376886 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     3.0439699
 Corrupt blocks:                0
 Missing replicas:              6 (0.24762692 %)
 Number of data-nodes:          9
 Number of racks:               1
FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds


My question is: why are the disks of the slaves getting full even though
there are only a few files in DFS?
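
In case it helps, this is roughly how I am measuring the two sides of the
discrepancy (outputs omitted here):

    hadoop dfs -du /    # per-directory usage inside HDFS, adds up to ~38 GB
    df -h               # on each slave, the data disks show ~90% used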