----- Original Message -----
From: Ossi <los...@gmail.com>
Date: Friday, October 21, 2011 2:57 pm
Subject: lost data with 1 failed datanode and replication factor 3 in 6 node 
cluster
To: common-user@hadoop.apache.org

> hi,
> 
> We managed to lose data when 1 datanode broke down in a cluster of 6
> datanodes with replication factor 3.
> 
> As far as I know, that shouldn't happen, since each block should have
> a copy on 3 different hosts. So, losing even 2 nodes should be fine.
> 
> Earlier we did some tests with replication factor 2, but reverted 
> from that:
>   88  2011-10-12 06:46:49 hadoop dfs -setrep -w 2 -R /
>  148  2011-10-12 10:22:09 hadoop dfs -setrep -w 3 -R /
> 
> The lost data was generated after the replication factor was set back
> to 3.
First of all, the question is: how are you measuring the data loss?
Are you seeing read failures with missing-block exceptions?

My guess is that you are measuring the data loss by the DFS-used space. If I
am correct, the DFS-used space is calculated only from the datanodes that are
currently available. So, when one datanode goes down, DFS Used and DFS
Remaining will both drop accordingly. That cannot be taken as data loss.
Please correct me if my understanding of the question is wrong.
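
A more reliable check is to run fsck on the path in question; it reports
missing, corrupt, and under-replicated blocks directly (the paths below are
only examples):

  hadoop fsck /
  hadoop fsck /path/to/data -files -blocks -locations

If fsck reports the filesystem as HEALTHY with no missing blocks, the data
is still there and only the reported capacity has changed.
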
> And even if the replication factor had been 2, data shouldn't have
> been lost, right?
> 
> We wonder how this is possible; in what situations could it happen?
> 
> br, Ossi
> 
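To see how the reported figures move when a datanode drops out, you can
compare the namenode's view before and after the failure (standard CLI;
exact output fields may differ between versions):

  hadoop dfsadmin -report

It lists Configured Capacity, DFS Used, and DFS Remaining, along with the
number of live and dead datanodes, which makes it easy to tell a shrunken
report from an actual loss of blocks.
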
Regards,
Uma
