Your question is related to http://issues.apache.org/jira/browse/HADOOP-563.

However, I am not sure whether the scenario you described can actually happen in Hadoop.

Runping


> -----Original Message-----
> From: Dhruba Borthakur [mailto:[EMAIL PROTECTED]]
> Sent: Monday, October 23, 2006 11:26 AM
> To: hadoop-dev@lucene.apache.org
> Subject: question about data correctness
> 
> Hi folks,
> 
> 
> 
> I am going through the lease protocol between the DFSClient and the
> namenode. The client renews its lease every 30 seconds, using a single
> thread per client that renews the leases to all servers. The namenode
> declares a client 'dead' if it does not get a lease-renewal message within
> 60 seconds. The namenode then reclaims the datablocks for that file; these
> datablocks may now get allocated to another file.
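
To make the timing concrete, here is a rough sketch of the renewal/expiry
timing described above (the class and names below are illustrative, not
the actual DFSClient/namenode code):

    // Illustrative sketch only; not the real Hadoop classes.
    class LeaseTimingSketch {
        static final long RENEW_PERIOD_MS = 30 * 1000L; // client renews every 30s
        static final long EXPIRY_MS = 60 * 1000L;       // namenode's hard limit

        volatile long lastRenewalMs = System.currentTimeMillis();

        // Client side: a single thread renews the lease for all open files.
        void renewerLoop() throws InterruptedException {
            while (true) {
                Thread.sleep(RENEW_PERIOD_MS);
                lastRenewalMs = System.currentTimeMillis(); // stands in for the renewal RPC
            }
        }

        // Namenode side: a holder silent for more than 60s is declared dead,
        // and its blocks become eligible for reallocation.
        boolean isExpired(long nowMs) {
            return nowMs - lastRenewalMs > EXPIRY_MS;
        }
    }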
> 
> 
> 
> If it so happens that a client is delayed for more than 60 seconds in its
> lease renewal (due to network congestion, the client-lease-renewal thread
> hitting a timeout against a dead server, etc.), then the namenode will see
> the lease expire and will reclaim the blocks of the file in question. The
> namenode may now allocate these blocks to a new file, and that new file
> may start writing to them. Meanwhile the original writer may continue to
> flush its data to the same blocks, because it has not yet seen a
> lease-timeout exception. This may lead to data corruption.
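
A back-of-the-envelope version of that overlap window, using the 30s/60s
numbers above (purely illustrative):

    // If the renewal thread stalls for 75s, the lease expires at t=60s but
    // the old writer does not learn of it until t=75s: for those 15s, both
    // the old writer and the new file may be writing to the same blocks.
    class RaceWindowSketch {
        public static void main(String[] args) {
            long lastRenewal = 0;                   // t = 0s: last successful renewal
            long stall = 75 * 1000L;                // renewal delayed by 75s
            long expiry = lastRenewal + 60 * 1000L; // t = 60s: lease expires
            long overlapMs = (lastRenewal + stall) - expiry;
            System.out.println("overlap window: " + overlapMs + " ms"); // 15000 ms
        }
    }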
> 
> 
> 
> Maybe Hadoop already prevents the above from occurring. If so, can
> somebody please explain how it is done? Thanks in advance.
> 
> 
> 
> Traditional cluster software has to depend on hardware (I/O fencing, SCSI
> reservations, etc.) to prevent the above from occurring.
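
A software analogue of that kind of fencing is an epoch (or generation)
number attached to each lease grant, which a storage node checks before
accepting a write. A generic sketch of the idea (not a claim about what
Hadoop currently does):

    // Generic fencing-token sketch; not Hadoop code.
    class FencingSketch {
        private long currentEpoch = 0;

        // Lease service: every change of ownership bumps the epoch.
        synchronized long grantLease() {
            return ++currentEpoch;
        }

        // Storage node: reject writes stamped with a stale epoch,
        // fencing off a writer whose lease has expired.
        synchronized boolean acceptWrite(long writerEpoch) {
            return writerEpoch == currentEpoch;
        }
    }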
> 
> 
> 
> Thanks,
> 
> dhruba
> 
> 

