[ 
https://issues.apache.org/jira/browse/HADOOP-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710527#action_12710527
 ] 

Jim Kellerman commented on HADOOP-5744:
---------------------------------------

The problem we need to solve is:
- process writing a file crashes after doing some sync() operations
- another process knows that the writer has crashed and needs to
-- recover the lease (immediately)
-- be able to read all the data up to the last sync()

Of the APIs described in Sanjay's email below, APIs 1-2 are inadequate and API3
is OK provided HDFS does not fail. API4 works if the datanode(s) fail but not
if the machine crashes. Only API5 will guarantee that we can read the data
that has been sync'd.
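
To make that comparison concrete, here is a rough Java sketch of how I read it.
The enum names, the three failure labels and the weakestCovering helper are
made up for illustration; none of this is an HDFS API, and the mapping is only
my summary of the paragraph above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Failure scenarios named in the comment; the enum names are illustrative. */
enum Failure { WRITER_CRASH, DATANODE_FAILURE, MACHINE_CRASH }

/** The five proposed flush levels, tagged with the failures each one covers. */
enum FlushLevel {
    API1(EnumSet.noneOf(Failure.class)),            // judged inadequate
    API2(EnumSet.noneOf(Failure.class)),            // judged inadequate
    API3(EnumSet.of(Failure.WRITER_CRASH)),         // OK only if HDFS does not fail
    API4(EnumSet.of(Failure.WRITER_CRASH, Failure.DATANODE_FAILURE)),
    API5(EnumSet.allOf(Failure.class));             // also covers a machine crash

    private final Set<Failure> covered;

    FlushLevel(Set<Failure> covered) { this.covered = covered; }

    /** True if sync'd data is expected to stay readable after this failure. */
    boolean survives(Failure f) { return covered.contains(f); }

    /** Weakest level that covers every failure in the given set. */
    static FlushLevel weakestCovering(Set<Failure> failures) {
        for (FlushLevel level : values()) {
            if (level.covered.containsAll(failures)) return level;
        }
        throw new IllegalStateException("unreachable: API5 covers all failures");
    }
}
```

Under these assumptions, asking for the weakest level that covers both a
writer crash and a machine crash yields API5, which is the conclusion above.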

> From: Sanjay Radia [mailto:[email protected]] 
> Sent: Thursday, May 14, 2009 1:46 PM
> To: Jim Kellerman (POWERSET)
> Cc: Michael Stack; Chad Walters; Dhruba Borthakur; Sameer Paranjpye; Hairong 
> Kuang; Robert Chansler
> Subject: Re: Append, flush, sync write and HBase
> 
>> On May 13, 2009, at 1:20 PM, Jim Kellerman (POWERSET) wrote:
>> 
>> What we need are two things:
>> 
>> 1. When we call sync() we want to be assured that any buffered data can be 
>> read by another process
> 
> Actually I wanted to have a larger discussion to understand your
> current and future requirements on append/flush/sync, and also on the
> latency of HDFS. I am trying to document current and future
> requirements, which is why I wanted to have a quick chat on the phone. I
> will try this via email for now.
> 
> BTW Hairong @ Y! is driving the re-implementation of append. See
> HADOOP-5744. She has defined the semantics she is considering.  Please
> comment whether you agree or disagree.
> 
> We are also looking at variations on the semantics that may have lower
> latencies and weaker guarantees.  We would like to get your initial
> feedback.  Eventually we will update the Jira when we have the semantics
> and APIs better formulated.
> 
> Below is a list of API/semantics variations we are considering.
> Which ones do you absolutely need for HBase in the short term, and
> which ones may be useful to HBase in the longer term?
> 
> API1: flushes out from the address space of the client into the socket to
> the data nodes.
> 
>     On the return of the call there is no guarantee that that data is
>     out of the underlying node and no guarantee of having reached a
>     DN.  Readers will see this data soon if there are no failures.
> 
>     For example, I suspect Scribe and Chukwa will like the lower
>     latency of this API and are prepared to lose some records
>     occasionally in case of failures.  Clearly a journal will not find
>     this API acceptable.
> 
> API2: flushes out to at least one data node and receives an ack.
> 
>     New readers will eventually see the data.
> 
> API3: flushes out to all replicas of the block. The data is in the buffers of
> the DNs but not in the DNs' OS buffers.
> 
>    New readers will see the data after the call has returned.
>    (HADOOP-5744 calls API3 hflush for now.)
> 
> API4: flushes out to all replicas and all replica DNs have done a POSIX
> fflush equivalent - i.e. the data is out to the underlying OS file system of
> the DNs.
> 
> API5: flushes out to all replicas and all replicas have done a POSIX fsync
> equivalent - i.e. the OS has flushed it to the disk device (but the disk may
> have it in its cache).
> 
> Does the HBase edits journal require API 3, 4 or 5?
> 
> What are your latency requirements for the write operation? For
> example, can you tolerate occasional larger latency for the
> fflush/fsync operation?
> 
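
For readers less familiar with the fflush/fsync distinction behind API4 and
API5, a local-file analogy in Java may help. This is only a sketch on a local
file, not HDFS code; the class and method names are hypothetical. flush()
hands buffered bytes to the OS (roughly fflush), while FileChannel.force(true)
asks the OS to push them to the storage device (roughly fsync). As noted in
the list above, the device may still hold the data in its own cache.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LocalFlushSketch {
    /** Appends one record, flushes it to the OS, then forces it to the device. */
    static long appendDurably(Path file, String record) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile(), true);
             BufferedOutputStream buf = new BufferedOutputStream(out)) {
            buf.write(record.getBytes(StandardCharsets.UTF_8));
            buf.flush();                   // user-space buffer -> OS (like fflush)
            out.getChannel().force(true);  // OS cache -> device (like fsync)
        }
        return Files.size(file);           // file length after the append
    }
}
```

The force() call is typically the expensive step, which is why the latency
question above matters for a journal that syncs on every edit.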

> Revisit append
> --------------
>
>                 Key: HADOOP-5744
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5744
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.20.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: AppendSpec.pdf
>
>
> HADOOP-1700 and related issues have put a lot of effort into providing the
> first implementation of append. However, append is a complex feature. It
> turns out that there are issues that initially seemed trivial but need
> careful design. This jira revisits append, aiming for a design and
> implementation supporting semantics that are acceptable to its users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
