Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

Stack Thu, 31 May 2018 14:42:01 -0700

Just to close the loop, I just made a branch named HDFS-13572 to match the
new non-blocking issue (after some nice encouragement posted up on the
JIRA).
Thanks,
S


On Tue, May 15, 2018 at 9:30 PM, Stack <[email protected]> wrote:

> On Fri, May 4, 2018 at 5:47 AM, Anu Engineer <[email protected]>
> wrote:
>
>> Hi Stack,
>>
>>
>>
>> Why don’t we look at the design of what is being proposed?  Let us post
>> the design to HDFS-9924 and then if needed, by all means let us open a new
>> Jira.
>>
>> That will make it easy to understand the context if someone is looking at
>> HDFS-9924.
>>
>>
>>
>
> I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
> spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
> earlier version on HDFS-9924 a while back).
>
> HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to
> be behind where we'd like to take-off (i.e. whether hadoop2 or hadoop3
> first, what is an async api, what is async programming, etc.). We hope to
> 'vault' HDFS-9924 by skipping to a hadoop3/jdk8/CompletableFuture basis
> and by taking on contributor requests in HDFS-9924 -- e.g. a design first,
> dev in a feature branch, and so on -- EXCEPTing the hadoop2 targeting.
>
> Hence the new issue for a new undertaking (and to save folks having to
> wade through reams to get to the new effort).
>
>
>
>> I personally believe that it should be the developers of the feature that
>> should decide what goes in, what to call the branch etc. But it would be
>> nice to have some sort of continuity with HDFS-9924.
>>
>>
>>
>
> Agree with the above. I'll take care of tying HDFS-9924 over to the new
> issue.
>
> Thanks,
> St.Ack
>
>
>
>> Thanks
>>
>> Anu
>>
>>
>>
>> *From: *<[email protected]> on behalf of Stack <[email protected]>
>> *Date: *Thursday, May 3, 2018 at 9:04 PM
>> *To: *Anu Engineer <[email protected]>
>> *Cc: *Wei-Chiu Chuang <[email protected]>, "[email protected]"
>> <[email protected]>
>> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
>> access to HDFS
>>
>>
>>
>> Thanks for the support, Wei-Chiu and Anu.
>>
>>
>>
>> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
>> branch with commits we don't need, full of commentary that is, ahem, a mite
>> off-topic.  Duo can attach his design to the new issue. We can cite
>> HDFS-9924 as provenance and aggregate the discussion as a launching pad for
>> the new effort in the new issue.
>>
>>
>>
>> Hopefully this is agreeable,
>>
>> Thanks,
>>
>>
>>
>> S
>>
>>
>>
>> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer <[email protected]>
>> wrote:
>>
>> Hi St.ack/Wei-Chiu,
>>
>> It is very kind of St.Ack to bring this question to HDFS Dev. I think
>> this is a good feature to have. As for the branch question, the
>> HDFS-9924 branch is already open; we could just use that, and I am +1 on
>> adding Duo as a branch committer.
>>
>> I am not familiar with the HBase code base; I am presuming that there will
>> be some deviation from the current design doc posted in HDFS-9924. Would it
>> make sense to post a new design proposal on HDFS-9924?
>>
>> --Anu
>>
>>
>>
>>
>> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <[email protected]> wrote:
>>
>>     Given that HBase 2 uses async output by default, the way that code is
>>     maintained today in HBase is not sustainable. That piece of code should be
>>     maintained in HDFS. I am +1 as a participant in both communities.
>>
>>     On Thu, May 3, 2018 at 9:14 AM, Stack <[email protected]> wrote:
>>
>>     > Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
>>     > client?
>>     >
>>     > Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
>>     > Access". At the foot of this umbrella JIRA is a proposal by the
>>     > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
>>     > (written by Duo) that we use for making Write-Ahead Logs. We call it
>>     > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
>>     >
>>     > Let me quote Duo from his proposal at the base of HDFS-9924:
>>     >
>>     > ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
>>     > is expected that things like HBASE-20244
>>     > <https://issues.apache.org/jira/browse/HBASE-20244>
>>     > ["NoSuchMethodException
>>     > when retrieving private method decryptEncryptedDataEncryptionKey from
>>     > DFSClient"] will happen again and again.
>>     >
>>     > To make life easier, we need to move the async output related code into
>>     > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
>>     > work, so I would like to create a feature branch to implement the async dfs
>>     > client. In general I think there are 4 steps:
>>     >
>>     > 1. Implement an async rpc client with option 3 [1] described above.
>>     > 2. Implement the filesystem APIs which only need to connect to NN, such as
>>     > 'mkdirs'.
>>     > 3. Implement async file read. The problem is the API. For pread I think a
>>     > CompletableFuture is enough; the problem is the streaming read. Need to
>>     > discuss later.
>>     > 4. Implement async file write. The API will also be a problem, but a more
>>     > important problem is that, if we want to support fan-out, the current logic
>>     > at the DN side will break the semantics, as we can read uncommitted data
>>     > very easily. In HBase it is solved by HBASE-14004
>>     > <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
>>     > should keep the broken behavior in HDFS. We need to find a way to deal with
>>     > it.
>>     >
>>     > Comments welcome.
>>     >
>>     > Intent is to make a branch named HDFS-9924 (or should we just do a new
>>     > JIRA?) and to add Duo as a feature branch committer. If all goes well,
>>     > we'll call for a merge VOTE.
>>     >
>>     > Thanks,
>>     > St.Ack
>>     >
>>     > [1] Option 3: "Use the old protobuf rpc interface and implement a new rpc
>>     > framework. The benefit is that we also do not need port unification service
>>     > at server side and do not need to maintain two implementations at server
>>     > side. And one more thing is that we do not need to upgrade protobuf to
>>     > 3.x."
>>     >
>>     >
>>
>>
>>
>>     --
>>     A very happy Hadoop contributor
>>
>>
>>
>
>
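
Step 3 of Duo's quoted proposal suggests that a CompletableFuture is enough
for pread. As a rough illustration of that API shape only, here is a minimal
sketch that wraps a positional read in a CompletableFuture. It uses the JDK's
AsynchronousFileChannel on a local temp file as a stand-in; it is not the HDFS
client, and the class name AsyncPreadSketch and the helper pread are
hypothetical names made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AsyncPreadSketch {

    // Hypothetical helper (not an HDFS API): positional read of up to
    // `length` bytes starting at `position`, completing the future when
    // the underlying asynchronous read finishes or fails.
    static CompletableFuture<ByteBuffer> pread(
            AsynchronousFileChannel ch, long position, int length) {
        CompletableFuture<ByteBuffer> future = new CompletableFuture<>();
        ByteBuffer buf = ByteBuffer.allocate(length);
        ch.read(buf, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                buf.flip(); // switch the buffer from writing to reading
                future.complete(buf);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                future.completeExceptionally(exc);
            }
        });
        return future;
    }

    public static void main(String[] args) throws Exception {
        // Local temp file used purely as a stand-in for a remote file.
        Path tmp = Files.createTempFile("pread-sketch", ".txt");
        Files.write(tmp, "hello non-blocking world".getBytes(StandardCharsets.UTF_8));
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(tmp, StandardOpenOption.READ)) {
            // The caller composes on the future instead of blocking in read();
            // get() is called here only so the demo waits before exiting.
            String text = pread(ch, 6, 12)
                .thenApply(b -> StandardCharsets.UTF_8.decode(b).toString())
                .get();
            System.out.println(text);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A single future fits pread because the request is bounded (offset plus
length). A streaming read has no such natural single result, which is the
API problem the proposal leaves for later discussion.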
