Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

Colin McCabe Tue, 23 Sep 2014 18:09:58 -0700

This seems like a really aggressive timeframe for a merge.  We still
haven't implemented:


* Checksum skipping on read and write from lazy persisted replicas.
* Allowing mmaped reads from the lazy persisted data.
* Any eviction strategy other than LRU.
* Integration with cache pool limits (how do HDFS-4949 and lazy
persist replicas share memory?).
* Eviction from RAM disk via truncation (HDFS-6918).
* Metrics
* System testing to find out how useful this is, and what the best
eviction strategy is.

I see why we might want to defer checksum skipping, metrics, allowing
mmap, eviction via truncation, and so forth until later.  But I feel
like we need to figure out how this will integrate with the memory
used by HDFS-4949 before we merge.  I would also like to see an
eviction strategy other than LRU, which is a very poor eviction
strategy for scanning workloads.  I mentioned this a few times on the
JIRA.
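To make the scanning concern concrete, here is a minimal sketch of a generic LRU cache built on java.util.LinkedHashMap in access-order mode. It is not the HDFS-6581 eviction code, and the class and method names (LruScanDemo, hotHitsAfterScan) are hypothetical. It shows that one pass over as many blocks as the cache holds evicts the entire previously hot set, so every later read of that set misses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not HDFS code): why plain LRU handles a
 * large one-pass scan badly.
 */
public class LruScanDemo {

    /** A minimal LRU set built on LinkedHashMap's access-order mode. */
    static final class LruCache<K> extends LinkedHashMap<K, Boolean> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is LRU first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
            return size() > capacity; // evict the least recently used entry when over capacity
        }

        /** Returns true on a hit; on a miss the key is inserted and false is returned. */
        boolean access(K key) {
            if (get(key) != null) {
                return true;
            }
            put(key, Boolean.TRUE);
            return false;
        }
    }

    /** Warms a hot set, runs a one-pass scan, then counts hits when re-reading the hot set. */
    static int hotHitsAfterScan(int capacity, int hotSetSize, int scanLength) {
        LruCache<String> cache = new LruCache<>(capacity);
        for (int i = 0; i < hotSetSize; i++) {
            cache.access("hot-" + i);   // hot blocks that the workload keeps rereading
        }
        for (int i = 0; i < scanLength; i++) {
            cache.access("scan-" + i);  // each scanned block is touched exactly once
        }
        int hits = 0;
        for (int i = 0; i < hotSetSize; i++) {
            if (cache.access("hot-" + i)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(hotHitsAfterScan(100, 50, 100)); // prints 0
        System.out.println(hotHitsAfterScan(100, 50, 10));  // prints 50
    }
}
```

With a scan only a tenth the size of the cache, the hot set survives untouched; the problem appears once a scan is comparable to or larger than the cache.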

I'd also like to get some idea of how much testing this has received
in a multi-node cluster.  What makes us confident that this is the
right time to merge, rather than in a week or two?

best,
Colin


On Tue, Sep 23, 2014 at 4:55 PM, Arpit Agarwal <[email protected]> wrote:
> I have posted write benchmark results to the Jira.
>
> On Tue, Sep 23, 2014 at 3:41 PM, Arpit Agarwal <[email protected]>
> wrote:
>
>> Hi Andrew, I said "it is not going to be a substantial fraction of memory
>> bandwidth". That is certainly not the same as saying it won't be good or
>> there won't be any improvement.
>>
>> Any time you have transfers over RPC or the network stack you will not get
>> close to the memory bandwidth even for intra-host transfers.
>>
>> I'll add some micro-benchmark results to the Jira shortly.
>>
>> Thanks,
>> Arpit
>>
>> On Tue, Sep 23, 2014 at 2:33 PM, Andrew Wang <[email protected]>
>> wrote:
>>
>>> Hi Arpit,
>>>
>>> Here is the comment. It was certainly not my intention to misquote anyone.
>>>
>>>
>>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14138223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138223
>>>
>>> Quote:
>>>
>>> It would be nice to see that we could get a substantial fraction of
>>> memory bandwidth when writing to a single replica in-memory.
>>>
>>> The comparison will be interesting but I can tell you without measurement
>>> it is not going to be a substantial fraction of memory bandwidth. We are
>>> still going through DataTransferProtocol with all the copies and overhead
>>> that involves.
>>>
>>> When the goal is in-memory writes and we are unable to achieve a
>>> substantial fraction of memory bandwidth, to me that is "not good
>>> performance."
>>>
>>> I also looked through the subtasks, and AFAICT the only one related to
>>> improving this is deferring checksum computation. The benchmarking we did
>>> on HDFS-4949 showed that this only really helps when you're down to single
>>> copy or zero copies with SCR/ZCR. DTP reads didn't see much of an
>>> improvement, so I'd guess the same would be true for DTP writes.
>>>
>>> I think my three questions above are still open, as well as my question
>>> about why we're merging now, as opposed to when the performance of the
>>> branch is proven out.
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Tue, Sep 23, 2014 at 2:10 PM, Arpit Agarwal <[email protected]>
>>> wrote:
>>>
>>> > Andrew, don't misquote me. Can you link the comment where I said
>>> > performance wasn't going to be good?
>>> >
>>> > I will add some preliminary write results to the Jira later today.
>>> >
>>> > > What's the plan to improve write performance?
>>> > I described this in response to your and Colin's comments on the Jira.
>>> >
>>> > For the benefit of folks not following the Jira, the immediate task we'd
>>> > like to get done post-merge is moving checksum computation off the write
>>> > path. Also see open subtasks of HDFS-6581 for other planned perf
>>> > improvements.
>>> >
>>> > Thanks,
>>> > Arpit
>>> >
>>> >
>>> > On Tue, Sep 23, 2014 at 1:07 PM, Andrew Wang <[email protected]>
>>> > wrote:
>>> >
>>> > > Hi Arpit,
>>> > >
>>> > > On HDFS-6581, I asked for write benchmarks on Sep 19th, and you
>>> > > responded that the performance wasn't going to be good. However, I
>>> > > thought the primary goal of this JIRA was to improve write performance,
>>> > > and write performance is listed as the first feature requirement in the
>>> > > design doc.
>>> > >
>>> > > So, this leads me to a few questions, which I also asked last week on
>>> > > the JIRA (I believe still unanswered):
>>> > >
>>> > > - What's the plan to improve write performance?
>>> > > - What kind of performance can we expect after the plan is completed?
>>> > > - Can this expected performance be validated with a prototype?
>>> > >
>>> > > Even with these questions answered, I don't understand the need to
>>> > > merge this before the write optimization work is completed. Write perf
>>> > > is listed as a feature requirement, so the branch can reasonably be
>>> > > considered feature-incomplete until it's shown to be faster.
>>> > >
>>> > > Thanks,
>>> > > Andrew
>>> > >
>>> > > On Tue, Sep 23, 2014 at 11:47 AM, Jitendra Pandey <
>>> > > [email protected]>
>>> > > wrote:
>>> > >
>>> > > > +1. I have reviewed most of the code in the branch, and I think it's
>>> > > > ready to be merged to trunk.
>>> > > >
>>> > > >
>>> > > > On Mon, Sep 22, 2014 at 5:24 PM, Arpit Agarwal <
>>> > > > [email protected]> wrote:
>>> > > >
>>> > > > > HDFS Devs,
>>> > > > >
>>> > > > > We propose merging the HDFS-6581 development branch to trunk.
>>> > > > >
>>> > > > > The work adds support to write to HDFS blocks in memory. The target
>>> > > > > use case covers applications writing relatively small, intermediate
>>> > > > > data sets with low latency. We introduce a new CreateFlag for the
>>> > > > > existing CreateFile API. HDFS will subsequently attempt to place
>>> > > > > replicas of file blocks in local memory with disk writes occurring
>>> > > > > off the hot path. The current design is a simplification of original
>>> > > > > ideas from Sanjay Radia on HDFS-5851.
>>> > > > >
>>> > > > > Key goals of the feature were minimal API changes to reduce
>>> > > > > application burden and best-effort data durability. The feature is
>>> > > > > optional and requires appropriate DN configuration from
>>> > > > > administrators.
>>> > > > >
>>> > > > > Design doc:
>>> > > > >
>>> > > > > https://issues.apache.org/jira/secure/attachment/12661926/HDFSWriteableReplicasInMemory.pdf
>>> > > > >
>>> > > > > Test plan:
>>> > > > >
>>> > > > > https://issues.apache.org/jira/secure/attachment/12669452/Test-Plan-for-HDFS-6581-Memory-Storage.pdf
>>> > > > >
>>> > > > > There are 28 resolved sub-tasks under HDFS-6581, 3 open tasks for
>>> > > > > tests and Jenkins issues, and 7 open subtasks tracking planned
>>> > > > > improvements. The latest merge patch is 3300 lines of changed code,
>>> > > > > of which 1300 lines are new and updated tests. Merging the branch to
>>> > > > > trunk will allow HDFS applications to start evaluating the feature.
>>> > > > > We will continue work on documentation, performance tuning and
>>> > > > > metrics in parallel with the vote and post-merge.
>>> > > > >
>>> > > > > Contributors to design and code include Xiaoyu Yao, Sanjay Radia,
>>> > > > > Jitendra Pandey, Tassapol Athiapinya, Gopal V, Bikas Saha, Vikram
>>> > > > > Dixit, Suresh Srinivas and Chris Nauroth.
>>> > > > >
>>> > > > > Thanks to Haohui Mai, Colin Patrick McCabe, Andrew Wang, Todd Lipcon,
>>> > > > > Eric Baldeschwieler and Vinayakumar B for providing useful feedback
>>> > > > > on HDFS-6581, HDFS-5851 and sub-tasks.
>>> > > > >
>>> > > > > The vote runs for the usual 7 days and will expire at 12am PDT on
>>> > > > > Sep 30. Here is my +1 for the merge.
>>> > > > >
>>> > > > > Regards,
>>> > > > > Arpit
>>> > > > >
>>> > > > > --
>>> > > > > CONFIDENTIALITY NOTICE
>>> > > > > NOTICE: This message is intended for the use of the individual or
>>> > > entity
>>> > > > to
>>> > > > > which it is addressed and may contain information that is
>>> > confidential,
>>> > > > > privileged and exempt from disclosure under applicable law. If the
>>> > > reader
>>> > > > > of this message is not the intended recipient, you are hereby
>>> > notified
>>> > > > that
>>> > > > > any printing, copying, dissemination, distribution, disclosure or
>>> > > > > forwarding of this communication is strictly prohibited. If you
>>> have
>>> > > > > received this communication in error, please contact the sender
>>> > > > immediately
>>> > > > > and delete it from your system. Thank You.
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> >
>>>
>>
>>
>