Colin, the feature should be evaluated on its technical merits and not on
the speed of development. The high-level approach was decided during the
community hangout on Apr 30 - five months ago - which you attended. Nothing
we have developed is contrary to what was decided that day.

We have discussed the eviction strategy on the Jira a few times. There is
no evidence to support the claim that LRU is worse. The suggested
alternative of LFU could be a poor choice for intermediate data, since we
are not targeting traditional MapReduce workloads with this feature. In
practice any scheme can fail for some workload. We have made the eviction
interface pluggable to simplify evaluating alternatives. If you have any
specific concerns about the interface itself we can address them, but I
don't see why the evaluation of alternatives cannot happen in trunk.
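To make the pluggability point concrete, a minimal eviction-policy
abstraction could be as small as two methods, with LRU as one drop-in
implementation. The interface and class names below are illustrative
only, not the actual HDFS-6581 API:

```java
import java.util.LinkedHashMap;

// Hypothetical sketch of a pluggable eviction policy; names are
// illustrative, not the real HDFS-6581 interface.
interface EvictionPolicy<K> {
    void recordAccess(K blockId);   // called on each read/write of a block
    K selectVictim();               // pick a block to evict from RAM disk
}

class LruEvictionPolicy<K> implements EvictionPolicy<K> {
    // accessOrder=true makes iteration order least-recently-used first
    private final LinkedHashMap<K, Boolean> order =
            new LinkedHashMap<>(16, 0.75f, true);

    @Override
    public void recordAccess(K blockId) {
        order.put(blockId, Boolean.TRUE);
    }

    @Override
    public K selectVictim() {
        // the eldest entry is the least recently used block
        return order.isEmpty() ? null : order.keySet().iterator().next();
    }
}

public class EvictionSketch {
    public static void main(String[] args) {
        EvictionPolicy<String> policy = new LruEvictionPolicy<>();
        policy.recordAccess("blk_1");
        policy.recordAccess("blk_2");
        policy.recordAccess("blk_1");          // blk_1 now most recently used
        System.out.println(policy.selectVictim());  // prints blk_2
    }
}
```

An LFU or scan-resistant policy would implement the same two methods,
which is what makes evaluating alternatives side by side in trunk
straightforward.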

HDFS-4949 does not decide what to keep in memory nor are applications
provided an API to control it at runtime. This is clearly impractical for
intermediate data. We are targeting different use cases and the lack of
integration does not prevent evaluation or use of either feature. I agree
merging the two schemes is a worthy goal and I filed HDFS-6919 to
investigate it. But it is an open-ended task and will require updates to
both implementations. I don't think it is reasonable or practical to ask
that we defer merging until the integration is complete.

Regards,
Arpit

On Tue, Sep 23, 2014 at 6:09 PM, Colin McCabe <cmcc...@alumni.cmu.edu>
wrote:

> This seems like a really aggressive timeframe for a merge.  We still
> haven't implemented:
>
> * Checksum skipping on read and write from lazy persisted replicas.
> * Allowing mmaped reads from the lazy persisted data.
> * Any eviction strategy other than LRU.
> * Integration with cache pool limits (how do HDFS-4949 and lazy
> persist replicas share memory?)
> * Eviction from RAM disk via truncation (HDFS-6918)
> * Metrics
> * System testing to find out how useful this is, and what the best
> eviction strategy is.
>
> I see why we might want to defer checksum skipping, metrics, allowing
> mmap, eviction via truncation, and so forth until later.  But I feel
> like we need to figure out how this will integrate with the memory
> used by HDFS-4949 before we merge.  I also would like to see another
> eviction strategy other than LRU, which is a very poor eviction
> strategy for scanning workloads.  I mentioned this a few times on the
> JIRA.
>
> I'd also like to get some idea of how much testing this has received
> in a multi-node cluster.  What makes us confident that this is the
> right time to merge, rather than in a week or two?
>
> best,
> Colin
>
>
> On Tue, Sep 23, 2014 at 4:55 PM, Arpit Agarwal <aagar...@hortonworks.com>
> wrote:
> > I have posted write benchmark results to the Jira.
> >
> > On Tue, Sep 23, 2014 at 3:41 PM, Arpit Agarwal <aagar...@hortonworks.com>
> > wrote:
> >
> >> Hi Andrew, I said "it is not going to be a substantial fraction of
> >> memory bandwidth". That is certainly not the same as saying it won't
> >> be good or there won't be any improvement.
> >>
> >> Any time you have transfers over RPC or the network stack you will
> >> not get close to the memory bandwidth even for intra-host transfers.
> >>
> >> I'll add some micro-benchmark results to the Jira shortly.
> >>
> >> Thanks,
> >> Arpit
> >>
> >> On Tue, Sep 23, 2014 at 2:33 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>
> >>> Hi Arpit,
> >>>
> >>> Here is the comment. It was certainly not my intention to misquote
> >>> anyone.
> >>>
> >>> https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14138223&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138223
> >>>
> >>> Quote:
> >>>
> >>> It would be nice to see that we could get a substantial fraction of
> >>> memory bandwidth when writing to a single replica in-memory.
> >>>
> >>> The comparison will be interesting but I can tell you without
> >>> measurement it is not going to be a substantial fraction of memory
> >>> bandwidth. We are still going through DataTransferProtocol with all
> >>> the copies and overhead that involves.
> >>>
> >>> When the goal is in-memory writes and we are unable to achieve a
> >>> substantial fraction of memory bandwidth, to me that is "not good
> >>> performance."
> >>>
> >>> I also looked through the subtasks, and AFAICT the only one related
> >>> to improving this is deferring checksum computation. The benchmarking
> >>> we did on HDFS-4949 showed that this only really helps when you're
> >>> down to a single copy or zero copies with SCR/ZCR. DTP reads didn't
> >>> see much of an improvement, so I'd guess the same would be true for
> >>> DTP writes.
> >>>
> >>> I think my above three questions are still open, as well as my question
> >>> about why we're merging now, as opposed to when the performance of the
> >>> branch is proven out.
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>> On Tue, Sep 23, 2014 at 2:10 PM, Arpit Agarwal <aagar...@hortonworks.com>
> >>> wrote:
> >>>
> >>> > Andrew, don't misquote me. Can you link the comment where I said
> >>> > performance wasn't going to be good?
> >>> >
> >>> > I will add some preliminary write results to the Jira later today.
> >>> >
> >>> > > What's the plan to improve write performance?
> >>> > I described this in response to your and Colin's comments on the
> >>> > Jira.
> >>> >
> >>> > For the benefit of folks not following the Jira, the immediate task
> >>> > we'd like to get done post-merge is moving checksum computation off
> >>> > the write path. Also see open subtasks of HDFS-6581 for other
> >>> > planned perf improvements.
> >>> >
> >>> > Thanks,
> >>> > Arpit
> >>> >
> >>> >
> >>> > On Tue, Sep 23, 2014 at 1:07 PM, Andrew Wang <andrew.w...@cloudera.com>
> >>> > wrote:
> >>> >
> >>> > > Hi Arpit,
> >>> > >
> >>> > > On HDFS-6581, I asked for write benchmarks on Sep 19th, and you
> >>> > > responded that the performance wasn't going to be good. However,
> >>> > > I thought the primary goal of this JIRA was to improve write
> >>> > > performance, and write performance is listed as the first feature
> >>> > > requirement in the design doc.
> >>> > >
> >>> > > So, this leads me to a few questions, which I also asked last
> >>> > > week on the JIRA (I believe still unanswered):
> >>> > >
> >>> > > - What's the plan to improve write performance?
> >>> > > - What kind of performance can we expect after the plan is completed?
> >>> > > - Can this expected performance be validated with a prototype?
> >>> > >
> >>> > > Even with these questions answered, I don't understand the need
> >>> > > to merge this before the write optimization work is completed.
> >>> > > Write perf is listed as a feature requirement, so the branch can
> >>> > > reasonably be called not feature complete until it's shown to be
> >>> > > faster.
> >>> > >
> >>> > > Thanks,
> >>> > > Andrew
> >>> > >
> >>> > > On Tue, Sep 23, 2014 at 11:47 AM, Jitendra Pandey <jiten...@hortonworks.com>
> >>> > > wrote:
> >>> > >
> >>> > > > +1. I have reviewed most of the code in the branch, and I think
> >>> > > > it's ready to be merged to trunk.
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Sep 22, 2014 at 5:24 PM, Arpit Agarwal <aagar...@hortonworks.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > HDFS Devs,
> >>> > > > >
> >>> > > > > We propose merging the HDFS-6581 development branch to trunk.
> >>> > > > >
> >>> > > > > The work adds support to write to HDFS blocks in memory. The
> >>> > > > > target use case covers applications writing relatively small,
> >>> > > > > intermediate data sets with low latency. We introduce a new
> >>> > > > > CreateFlag for the existing CreateFile API. HDFS will
> >>> > > > > subsequently attempt to place replicas of file blocks in local
> >>> > > > > memory, with disk writes occurring off the hot path. The
> >>> > > > > current design is a simplification of original ideas from
> >>> > > > > Sanjay Radia on HDFS-5851.
> >>> > > > >
> >>> > > > > Key goals of the feature were minimal API changes to reduce
> >>> > > > > application burden and best-effort data durability. The feature
> >>> > > > > is optional and requires appropriate DN configuration from
> >>> > > > > administrators.
> >>> > > > >
> >>> > > > > Design doc:
> >>> > > > >
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12661926/HDFSWriteableReplicasInMemory.pdf
> >>> > > > >
> >>> > > > > Test plan:
> >>> > > > >
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12669452/Test-Plan-for-HDFS-6581-Memory-Storage.pdf
> >>> > > > >
> >>> > > > > There are 28 resolved sub-tasks under HDFS-6581, 3 open tasks
> >>> > > > > for tests and Jenkins issues, and 7 open subtasks tracking
> >>> > > > > planned improvements. The latest merge patch is 3300 lines of
> >>> > > > > changed code, of which 1300 lines are new and updated tests.
> >>> > > > > Merging the branch to trunk will allow HDFS applications to
> >>> > > > > start evaluating the feature. We will continue work on
> >>> > > > > documentation, performance tuning and metrics in parallel with
> >>> > > > > the vote and post-merge.
> >>> > > > >
> >>> > > > > Contributors to design and code include Xiaoyu Yao, Sanjay
> >>> > > > > Radia, Jitendra Pandey, Tassapol Athiapinya, Gopal V, Bikas
> >>> > > > > Saha, Vikram Dixit, Suresh Srinivas and Chris Nauroth.
> >>> > > > >
> >>> > > > > Thanks to Haohui Mai, Colin Patrick McCabe, Andrew Wang, Todd
> >>> > > > > Lipcon, Eric Baldeschwieler and Vinayakumar B for providing
> >>> > > > > useful feedback on HDFS-6581, HDFS-5851 and sub-tasks.
> >>> > > > >
> >>> > > > > The vote runs for the usual 7 days and will expire at 12am PDT
> >>> > > > > on Sep 30. Here is my +1 for the merge.
> >>> > > > >
> >>> > > > > Regards,
> >>> > > > > Arpit
> >>> > > > >
> >>> > > > > --
> >>> > > > > CONFIDENTIALITY NOTICE
> >>> > > > > NOTICE: This message is intended for the use of the individual
> >>> > > > > or entity to which it is addressed and may contain information
> >>> > > > > that is confidential, privileged and exempt from disclosure
> >>> > > > > under applicable law. If the reader of this message is not the
> >>> > > > > intended recipient, you are hereby notified that any printing,
> >>> > > > > copying, dissemination, distribution, disclosure or forwarding
> >>> > > > > of this communication is strictly prohibited. If you have
> >>> > > > > received this communication in error, please contact the sender
> >>> > > > > immediately and delete it from your system. Thank You.
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
