Re: Compatibility in Apache Hadoop

Alejandro Abdelnur Tue, 23 Apr 2013 11:51:22 -0700

Andrew,

Or with a twist, why not break/consolidate things as follows?


common API
common IMPL
hdfs CLIENT IMPL
hdfs SERVER IMPL
hdfs TOOLS
<other filesystems> CLIENT
yarn API
yarn CLIENT IMPL
yarn SERVER IMPL
yarn TOOLS
mapred API
mapred IMPL
mapred TOOLS

IMO, this would help significantly to reduce dependency hell (like bringing
servlet, jetty JAR to a hadoop client app).

Thx

On Tue, Apr 23, 2013 at 11:32 AM, Andrew Purtell <apurt...@apache.org> wrote:

> At the risk of hijacking this conversation a bit, what do you think of the
> notion of moving interfaces like Seekable and PositionedReadable into a new
> foundational Maven module, perhaps just for such interfaces that define and
> tag support for core semantics, as their details are better defined and
> documented? I was involved in a discussion today considering factoring out
> the codecs so other ecosystem projects might pull in only codec code.
> Similar to how hadoop-auth is slender and has a useful servlet filter
> implementing SPNEGO authentication, and so it is pulled into various
> places, and can even be used with Hadoop 1. The only thing preventing a
> clean separation of codecs like this is imports of Seekable and
> PositionedReadable. But these define behavior, they don't implement it.
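
To illustrate the point that these interfaces define behavior without implementing it, here is a minimal Java sketch. The interfaces are simplified local stand-ins (the real Hadoop ones declare more methods), and `BlockReader` and `ByteArraySource` are hypothetical names invented for this example. The codec-side helper compiles against the two interfaces only, which is what would let it live in a slim module.

```java
import java.io.IOException;
import java.util.Arrays;

// Simplified stand-ins for the interfaces a foundational module might hold.
// Assumption: trimmed for illustration, not the full Hadoop signatures.
interface Seekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
}

interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
}

// Hypothetical codec-side helper: it depends on the interfaces only,
// never on a concrete filesystem stream.
final class BlockReader {
    static <T extends Seekable & PositionedReadable> byte[] readBlock(T in, long start, int len)
            throws IOException {
        byte[] block = new byte[len];
        int n = Math.max(in.read(start, block, 0, len), 0);
        in.seek(start + n);               // leave the stream just past the block
        return Arrays.copyOf(block, n);   // trim to the bytes actually read
    }
}

// Hypothetical in-memory implementation, standing in for a real stream.
final class ByteArraySource implements Seekable, PositionedReadable {
    private final byte[] data;
    private long pos;

    ByteArraySource(byte[] data) {
        this.data = data;
    }

    @Override
    public void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > data.length) {
            throw new IOException("Cannot seek to " + newPos);
        }
        pos = newPos;
    }

    @Override
    public long getPos() {
        return pos;
    }

    @Override
    public int read(long position, byte[] buffer, int offset, int length) throws IOException {
        if (position < 0) {
            throw new IOException("Negative position " + position);
        }
        if (position >= data.length) {
            return -1;
        }
        int n = (int) Math.min(length, data.length - position);
        System.arraycopy(data, (int) position, buffer, offset, n);
        return n;
    }
}
```
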
>
>
> On Tue, Apr 23, 2013 at 9:00 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> > On 22 April 2013 18:32, Eli Collins <e...@cloudera.com> wrote:
> >
> > > On Mon, Apr 22, 2013 at 5:42 PM, Steve Loughran <ste...@hortonworks.com> wrote:
> > >
> > > >
> > > > There's a separate issue that says "we make some guarantee that the
> > > > behaviour of an interface remains consistent over versions", which is
> > > > hard to do without some rigorous definition of what the expected
> > > > behaviour of an implementation should be.
> > >
> > >
> > > Good point, Steve.  I've assumed the semantics of the API had to
> > > respect the attribute (eg changing the semantics of FileSystem#close
> > > would be an incompatible change, since this is a public/stable API,
> > > even if the new semantics are arguably better).  But you're right,
> > > unless we've actually defined what the semantics of the APIs are it's
> > > hard to say if we've materially changed them.  How about adding a new
> > > section on the page and calling that out explicitly?
> > >
> >
> > +1.
> >
> > Maybe we should list which bits we consider both well specified and
> > covered with tests that verify the implementations in our svn match that
> > specification.
> >
> >
> > >
> > > In practice I think we'll have to take semantics case by case, clearly
> > > define the semantics we care about better in the javadocs (for the
> > > major end user-facing classes at least, calling out both intended
> > > behavior and behavior that's meant to be undefined) and using
> > > individual judgement elsewhere.  For example, HDFS-4156 changed
> > > DFSInputStream#seek to throw an IOE if you seek to a negative offset,
> > > instead of succeeding then resulting in an NPE on the next access.
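
The behavior described above, failing fast at seek time instead of hitting an NPE on the next access, can be sketched as follows. `CheckedSeekStream` is a hypothetical in-memory class, not the HDFS code; the past-EOF check is an extra assumption that the paragraph does not mention.

```java
import java.io.IOException;

// Hypothetical in-memory stream illustrating seek-time validation.
final class CheckedSeekStream {
    private final byte[] data;
    private int pos;

    CheckedSeekStream(byte[] data) {
        this.data = data;
    }

    void seek(long target) throws IOException {
        if (target < 0) {
            // Reject here, so the caller sees the problem at the seek call
            // rather than as an unrelated failure on a later read.
            throw new IOException("Cannot seek to negative offset " + target);
        }
        if (target > data.length) {
            // Assumption: this sketch also rejects seeking past the end.
            throw new IOException("Cannot seek after EOF: " + target);
        }
        pos = (int) target;
    }

    // Returns the next byte as 0..255, or -1 at end of data.
    int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    long getPos() {
        return pos;
    }
}
```

A failed seek leaves the position untouched in this sketch, so a caller that catches the exception can keep reading from where it was.
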
> > >
> >
> > I'd seen that the DFS seek was the best implementation, but hadn't seen
> > the cause. The other ones (especially the Buffered one that goes in front
> > of most others) are much weaker.
> >
> >
> > > That's an incompatible change in terms of semantics, but not semantics
> > > intended by the author, or likely semantics programs depend on.
> > >
> >
> > That's a key problem: what do people depend on? A lot of the junit tests
> > depended on the ordering of test methods, after all.
> >
> >
> > > However, if a change made FileSystem#close three times slower, this is
> > > perhaps a smaller semantic change (eg doesn't change what exceptions
> > > get thrown) but probably much less tolerable for end users.
> > >
> >
> > You know that the blobstores all buffer their data so that
> >
> >    1. flush() is a no-op
> >    2. the write takes place on close()
> >
> > #1 changes durability expectations, while #2 means the time to close() is
> > O(data)*O(latency); P(fail) scales with time and distance, and as lots of
> > code swallows exceptions on close, those failures may even go unnoticed.
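
A minimal Java sketch of the two points above, under stated assumptions: `BlobStore`, `BufferedBlobOutputStream` and `CloseHandling` are hypothetical names, and the store is a plain interface rather than any real blobstore client. The stream buffers everything, treats flush() as a no-op, and uploads in close(); the two helpers contrast swallowing the close() failure with letting it propagate.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical remote store; put() may fail like any network call.
interface BlobStore {
    void put(String key, byte[] data) throws IOException;
}

// Hypothetical stream that behaves the way the list above describes.
final class BufferedBlobOutputStream extends OutputStream {
    private final BlobStore store;
    private final String key;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    BufferedBlobOutputStream(BlobStore store, String key) {
        this.store = store;
        this.key = key;
    }

    @Override
    public void write(int b) {
        buffer.write(b);            // data stays in local memory
    }

    @Override
    public void flush() {
        // No-op: nothing reaches the store, so flush() gives no durability.
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        store.put(key, buffer.toByteArray());   // the whole upload happens here
    }
}

final class CloseHandling {
    // Anti-pattern: the upload failure in close() is swallowed,
    // so the caller is told the write succeeded.
    static boolean writeQuietly(BlobStore store, byte[] data) {
        OutputStream out = new BufferedBlobOutputStream(store, "k");
        try {
            out.write(data);
            out.flush();
        } catch (IOException e) {
            return false;
        } finally {
            try {
                out.close();
            } catch (IOException ignored) {
                // the failure is lost here
            }
        }
        return true;
    }

    // try-with-resources propagates the exception thrown by close().
    static void writeChecked(BlobStore store, byte[] data) throws IOException {
        try (OutputStream out = new BufferedBlobOutputStream(store, "k")) {
            out.write(data);
        }
    }
}
```

With a store whose put() always throws, writeQuietly still reports success while writeChecked surfaces the IOException.
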
> >
> > Then there's the assumption that rename is atomic, which MapReduce
> > depends on.
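
The commit-by-rename pattern that relies on this assumption can be sketched as below. This uses the local filesystem through `java.nio.file`, not the Hadoop FileSystem API, and `RenameCommit` and the `_temporary_` prefix are hypothetical choices for the example. Output is written under a temporary name and then moved into place, so readers see either no file or the complete file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper showing write-then-rename on a local filesystem.
final class RenameCommit {
    static Path commit(Path dir, String name, String contents) throws IOException {
        Path tmp = dir.resolve("_temporary_" + name);
        Path dst = dir.resolve(name);
        // Step 1: write the full output where no reader is looking.
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        // Step 2: publish it in one step. ATOMIC_MOVE throws
        // AtomicMoveNotSupportedException if the filesystem cannot do this.
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
        return dst;
    }
}
```

On a store where rename is implemented as copy-then-delete, step 2 stops being a single step, and a reader or a failure in the middle can observe partial output.
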
> >
> >
> > >
> > > In any case, even if we get an 80% solution to the semantics issue
> > > we'll probably be in good shape for v2 GA if we can sort out the
> > > remaining topics. See any other topics missing? Once the overall
> > > outline is in shape it makes sense to annotate the page with the
> > > current policy (if there's already consensus on one), and identify
> > > areas where we need to come up with a policy or are leaving TBD.
> > > Currently this is a source of confusion for new developers, some
> > > downstream projects and users.
> > >
> > >
> > How about
> >
> > "semantic compatibility" : we strive to ensure that the behavior of APIs
> > remains consistent over versions, though changes for correctness may
> > result in changes in behavior. That is: if you relied on something which
> > we consider to be a bug, it may get fixed.
> >
> > We are in the process of specifying some APIs more rigorously, enhancing
> > our test suites to verify compliance with the specification, effectively
> > creating a formal specification for the subset of behaviors that can be
> > easily tested. We welcome involvement in this process, from both users
> > and implementors of our APIs.
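
One way to picture "a specification backed by tests" is a single contract check run against every implementation. The sketch below is an assumption-laden illustration: `SeekableStream`, `StrictStream`, `LaxStream` and `SeekContract` are hypothetical names, and the three rules it checks are example rules, not an agreed Hadoop specification.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal interface under specification.
interface SeekableStream {
    void seek(long pos) throws IOException;
    long getPos();
}

// Follows the example contract: negative offsets are rejected.
final class StrictStream implements SeekableStream {
    private long pos;

    @Override
    public void seek(long newPos) throws IOException {
        if (newPos < 0) {
            throw new IOException("Cannot seek to negative offset " + newPos);
        }
        pos = newPos;
    }

    @Override
    public long getPos() {
        return pos;
    }
}

// Weaker implementation: accepts any offset and fails later, if at all.
final class LaxStream implements SeekableStream {
    private long pos;

    @Override
    public void seek(long newPos) {
        pos = newPos;
    }

    @Override
    public long getPos() {
        return pos;
    }
}

final class SeekContract {
    // Runs the same checks against any implementation and
    // returns the list of violated rules (empty means compliant).
    static List<String> check(SeekableStream s) {
        List<String> violations = new ArrayList<>();
        try {
            s.seek(3);
            if (s.getPos() != 3) {
                violations.add("getPos() must return the last successful seek position");
            }
        } catch (IOException e) {
            violations.add("seek(3) must succeed");
        }
        try {
            s.seek(-1);
            violations.add("seek(-1) must throw IOException");
        } catch (IOException expected) {
            if (s.getPos() != 3) {
                violations.add("a failed seek must not move the position");
            }
        }
        return violations;
    }
}
```

Here `SeekContract.check(new StrictStream())` returns an empty list, while the lax stream is reported for accepting seek(-1).
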
> >
> > If you are concerned about compatibility at any level, we strongly
> > encourage you to follow the Hadoop developer mailing lists, and track JIRA
> > issues that may concern you. You are also strongly advised to verify that
> > your code works against beta releases of forthcoming Hadoop versions, as
> > that is a time in which identified regressions can be corrected rapidly.
> > If you only test when a new final release ships, the time to fix is
> > likely to be at least three months."
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>



-- 
Alejandro
