On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy <a...@hortonworks.com> wrote:
>
> On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:
>
>> On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>>
>>> With that in mind, I really want to make a serious push to lock down APIs 
>>> and wire-protocols for hadoop-2.0.5-beta.
>>> Thus, we can confidently support hadoop-2.x in a compatible manner in the 
>>> future. So, it's fine to add new features,
>>> but please ensure that all APIs are frozen for hadoop-2.0.5-beta
>>
>> Arun, since it sounds like you have a pretty definite idea
>> in mind for what you want 'beta' label to actually mean,
>> could you, please, share the exact criteria?
>
> Sorry, I'm not sure if this is exactly what you are looking for but, as I 
> mentioned above, the primary aim would be to make the final set of required 
> API/wire-protocol changes so that we can call it a 'beta' i.e. once 
> 2.0.5-beta ships users & downstream projects can be confident about forward 
> compatibility in hadoop-2.x line. Obviously, we might discover a blocker bug 
> post 2.0.5 which *might* necessitate an unfortunate change - but that should 
> be an outstanding exception.

Arun, Suresh,

Mind reviewing the following page Karthik put together on
compatibility? http://wiki.apache.org/hadoop/Compatibility

I think we should do something similar to what Sanjay proposed in
HADOOP-5071 for Hadoop v2.   If we get on the same page on
compatibility terms/APIs then we can quickly draft the policy, at
least for the things we've already got consensus on.  I think our new
developers, users, downstream projects, and partners would really
appreciate us making this clear.  If people like the content we can
move it to the Hadoop website and maintain it in svn like the bylaws.

I think we need to do so because there's been confusion about what
types of compatibility we promise, and there are some open questions
that I'm not sure everyone is clear on. Examples:
- Are we going to preserve Hadoop v3 clients against v2 servers now
that we have protobuf support?  (I think so..)
- Can we break rolling upgrade of daemons in updates post GA? (I don't
think so..)
- Do we disallow HDFS metadata changes that require an HDFS upgrade in
an update? (I think so..)
- Can we remove methods from v2 and v2 updates that were deprecated in
v0.20-22?  (Unclear)
- Will we preserve binary compatibility for MR2 going forward? (I think so..)
- Does the ability to support multiple versions of MR simultaneously
via MR2 change the MR API compatibility story? (I don't think so..)
- Are the RM protocols sufficiently stable to disallow incompatible
changes potentially required by non-MR projects? (Unclear, most large
YARN deployments I'm aware of are running 0.23, not v2 alphas)

I'm also not sure there's currently consensus on what an incompatible
change is. For example, I think HADOOP-9151 is incompatible because it
broke client/server wire compatibility with previous releases, and any
change that breaks wire compatibility is incompatible. Suresh felt it
was not an incompatible change because it did not affect API
compatibility (i.e. PB is not considered part of the API) and because
the change occurred while v2 was in alpha. I'm not sure we need to go
through the whole exercise of what's allowed in an alpha and a beta
(water under the bridge, hopefully), but I do think we should clearly
define an incompatible change. It's fine that v2 has been a bit of a
wild west in the alpha development stage, but I think we need to get a
little more rigorous.

Thanks,
Eli
