Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-21 Thread Steve Loughran

On 17/11/11 19:31, Roman Shaposhnik wrote:
> On Thu, Nov 17, 2011 at 11:09 AM, Arun C Murthy wrote:
>> I don't know which are the ones in 'every single downstream component' - care
>> to enumerate?
>>
>> The ones I'm aware of, which have since been fixed are:
>> https://issues.apache.org/jira/browse/HBASE-4510 ->
>> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS
>> apis so that HBase can continue to use them)
>> https://issues.apache.org/jira/browse/PIG-2125 ->
>> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow
>> apps deal with inconsistency in 'new' MR apis which changed in 0.21).
>>
>> I'm not aware of anything else - what else do you see?
>>
>> In summary, please take a careful look at the 'factual information' before
>> you decide to proclaim your beliefs about important aspects such as
>> 'incompatibility' - it's key to ensure we don't confuse end-users and have a
>> smooth adoption of newer releases.
>
> First of all, I would appreciate if you refrain from statements that
> sound like you're lecturing me on public mailing list.
>
> Here's what I said. Let me spell it out once again:
>
> "I believe that by now we have enough factual evidence that at least
> framework-level APIs are incompatible."
>
> Here's the umbrella Bigtop JIRA that tracks those incompatibilities:
> https://issues.apache.org/jira/browse/BIGTOP-162
>
> Nowhere in my email I implied that I *know* of cases where user-level
> APIs would break. That said, without a formal verification of backwards
> compatibility I can NOT make the inverse statement as well.


I think we went through that discussion of formality a while back; the 
conclusion being that formal verification is not possible without a formal 
specification of semantics.


What tends to burn code are the implicit expectations of behaviour: stuff 
that was never stated to be true, but which turned out to be so for a while.


I view all these bug reports as a sign of Bigtop's value to the release 
process. From that point of view: well done.


One thing I would like to see for build and test is the 0.23+ JARs in 
the public mvn repository, including the test ones (without any log4j 
files), so that downstream projects can work with them easily. I can see 
the 0.23.0 SNAPSHOT artifacts in the apache repository, but given the 
way M2 handles such things, I'd rather have stable versioned releases.


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Andrew Purtell
Hi Mahadev,

> From: Mahadev Konar 

> 
> Andrew,
> Can you open a jira listing the issues? Would be good to resolve them
> in the next 0.23 release.
> 
[...] 
>>  In addition, MAPREDUCE-3169. I had to pull out a bunch of HBase unit tests 
>> today to get 0.92 compiling on 0.23.


I opened HBASE-4813 and linked it to MAPREDUCE-3169.

Best regards,

   - Andy



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Mahadev Konar
Andrew,
 Can you open a jira listing the issues? Would be good to resolve them
in the next 0.23 release.

thanks
mahadev

On Thu, Nov 17, 2011 at 1:07 PM, Andrew Purtell  wrote:
>> From: Arun C Murthy 
>
>> Now, a downstream project such as HBase, Hive or Pig isn't the 'normal
>> end-user application'. These projects can choose to use
>> undocumented/non-public (e.g. LimitedPrivate) apis and we are committed to
>> working with them to ensure a smooth transition.
>>
>> I don't know which are the ones in 'every single downstream
>> component' - care to enumerate?
>>
>> The ones I'm aware of, which have since been fixed are:
>> https://issues.apache.org/jira/browse/HBASE-4510 ->
>> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS 
>> apis
>> so that HBase can continue to use them)
>> https://issues.apache.org/jira/browse/PIG-2125 ->
>> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow 
>> apps
>> deal with inconsistency in 'new' MR apis which changed in 0.21).
>>
>> I'm not aware of anything else - what else do you see?
>>
>> As a result, the downstream projects ensure their own end-users and 
>> applications
>> (HBase apps, Pig scripts, Hive queries) etc. do NOT see any 
>> incompatibilities.
>
>
> In addition, MAPREDUCE-3169. I had to pull out a bunch of HBase unit tests 
> today to get 0.92 compiling on 0.23.
>
>    - Andy
>
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Andrew Purtell
> From: Arun C Murthy 

> Now, a downstream project such as HBase, Hive or Pig isn't the 'normal 
> end-user application'. These projects can choose to use 
> undocumented/non-public (e.g. LimitedPrivate) apis and we are committed to 
> working with them to ensure a smooth transition.
> 
> I don't know which are the ones in 'every single downstream 
> component' - care to enumerate?
> 
> The ones I'm aware of, which have since been fixed are:
> https://issues.apache.org/jira/browse/HBASE-4510 -> 
> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS 
> apis 
> so that HBase can continue to use them)
> https://issues.apache.org/jira/browse/PIG-2125 -> 
> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow 
> apps 
> deal with inconsistency in 'new' MR apis which changed in 0.21).
> 
> I'm not aware of anything else - what else do you see?
> 
> As a result, the downstream projects ensure their own end-users and 
> applications 
> (HBase apps, Pig scripts, Hive queries) etc. do NOT see any incompatibilities.


In addition, MAPREDUCE-3169. I had to pull out a bunch of HBase unit tests 
today to get 0.92 compiling on 0.23.

   - Andy



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Roman Shaposhnik
On Thu, Nov 17, 2011 at 11:09 AM, Arun C Murthy  wrote:
> I don't know which are the ones in 'every single downstream component' - care 
> to enumerate?
>
> The ones I'm aware of, which have since been fixed are:
> https://issues.apache.org/jira/browse/HBASE-4510 -> 
> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS 
> apis so that HBase can continue to use them)
> https://issues.apache.org/jira/browse/PIG-2125 -> 
> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (
> we fixed MR to allow apps deal with inconsistency in 'new' MR apis which 
> changed in 0.21).
>
> I'm not aware of anything else - what else do you see?
>
> In summary, please take a careful look at the 'factual information' before 
> you decide to proclaim your beliefs
> about important aspects such as 'incompatibility' - it's key to ensure we 
> don't confuse end-users and have a smooth adoption of newer releases.

First of all, I would appreciate if you refrain from statements that
sound like you're lecturing me on public mailing list.

Here's what I said. Let me spell it out once again:

"I believe that by now we have enough factual evidence that at least
framework-level APIs are incompatible."

Here's the umbrella Bigtop JIRA that tracks those incompatibilities:
   https://issues.apache.org/jira/browse/BIGTOP-162

Nowhere in my email did I imply that I *know* of cases where user-level
APIs would break. That said, without a formal verification of backwards
compatibility I can NOT make the inverse statement either.
That's why I used "at least", which according to a dictionary has
connotations of "according to the lowest possible assessment; not less than".

Once again, I'm in no position to assess the level of API compatibility
of user-level APIs. That's where your expertise comes in. However, please
do NOT tell me that I'm not in a position to make a statement about
framework-level APIs, where I spend a significant amount of time
(together with Tom and Alejandro) patching every single downstream
component (except HBase -- I didn't patch it myself) to make it at least
compile against 0.23.

Hope this helps to cut down on confusion.

Thanks,
Roman.


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Alejandro Abdelnur
On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran  wrote:
> ...
> What I will miss in 0.23 is the MiniMRCluster, which I consider to be part
> of the API. Certainly it's why I pull in hadoop-common-test-0.20.20x.jar into
> downstream builds, because it is the simplest way to do basic tests in junit
> of MR operations. It's also the most lightweight way to do single-machine
> Hadoop runs over small datasets.

https://issues.apache.org/jira/browse/MAPREDUCE-3169


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Arun C Murthy
Roman,

On Nov 17, 2011, at 8:33 AM, Roman Shaposhnik wrote:

> On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran  wrote:
>> -0.23 is a superset of the MR and HDFS APIs compatible with previous
>> versions (I don't know or care whether or not it is a proper superset or
>> not). The goal here is that end user apps and higher levels in the stack
>> (in-ASF and out-ASF) should work, though testing is required to verify this.
> 
> I believe that by now we have enough factual evidence that at least
> framework-level
> APIs are incompatible. 

Let me clarify to help you understand the distinction.

Both HDFS and MR have 'framework' apis (such as details of NN/DN and JT/TT) and 
'end user' apis (such as open/read/write/close or 
Mapper/Reducer/InputFormat/OutputFormat etc., more here: 
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html).

hadoop-0.23 aims to be 'compatible' for end-users so that they don't need to 
modify their applications to use the new release. Also, we have both the 'old 
MR apis' and the 'new Context Objects MR apis' in 0.23.

The 'framework' apis are a different ballgame since the underlying framework, 
particularly in MR, has changed significantly. We have replaced the old JT/TT 
based 'classic' framework with the new 'yarn' framework consisting of 
ResourceManager/NodeManager. There are similar, but more subtle changes in the 
NameNode/DataNode for HDFS - then there is the append rewrite. As a result, the 
wire-protocols have changed significantly, and we are bumping up the 
'major version' to reflect that.

The crux of the matter: end-user applications do NOT need to be _modified_, 
they just have to be recompiled against the new libraries.

If you do see any reason for applications to be modified please open a jira and 
we'll ensure we get it fixed asap. 

Have you seen any such instance?

> That's exactly why every single downstream component
> needs to be patched at the level of code to work against 0.23.


Now, a downstream project such as HBase, Hive or Pig isn't the 'normal end-user 
application'. These projects can choose to use undocumented/non-public (e.g. 
LimitedPrivate) apis and we are committed to working with them to ensure a 
smooth transition.

I don't know which are the ones in 'every single downstream component' - care 
to enumerate?

The ones I'm aware of, which have since been fixed are:
https://issues.apache.org/jira/browse/HBASE-4510 -> 
https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS 
apis so that HBase can continue to use them)
https://issues.apache.org/jira/browse/PIG-2125 -> 
https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow apps 
deal with inconsistency in 'new' MR apis which changed in 0.21).

I'm not aware of anything else - what else do you see?

As a result, the downstream projects ensure their own end-users and 
applications (HBase apps, Pig scripts, Hive queries) etc. do NOT see any 
incompatibilities.



In summary, please take a careful look at the 'factual information' before you 
decide to proclaim your beliefs about important aspects such as 
'incompatibility' - it's key to ensure we don't confuse end-users and have a 
smooth adoption of newer releases.

thanks,
Arun



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Roman Shaposhnik
On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran  wrote:
> -0.23 is a superset of the MR and HDFS APIs compatible with previous
> versions (I don't know or care whether or not it is a proper superset or
> not). The goal here is that end user apps and higher levels in the stack
> (in-ASF and out-ASF) should work, though testing is required to verify this.

I believe that by now we have enough factual evidence that at least
framework-level
APIs are incompatible. That's exactly why every single downstream component
needs to be patched at the level of code to work against 0.23.

Thanks,
Roman.


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-17 Thread Steve Loughran

On 17/11/11 02:06, Scott Carey wrote:
> On 11/16/11 3:51 PM, "Nathan Roberts" wrote:
>> On 11/16/11 4:43 PM, "Arun C Murthy" wrote:
>>> I propose we adopt the convention that a new major version should be a
>>> superset of the previous major version, features-wise.
>>
>> Just so I'm clear. This is only guaranteed at the time the new major
>> version is started. A day later a previous major line may merge a feature
>> from trunk and then it's no longer the case that 2.x.y is a superset. If
>> that's the case I'm not sure of the value of the convention. We could say
>> that new major versions always start from trunk, but that doesn't have
>> meaning outside of the developer community.
>
> I don't think in general one can say that major versions are a superset of
> previous major versions.  Then you would need to have a SuperMajor version
> number for the (rare) times that this was broken.
> In other words, the major version number really can't have any
> restrictions.
> Perhaps however, one can say that minor versions are supersets of prior
> minor version if one were to define 'superset'.
>
> It's going to be hard to claim that the 0.23 branch is a superset of 0.22
> -- After all, there is no JobTracker and all sorts of stuff has been
> removed or replaced with something else.  Whether that defines a superset
> or not gets into a lot of semantics of what we mean by 'superset'.
>
> Perhaps like 'feature' or 'bug fix', it is best not to get into the
> semantics of defining what we mean by 'superset' and rather define version
> number meaning only in terms of compatibility classifications.  Especially
> since the compatibility classification has implications for all of these
> other things  -- and IMO more clearly useful ones.  For example, consider
> that a "bug fix" may break wire compatibility, that a tiny harmless change
> can be considered a "new feature", or that replacing a single link in a UI
> could be considered breaking a "superset" rule.





I think it would be good to distinguish user-API supersets/subsets from 
internal supersets/subsets.


-0.23 is a superset of the MR and HDFS APIs compatible with previous 
versions (I don't know or care whether or not it is a proper superset or 
not). The goal here is that end user apps and higher levels in the stack 
(in-ASF and out-ASF) should work, though testing is required to verify 
this.


A failure of the layers above to work with 0.23+ is something that 
should be considered a regression, looked at, and then either dismissed 
as "you weren't meant to do that" or fixed.


-0.23 has changed the back end means by which jobs are scheduled; the 
monitoring APIs have changed, etc, etc. Where people will see a visible 
difference is in the JT Web UI. That's not an API-level change


A failure of any code that goes into this bit of the system to compile 
or run against 0.23 is something people can feel slightly sorry about, 
but not enough to trigger reversions.


What I will miss in 0.23 is the MiniMRCluster, which I consider to be 
part of the API. Certainly it's why I pull in 
hadoop-common-test-0.20.20x.jar into downstream builds, because it is 
the simplest way to do basic tests in junit of MR operations. It's also 
the most lightweight way to do single-machine Hadoop runs over small 
datasets.


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Konstantin Boudnik
On Wed, Nov 16, 2011 at 01:11PM, Matt Foley wrote:
> I support giving all three active code branches a clean start, on an equal
> footing:
> 
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
> be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
> from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
> versioning rules going forward.

+1 on all three

Cos

> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
> then release manager, and the pleasure of the community.
> 
> Regards,
> --Matt
> 
> On Wed, Nov 16, 2011 at 11:57 AM, Doug Cutting  wrote:
> 
> > On 11/16/2011 10:15 AM, Scott Carey wrote:
> > > IMO what is important from the development and maintenance perspective is
> > > the _meaning_ of the
> > > major.minor.patch numbers as described in my previous message.
> > >
> > > If a minor version number bump means that it is a superset of the
> > previous
> > > release and is backwards compatible, then that requirement on its own
> > > answers whether 0.22 can become 1.1, or if it must be a 2.0 release.
> > >
> > > Whether hadoop starts using a new meaning for major.minor.patch is what
> > is
> > > of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is marketing.
> >
> > Scott, this is a great point.  Thanks for making it.
> >
> > > The version number is completely meaningless on its own, pure marketing.
> > > However, if the numbers gain meaning through a clear definition of what
> > > the major.minor.patch numbers signify, then there is meaning and
> > structure
> > > going forward.
> > > The current state of affairs seems to be:
> > > major:  always 0
> > > minor:  potentially big changes; almost always breaks wire compatibility;
> > > occasionally breaks API backwards compatibility
> > > patch:  typically bug fixes only; 'bug fix' not well defined; almost
> > never
> > > breaks API or wire compatibility
> >
> > Long ago I proposed such rules for Hadoop releases at:
> >
> > http://wiki.apache.org/hadoop/Roadmap
> >
> > These state that pre-1.0 releases behave roughly as above.
> >
> > > I think the community can decide two things independently:
> > >
> > > - Should 0.20.20x be renamed 1.0.y ?  (perhaps not, perhaps 0.23 should
> > be
> > > 1.0 and the others left alone).
> > > - Should hadoop adopt a new clear definition of major.minor.patch number
> > > significance?
> >
> > Would you care to call a vote on one or both of these?
> >
> > > example proposal:
> > > * major version number increment: signifies breaks in API backwards
> > > compatibility and/or major architecture overhauls.
> > > * minor version number increment: signifies possible API changes, but
> > > maintains API backwards compatibility.  Wire compatibility may break (see
> > > release notes).  Included functionality is a superset of previous minor
> > > release.
> > > * patch version number increment: signifies a release where all
> > > improvements are fully backwards compatible with the previous patch
> > > version, including wire format.
> >
> > This is also similar to what the Roadmap wiki page indicates for
> > post-1.0 releases.
> >
> > Renaming things after the fact to try to make them consistent when the
> > prior rules weren't consistently followed is not easy.  Instead we might
> > better focus on rules that we intend to obey for releases going forward
> > and then obey them.
> >
> > > Whatever the meaning of the numbers turns out to be will dictate whether
> > > releases after a 1.0.x need to be 2.0.x or can be 1.1.x
> >
> > Good point.  The most accurate approach would probably be to call each
> > existing branch a distinct major release.  Dropping the leading zero
> > would reduce confusion and avoid marketing but would still combine
> > 0.20.x and 0.20.20x which perhaps ought to be considered separate major
> > releases.  For me this is however a reasonable tradeoff since we're
> > better off focusing on improving things in the future than arguing about
> > marketing and how to hide our past versioning mistakes.
> >
> > Doug
> >


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Scott Carey


On 11/16/11 3:51 PM, "Nathan Roberts"  wrote:

>On 11/16/11 4:43 PM, "Arun C Murthy"  wrote:
>> I propose we adopt the convention that a new major version should be a
>>superset of the previous major version, features-wise.
>Just so I'm clear. This is only guaranteed at the time the new major
>version is started. A day later a previous major line may merge a feature
>from trunk and then it's no longer the case that 2.x.y is a superset. If
>that's the case I'm not sure of the value of the convention. We could say
>that new major versions always start from trunk, but that doesn't have
>meaning outside of the developer community.

I don't think in general one can say that major versions are a superset of
previous major versions.  Then you would need to have a SuperMajor version
number for the (rare) times that this was broken.
In other words, the major version number really can't have any
restrictions.
Perhaps however, one can say that minor versions are supersets of prior
minor version if one were to define 'superset'.

It's going to be hard to claim that the 0.23 branch is a superset of 0.22
-- After all, there is no JobTracker and all sorts of stuff has been
removed or replaced with something else.  Whether that defines a superset
or not gets into a lot of semantics of what we mean by 'superset'.

Perhaps like 'feature' or 'bug fix', it is best not to get into the
semantics of defining what we mean by 'superset' and rather define version
number meaning only in terms of compatibility classifications.  Especially
since the compatibility classification has implications for all of these
other things  -- and IMO more clearly useful ones.  For example, consider
that a "bug fix" may break wire compatibility, that a tiny harmless change
can be considered a "new feature", or that replacing a single link in a UI
could be considered breaking a "superset" rule.
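
To make the 'superset' problem concrete, here is a minimal sketch that treats
a release as a plain set of feature names. The helper names and the two
feature lists are hypothetical and heavily simplified for illustration; they
are not real inventories of 0.22 or 0.23.

```python
def removed_features(previous, current):
    """Return the features present in the previous release but missing now."""
    return set(previous) - set(current)


def is_superset(previous, current):
    """True only if nothing from the previous release was dropped."""
    return not removed_features(previous, current)


# Hypothetical, simplified feature lists (assumptions for this sketch only).
release_022 = {"hdfs-client-api", "mapred-api", "JobTracker", "TaskTracker"}
release_023 = {"hdfs-client-api", "mapred-api", "ResourceManager", "NodeManager"}

print(is_superset(release_022, release_023))               # False
print(sorted(removed_features(release_022, release_023)))  # ['JobTracker', 'TaskTracker']
```

The answer depends entirely on what goes into the sets: list only end-user
APIs and the newer release may well count as a superset, list internal
components and it does not. That is the semantics problem in a nutshell.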

>
>On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
>>
>> Another definition is that a major release permits incompatible changes,
>> either in APIs, wire-formats, on-disk formats, etc.
>Are our wire formats stable enough in all release lines that we're ready
>to live by this? It would mean the 1.x.y line could not change a wire
>format without a bump to the major number, which would obviously cause
>issues. Even in the 23.x line I thought there were still some wire
>compatibility changes pending which would mean we'd quickly be moving to
>a 4.x line.
>
>Nathan
>



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Scott Carey


On 11/16/11 4:13 PM, "Doug Cutting"  wrote:

>On 11/16/2011 03:51 PM, Nathan Roberts wrote:
>>> > Another definition is that a major release permits incompatible
>>>changes,
>>> > either in APIs, wire-formats, on-disk formats, etc.
>> Are our wire formats stable enough in all release lines that we're
>>ready to live by this?
>
>No.  Long-term we'd like to only break wire-compatibility in major
>releases, and ideally not even then.  So the set of things that have to
>be compatible within minor releases does not currently include wire
>formats, but we hope eventually will.

Currently, it can probably be enforced that patch versions don't change
wire format, and that minor versions don't break API.  Major versions may
break both.

At a later time, it may be possible to enforce that minor version bumps be
wire compatible.  At that time, the version rules going forward can change.
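
As a rough sketch, the rules that can probably be enforced today could be
written down as a small lookup table. The labels "api" and "wire" and both
function names are made up for this example, and I am assuming that a patch
release must not break the API either, since a minor release may not.

```python
# What each kind of version bump is allowed to break under the current rules.
ALLOWED_BREAKS = {
    "patch": set(),            # no wire-format change (and, by assumption, no API break)
    "minor": {"wire"},         # may break wire format, must not break the API
    "major": {"api", "wire"},  # may break both
}


def bump_is_allowed(bump, breaks):
    """Check whether a release with the given bump may contain these breaks."""
    return set(breaks) <= ALLOWED_BREAKS[bump]


def minimum_bump(breaks):
    """Return the smallest bump that permits the given set of breaks."""
    for bump in ("patch", "minor", "major"):
        if bump_is_allowed(bump, breaks):
            return bump
    raise ValueError("unknown kind of break: %r" % sorted(breaks))


print(minimum_bump([]))        # patch
print(minimum_bump(["wire"]))  # minor
print(minimum_bump(["api"]))   # major
```

If minor versions later become wire compatible, only the "minor" entry in the
table would need to change to an empty set.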


>
>Doug
>



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread sanjay Radia

On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
> 
> 
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc.  This is more
> objective measure.  For example, one might in release X+1 deprecate
> features of release X but still remain compatible with them, while in
> X+2 we'd remove them.  So every major release would make incompatible
> changes, but only of things that had been deprecated two releases ago.
> Often the reason for the incompatible changes is new primary APIs or
> re-implementation of primary components, but those more subjective
> measures would not be the justification for the major version, rather
> any incompatible changes would.

This is mostly consistent with what is stated with respect to API changes in 
HADOOP-5071 on "Hadoop Compatibility requirements": 
https://issues.apache.org/jira/browse/HADOOP-5071.

HADOOP-5071 was derived from a long series of email discussions and describes 
some of the subtle nuances for compatibility for API, on-disk format, wire etc.
Some notes (see details there):
1) break in compatibility => major number change,
   but major number change does NOT => break in compatibility.
2) we routinely change on disk format on hdfs but do an automatic upgrade. That 
is okay and allowed without a major number change.
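
The two notes above can be sketched as a one-way check. This is only an
illustration: the function and parameter names are hypothetical, and treating
an on-disk format change without an automatic upgrade as a compatibility
break is my assumption, not something stated in HADOOP-5071.

```python
def requires_major_bump(breaks_compatibility=False,
                        disk_format_change=False,
                        automatic_upgrade=False):
    """Return True if the described changes force a major number change."""
    if breaks_compatibility:
        return True
    # An on-disk format change is fine when it is upgraded automatically.
    if disk_format_change and not automatic_upgrade:
        return True  # assumption: no automatic upgrade counts as a break
    return False


def release_is_consistent(major_bumped, **changes):
    """Break => major bump, but a major bump does not imply a break."""
    return major_bumped or not requires_major_bump(**changes)


# A major bump with no incompatible change is still consistent.
print(release_is_consistent(True))
# A disk format change with automatic upgrade needs no major bump.
print(release_is_consistent(False, disk_format_change=True, automatic_upgrade=True))
# A compatibility break without a major bump violates the rule.
print(release_is_consistent(False, breaks_compatibility=True))
```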
> 
> Of course, we should work hard to never make incompatible changes...

Agreed.  Once things are in customer hands it is very hard to remove even 
deprecated methods.
(But once in a while we have to do it, after allowing sufficient time to 
upgrade to new APIs.)
Java, for example, does not remove deprecated methods.

sanjay



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Doug Cutting
On 11/16/2011 03:51 PM, Nathan Roberts wrote:
>> > Another definition is that a major release permits incompatible changes,
>> > either in APIs, wire-formats, on-disk formats, etc.
> Are our wire formats stable enough in all release lines that we're ready to 
> live by this?

No.  Long-term we'd like to only break wire-compatibility in major
releases, and ideally not even then.  So the set of things that have to
be compatible within minor releases does not currently include wire
formats, but we hope eventually will.

Doug



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Eric Yang
+1 on Matt's proposal.

On Wed, Nov 16, 2011 at 1:11 PM, Matt Foley  wrote:
> I support giving all three active code branches a clean start, on an equal
> footing:
>
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
> be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
>    from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
> versioning rules going forward.
> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
> then release manager, and the pleasure of the community.
>
> Regards,
> --Matt
>
> On Wed, Nov 16, 2011 at 11:57 AM, Doug Cutting  wrote:
>
>> On 11/16/2011 10:15 AM, Scott Carey wrote:
>> > IMO what is important from the development and maintenance perspective is
>> > the _meaning_ of the
>> > major.minor.patch numbers as described in my previous message.
>> >
>> > If a minor version number bump means that it is a superset of the
>> previous
>> > release and is backwards compatible, then that requirement on its own
>> > answers whether 0.22 can become 1.1, or if it must be a 2.0 release.
>> >
>> > Whether hadoop starts using a new meaning for major.minor.patch is what
>> is
>> > of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is marketing.
>>
>> Scott, this is a great point.  Thanks for making it.
>>
>> > The version number is completely meaningless on its own, pure marketing.
>> > However, if the numbers gain meaning through a clear definition of what
>> > the major.minor.patch numbers signify, then there is meaning and
>> structure
>> > going forward.
>> > The current state of affairs seems to be:
>> > major:  always 0
>> > minor:  potentially big changes; almost always breaks wire compatibility;
>> > occasionally breaks API backwards compatibility
>> > patch:  typically bug fixes only; 'bug fix' not well defined; almost
>> never
>> > breaks API or wire compatibility
>>
>> Long ago I proposed such rules for Hadoop releases at:
>>
>> http://wiki.apache.org/hadoop/Roadmap
>>
>> These state that pre-1.0 releases behave roughly as above.
>>
>> > I think the community can decide two things independently:
>> >
>> > - Should 0.20.20x be renamed 1.0.y ?  (perhaps not, perhaps 0.23 should
>> be
>> > 1.0 and the others left alone).
>> > - Should hadoop adopt a new clear definition of major.minor.patch number
>> > significance?
>>
>> Would you care to call a vote on one or both of these?
>>
>> > example proposal:
>> > * major version number increment: signifies breaks in API backwards
>> > compatibility and/or major architecture overhauls.
>> > * minor version number increment: signifies possible API changes, but
>> > maintains API backwards compatibility.  Wire compatibility may break (see
>> > release notes).  Included functionality is a superset of previous minor
>> > release.
>> > * patch version number increment: signifies a release where all
>> > improvements are fully backwards compatible with the previous patch
>> > version, including wire format.
>>
>> This is also similar to what the Roadmap wiki page indicates for
>> post-1.0 releases.
>>
>> Renaming things after the fact to try to make them consistent when the
>> prior rules weren't consistently followed is not easy.  Instead we might
>> better focus on rules that we intend to obey for releases going forward
>> and then obey them.
>>
>> > Whatever the meaning of the numbers turns out to be will dictate whether
>> > releases after a 1.0.x need to be 2.0.x or can be 1.1.x
>>
>> Good point.  The most accurate approach would probably be to call each
>> existing branch a distinct major release.  Dropping the leading zero
>> would reduce confusion and avoid marketing but would still combine
>> 0.20.x and 0.20.20x which perhaps ought to be considered separate major
>> releases.  For me this is however a reasonable tradeoff since we're
>> better off focusing on improving things in the future than arguing about
>> marketing and how to hide our past versioning mistakes.
>>
>> Doug
>>
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Arun C Murthy
How about this?

I have a vote running for 1.x at this point. We seem to agree about 
major/minor/patch version and need for compatibility.

Beyond that, all other releases (at this point), whether it's 0.22 (unreleased) 
or 0.23 (very alpha) are not worth debating endlessly.

Should we just revisit the versioning discussion when we are ready to release 
them and/or support them?

I'm happy to continue using 0.23.x for now - I'd rather spend time on fixing 
0.23.x than debating now. 

To me this seems like a very Apache thing to do: what matters is the code and 
the community - debates on versioning can come later when the bits are ready. 
No amount of labelling will either produce or stabilize the software.

Thoughts?

Arun

On Nov 16, 2011, at 1:11 PM, Matt Foley wrote:

> I support giving all three active code branches a clean start, on an equal
> footing:
> 
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
> be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
>    from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
> versioning rules going forward.
> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
> then release manager, and the pleasure of the community.
> 
> Regards,
> --Matt
> 
> On Wed, Nov 16, 2011 at 11:57 AM, Doug Cutting  wrote:
> 
>> On 11/16/2011 10:15 AM, Scott Carey wrote:
>>> IMO what is important from the development and maintenance perspective is
>>> the _meaning_ of the
>>> major.minor.patch numbers as described in my previous message.
>>> 
>>> If a minor version number bump means that it is a superset of the
>>> previous release and is backwards compatible, then that requirement on
>>> its own answers whether 0.22 can become 1.1, or if it must be a 2.0
>>> release.
>>>
>>> Whether hadoop starts using a new meaning for major.minor.patch is
>>> what is of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is
>>> marketing.
>> 
>> Scott, this is a great point.  Thanks for making it.
>> 
>>> The version number is completely meaningless on its own, pure marketing.
>>> However, if the numbers gain meaning through a clear definition of what
>>> the major.minor.patch numbers signify, then there is meaning and
>>> structure going forward.
>>> The current state of affairs seems to be:
>>> major:  always 0
>>> minor:  potentially big changes; almost always breaks wire compatibility;
>>> occasionally breaks API backwards compatibility
>>> patch:  typically bug fixes only; 'bug fix' not well defined; almost
>>> never breaks API or wire compatibility
>> 
>> Long ago I proposed such rules for Hadoop releases at:
>> 
>> http://wiki.apache.org/hadoop/Roadmap
>> 
>> These state that pre-1.0 releases behave roughly as above.
>> 
>>> I think the community can decide two things independently:
>>> 
>>> - Should 0.20.20x be renamed 1.0.y ?  (perhaps not, perhaps 0.23 should
>>> be 1.0 and the others left alone).
>>> - Should hadoop adopt a new clear definition of major.minor.patch number
>>> significance?
>> 
>> Would you care to call a vote on one or both of these?
>> 
>>> example proposal:
>>> * major version number increment: signifies breaks in API backwards
>>> compatibility and/or major architecture overhauls.
>>> * minor version number increment: signifies possible API changes, but
>>> maintains API backwards compatibility.  Wire compatibility may break (see
>>> release notes).  Included functionality is a superset of previous minor
>>> release.
>>> * patch version number increment: signifies a release where all
>>> improvements are fully backwards compatible with the previous patch
>>> version, including wire format.
>> 
>> This is also similar to what the Roadmap wiki page indicates for
>> post-1.0 releases.
>> 
>> Renaming things after the fact to try to make them consistent when the
>> prior rules weren't consistently followed is not easy.  Instead we might
>> better focus on rules that we intend to obey for releases going forward
>> and then obey them.
>> 
>>> Whatever the meaning of the numbers turns out to be will dictate whether
>>> releases after a 1.0.x need to be 2.0.x or can be 1.1.x
>> 
>> Good point.  The most accurate approach would probably be to call each
>> existing branch a distinct major release.  Dropping the leading zero
>> would reduce confusion and avoid marketing but would still combine
>> 0.20.x and 0.20.20x which perhaps ought to be considered separate major
>> releases.  For me this is however a reasonable tradeoff since we're
>> better off focusing on improving things in the future than arguing about
>> marketing and how to hide our past versioning mistakes.
>> 
>> Doug
>> 



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Nathan Roberts
On 11/16/11 4:43 PM, "Arun C Murthy"  wrote:
> I propose we adopt the convention that a new major version should be a 
> superset of the previous major version, features-wise.
Just so I'm clear. This is only guaranteed at the time the new major version is 
started. A day later a previous major line may merge a feature from trunk and 
then it's no longer the case that 2.x.y is a superset. If that's the case I'm 
not sure of the value of the convention. We could say that new major versions 
always start from trunk, but that doesn't have meaning outside of the developer 
community.
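
The concern above can be illustrated with a small sketch. It is only an
illustration: the feature names and the helper are hypothetical, and each
release line is modelled as a plain set of feature names.

```python
# Hypothetical model: each release line is a set of feature names.
def is_superset(newer_line, older_line):
    """True if every feature in older_line is also present in newer_line."""
    return set(newer_line) >= set(older_line)


# At the moment the new major line is started it contains everything the
# previous line has, plus something new.
line_1 = {"hdfs", "mapreduce", "security"}
line_2 = set(line_1) | {"new-runtime"}
superset_at_branch_time = is_superset(line_2, line_1)

# Later, a feature is merged from trunk into the older line only.
line_1_later = line_1 | {"late-feature"}
superset_after_backport = is_superset(line_2, line_1_later)

print(superset_at_branch_time, superset_after_backport)  # True False
```

Under this model the superset property is a statement about one point in
time, not an invariant of the version numbers.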

On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
>
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc.
Are our wire formats stable enough in all release lines that we're ready to 
live by this? It would mean the 1.x.y line could not change a wire format 
without a bump to the major number, which would obviously cause issues. Even in 
the 23.x line I thought there were still some wire compatibility changes 
pending which would mean we'd quickly be moving to a 4.x line.

Nathan



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Andrew Purtell
> On Wed, Nov 16, 2011 at 1:11 PM, Matt Foley wrote:
>  I support giving all three active code branches a clean start, on an equal
>  footing:
> 
>  - The next release of 0.20-security (formerly expected as 
> "0.20.205.1") to be 1.0.0, establishing branch-1.0
>  - The next release of 0.22 to be 2.0.0, establishing branch-2.0
>  - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
> from which the formerly expected "0.23.1" may be released as 3.0.1
>  - All three code branches to obey the established major.minor.patch
>  versioning rules going forward.
>  - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
>  then release manager, and the pleasure of the community.


+1 non binding

Clean and easily understood. After an initial round of "what just happened?",
it will be much easier explaining Hadoop evolution to users.

   - Andy



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Arun C Murthy

On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
> 
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc.  This is a more
> objective measure.  For example, one might in release X+1 deprecate
> features of release X but still remain compatible with them, while in
> X+2 we'd remove them.  So every major release would make incompatible
> changes, but only of things that had been deprecated two releases ago.
> Often the reason for the incompatible changes is new primary APIs or
> re-implementation of primary components, but those more subjective
> measures would not be the justification for the major version, rather
> any incompatible changes would.

If I wasn't clear, I'd much prefer this objective measure.

+1

Arun


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Arun C Murthy
Agreed.

We will discard features as we go along, but we need to have consensus to 
discard major features. Is that fair?

And we discard them for reasons you outlined...

Arun

On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:

> On 11/16/2011 02:43 PM, Arun C Murthy wrote:
>> I propose we adopt the convention that a new major version should be a 
>> superset of the previous major version, features-wise.
> 
> That means that we could never discard a feature, no?
> 
> One definition is that a major release includes some fundamental
> changes, e.g., new primary APIs or a re-implementation of primary
> components.  MR2 probably qualifies as both.  With a large system with
> many APIs and components this becomes a rather subjective measure, but I
> don't see an easy way around that.
> 
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc.  This is a more
> objective measure.  For example, one might in release X+1 deprecate
> features of release X but still remain compatible with them, while in
> X+2 we'd remove them.  So every major release would make incompatible
> changes, but only of things that had been deprecated two releases ago.
> Often the reason for the incompatible changes is new primary APIs or
> re-implementation of primary components, but those more subjective
> measures would not be the justification for the major version, rather
> any incompatible changes would.
> 
> Of course, we should work hard to never make incompatible changes...
> 
> Doug



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Roman Shaposhnik
On Wed, Nov 16, 2011 at 1:11 PM, Matt Foley  wrote:
> I support giving all three active code branches a clean start, on an equal
> footing:
>
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
> be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
>    from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
> versioning rules going forward.
> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
> then release manager, and the pleasure of the community.

+1 on all the points above. This is by far the most reasonable
proposal I've seen on this thread.

Thanks,
Roman.


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Doug Cutting
On 11/16/2011 02:43 PM, Arun C Murthy wrote:
> I propose we adopt the convention that a new major version should be a 
> superset of the previous major version, features-wise.

That means that we could never discard a feature, no?

One definition is that a major release includes some fundamental
changes, e.g., new primary APIs or a re-implementation of primary
components.  MR2 probably qualifies as both.  With a large system with
many APIs and components this becomes a rather subjective measure, but I
don't see an easy way around that.

Another definition is that a major release permits incompatible changes,
either in APIs, wire-formats, on-disk formats, etc.  This is a more
objective measure.  For example, one might in release X+1 deprecate
features of release X but still remain compatible with them, while in
X+2 we'd remove them.  So every major release would make incompatible
changes, but only of things that had been deprecated two releases ago.
Often the reason for the incompatible changes is new primary APIs or
re-implementation of primary components, but those more subjective
measures would not be the justification for the major version, rather
any incompatible changes would.
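
As a rough sketch of the deprecate-then-remove cycle described above, the
following Python models major releases as integers. The function names are
hypothetical, and the rule encoded here is one assumed reading: a feature
must ship as deprecated in at least one major release before the release
that removes it (deprecate in X+1, remove in X+2).

```python
def may_remove(deprecated_in, removed_in):
    """Return True if a feature deprecated in major release `deprecated_in`
    may be removed in major release `removed_in`.

    Assumed reading of the rule: the feature must have shipped as deprecated
    (but still working) in an earlier major release.
    """
    if deprecated_in is None:  # never deprecated, so removal is not allowed
        return False
    return removed_in > deprecated_in


def check_release(major, removals, deprecations):
    """List the planned removals in release `major` that violate the rule.

    `removals` is an iterable of feature names dropped in this release;
    `deprecations` maps a feature name to the major release that deprecated it.
    """
    return [name for name in removals
            if not may_remove(deprecations.get(name), major)]
```

For example, `check_release(3, ["old_api", "fresh_api"], {"old_api": 2})`
reports only `fresh_api`, because it was never deprecated.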

Of course, we should work hard to never make incompatible changes...

Doug


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Arun C Murthy

On Nov 16, 2011, at 11:57 AM, Doug Cutting wrote:

> On 11/16/2011 10:15 AM, Scott Carey wrote:
>> - Should hadoop adopt a new clear definition of major.minor.patch number
>> significance?
> 
> Would you care to call a vote on one or both of these?

Great points Scott and Doug. I agree about the need for clarity on 
major/minor/patch significance.

I'll start a vote and update the Roadmap with the results.

Also, along with that, we need a clear idea of what a major version bump 
means.

I propose we adopt the convention that a new major version should be a superset 
of the previous major version, features-wise.

Does that sound reasonable?

thanks,
Arun

Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Joe Stein
+1 to Owen's slight modification and to Matt's proposal with a minor (no
pun intended) suggestion

branch-0.20-security -> branch-1.0
branch-0.20-security-205 -> branch-1.1.0

On Wed, Nov 16, 2011 at 4:37 PM, Owen O'Malley  wrote:

> +1 to Matt's proposal, although I'd modify it slightly to say that:
>
> branch-0.20-security -> branch-1
> branch-0.20-security-205 -> branch-1.0
>
> -- Owen
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop 
*/


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Owen O'Malley
+1 to Matt's proposal, although I'd modify it slightly to say that:

branch-0.20-security -> branch-1
branch-0.20-security-205 -> branch-1.0

-- Owen


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Matt Foley
I support giving all three active code branches a clean start, on an equal
footing:

- The next release of 0.20-security (formerly expected as "0.20.205.1") to
be 1.0.0, establishing branch-1.0
- The next release of 0.22 to be 2.0.0, establishing branch-2.0
- The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
from which the formerly expected "0.23.1" may be released as 3.0.1
- All three code branches to obey the established major.minor.patch
versioning rules going forward.
- So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
then release manager, and the pleasure of the community.
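
The mapping in this proposal can be written down as data, which makes the
ordering claim in the last bullet easy to check. This is a minimal sketch,
not project tooling: the dictionary and function names are hypothetical, and
only the labels spelled out in the bullets above are included.

```python
# Renames spelled out in the proposal (old label -> proposed label).
PROPOSED_RELEASES = {
    "0.20.205.1": "1.0.0",
    "0.23.0": "3.0.0",
    "0.23.1": "3.0.1",
}
PROPOSED_BRANCHES = {
    "0.20-security": "branch-1.0",
    "0.22": "branch-2.0",
    "0.23": "branch-3.0",
}


def version_key(version):
    """Turn 'major.minor.patch' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def is_valid_successor(previous, candidate):
    """True if `candidate` sorts strictly after `previous`."""
    return version_key(candidate) > version_key(previous)
```

Comparing integer tuples rather than strings avoids the usual trap where
"10.0.0" sorts before "9.0.0". Both 3.1.0 and 4.0.0 are valid successors of
3.0.1 under this check.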

Regards,
--Matt

On Wed, Nov 16, 2011 at 11:57 AM, Doug Cutting  wrote:

> On 11/16/2011 10:15 AM, Scott Carey wrote:
> > IMO what is important from the development and maintenance perspective is
> > the _meaning_ of the
> > major.minor.patch numbers as described in my previous message.
> >
> > If a minor version number bump means that it is a superset of the
> > previous release and is backwards compatible, then that requirement on
> > its own answers whether 0.22 can become 1.1, or if it must be a 2.0
> > release.
> >
> > Whether hadoop starts using a new meaning for major.minor.patch is
> > what is of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is
> > marketing.
>
> Scott, this is a great point.  Thanks for making it.
>
> > The version number is completely meaningless on its own, pure marketing.
> > However, if the numbers gain meaning through a clear definition of what
> > the major.minor.patch numbers signify, then there is meaning and
> > structure going forward.
> > The current state of affairs seems to be:
> > major:  always 0
> > minor:  potentially big changes; almost always breaks wire compatibility;
> > occasionally breaks API backwards compatibility
> > patch:  typically bug fixes only; 'bug fix' not well defined; almost
> > never breaks API or wire compatibility
>
> Long ago I proposed such rules for Hadoop releases at:
>
> http://wiki.apache.org/hadoop/Roadmap
>
> These state that pre-1.0 releases behave roughly as above.
>
> > I think the community can decide two things independently:
> >
> > - Should 0.20.20x be renamed 1.0.y ?  (perhaps not, perhaps 0.23 should
> > be 1.0 and the others left alone).
> > - Should hadoop adopt a new clear definition of major.minor.patch number
> > significance?
>
> Would you care to call a vote on one or both of these?
>
> > example proposal:
> > * major version number increment: signifies breaks in API backwards
> > compatibility and/or major architecture overhauls.
> > * minor version number increment: signifies possible API changes, but
> > maintains API backwards compatibility.  Wire compatibility may break (see
> > release notes).  Included functionality is a superset of previous minor
> > release.
> > * patch version number increment: signifies a release where all
> > improvements are fully backwards compatible with the previous patch
> > version, including wire format.
>
> This is also similar to what the Roadmap wiki page indicates for
> post-1.0 releases.
>
> Renaming things after the fact to try to make them consistent when the
> prior rules weren't consistently followed is not easy.  Instead we might
> better focus on rules that we intend to obey for releases going forward
> and then obey them.
>
> > Whatever the meaning of the numbers turns out to be will dictate whether
> > releases after a 1.0.x need to be 2.0.x or can be 1.1.x
>
> Good point.  The most accurate approach would probably be to call each
> existing branch a distinct major release.  Dropping the leading zero
> would reduce confusion and avoid marketing but would still combine
> 0.20.x and 0.20.20x which perhaps ought to be considered separate major
> releases.  For me this is however a reasonable tradeoff since we're
> better off focusing on improving things in the future than arguing about
> marketing and how to hide our past versioning mistakes.
>
> Doug
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Doug Cutting
On 11/16/2011 10:15 AM, Scott Carey wrote:
> IMO what is important from the development and maintenance perspective is
> the _meaning_ of the
> major.minor.patch numbers as described in my previous message.
> 
> If a minor version number bump means that it is a superset of the previous
> release and is backwards compatible, then that requirement on its own
> answers whether 0.22 can become 1.1, or if it must be a 2.0 release.
> 
> Whether hadoop starts using a new meaning for major.minor.patch is what is
> of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is marketing.

Scott, this is a great point.  Thanks for making it.

> The version number is completely meaningless on its own, pure marketing.
> However, if the numbers gain meaning through a clear definition of what
> the major.minor.patch numbers signify, then there is meaning and structure
> going forward.
> The current state of affairs seems to be:
> major:  always 0
> minor:  potentially big changes; almost always breaks wire compatibility;
> occasionally breaks API backwards compatibility
> patch:  typically bug fixes only; 'bug fix' not well defined; almost never
> breaks API or wire compatibility

Long ago I proposed such rules for Hadoop releases at:

http://wiki.apache.org/hadoop/Roadmap

These state that pre-1.0 releases behave roughly as above.

> I think the community can decide two things independently:
> 
> - Should 0.20.20x be renamed 1.0.y ?  (perhaps not, perhaps 0.23 should be
> 1.0 and the others left alone).
> - Should hadoop adopt a new clear definition of major.minor.patch number
> significance?

Would you care to call a vote on one or both of these?

> example proposal:
> * major version number increment: signifies breaks in API backwards
> compatibility and/or major architecture overhauls.
> * minor version number increment: signifies possible API changes, but
> maintains API backwards compatibility.  Wire compatibility may break (see
> release notes).  Included functionality is a superset of previous minor
> release.
> * patch version number increment: signifies a release where all
> improvements are fully backwards compatible with the previous patch
> version, including wire format.

This is also similar to what the Roadmap wiki page indicates for
post-1.0 releases.

Renaming things after the fact to try to make them consistent when the
prior rules weren't consistently followed is not easy.  Instead we might
better focus on rules that we intend to obey for releases going forward
and then obey them.

> Whatever the meaning of the numbers turns out to be will dictate whether
> releases after a 1.0.x need to be 2.0.x or can be 1.1.x

Good point.  The most accurate approach would probably be to call each
existing branch a distinct major release.  Dropping the leading zero
would reduce confusion and avoid marketing but would still combine
0.20.x and 0.20.20x which perhaps ought to be considered separate major
releases.  For me this is however a reasonable tradeoff since we're
better off focusing on improving things in the future than arguing about
marketing and how to hide our past versioning mistakes.

Doug


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Scott Carey


On 11/16/11 9:24 AM, "Konstantin Boudnik"  wrote:

>On Wed, Nov 16, 2011 at 09:15AM, Doug Cutting wrote:
>> On 11/15/2011 06:06 PM, Konstantin Boudnik wrote:
>> > Are you suggesting to drop 0.22 out of the picture altogether? Any
>> > reason for that?
>> 
>> By no means.  I thought that we might, as Scott Carey said, treat 0.22
>> as a minor release in the 1.x series.  I'd prefer that we consistently
>> rename branches (0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc.).
>
>Thanks for the explanation. I see your point in 1.?.x renames. My only
>concern is that it might suggest to the users that 1.2.0 (e.g. current
>0.22) is a sort of natural continuation from 1.0.0 (current 0.20.x) and
>the upgrade would be easy and automatic. Which isn't necessarily the case,
>IMO.

IMO what is important from the development and maintenance perspective is
the _meaning_ of the
major.minor.patch numbers as described in my previous message.

If a minor version number bump means that it is a superset of the previous
release and is backwards compatible, then that requirement on its own
answers whether 0.22 can become 1.1, or if it must be a 2.0 release.

Whether hadoop starts using a new meaning for major.minor.patch is what is
of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is marketing.

The version number is completely meaningless on its own, pure marketing.
However, if the numbers gain meaning through a clear definition of what
the major.minor.patch numbers signify, then there is meaning and structure
going forward.
The current state of affairs seems to be:
major:  always 0
minor:  potentially big changes; almost always breaks wire compatibility;
occasionally breaks API backwards compatibility
patch:  typically bug fixes only; 'bug fix' not well defined; almost never
breaks API or wire compatibility

I think the community can decide two things independently:

- Should 0.20.20x be renamed 1.0.y ?  (perhaps not, perhaps 0.23 should be
1.0 and the others left alone).
- Should hadoop adopt a new clear definition of major.minor.patch number
significance?

example proposal:
* major version number increment: signifies breaks in API backwards
compatibility and/or major architecture overhauls.
* minor version number increment: signifies possible API changes, but
maintains API backwards compatibility.  Wire compatibility may break (see
release notes).  Included functionality is a superset of previous minor
release.
* patch version number increment: signifies a release where all
improvements are fully backwards compatible with the previous patch
version, including wire format.

Any release may contain new features or improvements, provided they don't
break the compatibility rules and the release manager approves of the
inclusion.  It is not worth defining whether a change is a 'bug fix' 'new
feature' or 'improvement' and dictating any rules based on that -- these
can often blur together and can be dealt with on a case by case basis
instead of through version rules.  IMO guiding the meaning of version
numbers by compatibility class makes the most sense.
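
A minimal sketch of "version numbers by compatibility class", following the
example proposal above. The `Change` type and both function names are
hypothetical. One simplification is assumed: only API and wire compatibility
are modelled, so "major architecture overhauls" remain a judgment call
outside this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    breaks_api: bool = False   # breaks API backwards compatibility
    breaks_wire: bool = False  # breaks wire compatibility


def required_bump(changes):
    """Return the smallest version component that must be incremented."""
    if any(c.breaks_api for c in changes):
        return "major"
    if any(c.breaks_wire for c in changes):
        return "minor"
    return "patch"


def bump(version, level):
    """Increment `level` in a 'major.minor.patch' string, resetting lower parts."""
    major, minor, patch = (int(p) for p in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

With these rules, a set of changes that only breaks the wire format takes
1.0.3 to 1.1.0, while any API break takes it to 2.0.0.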


Whatever the meaning of the numbers turns out to be will dictate whether
releases after a 1.0.x need to be 2.0.x or can be 1.1.x

>Separating them in two major versions won't be sending such a message.
>
>> We're rapidly falling into the trap of putting too much significance in
>> a version number, seeking some sort of marketing boost by declaring 1.0.
>>  We can sidestep this by simply dropping the leading 0. and henceforth
>> referring to things as 20, 21, 22, etc.  This minimizes confusion, since
>> there's no significant renaming, it gets us around the marketing issue
>> of still being pre-1.0, and it keeps us from putting too much importance
>> into version numbers.
>
>I guess this might work too.
>
>Cos



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Konstantin Boudnik
On Wed, Nov 16, 2011 at 09:15AM, Doug Cutting wrote:
> On 11/15/2011 06:06 PM, Konstantin Boudnik wrote:
> > Are you suggesting to drop 0.22 out of the picture altogether? Any
> > reason for that? 
> 
> By no means.  I thought that we might, as Scott Carey said, treat 0.22
> as a minor release in the 1.x series.  I'd prefer that we consistently
> rename branches (0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc.).

Thanks for the explanation. I see your point in 1.?.x renames. My only concern
is that it might suggest to the users that 1.2.0 (e.g. current 0.22) is a
sort of natural continuation from 1.0.0 (current 0.20.x) and the upgrade would
be easy and automatic. Which isn't necessarily the case, IMO.

Separating them in two major versions won't be sending such a message.

> We're rapidly falling into the trap of putting too much significance in
> a version number, seeking some sort of marketing boost by declaring 1.0.
>  We can sidestep this by simply dropping the leading 0. and henceforth
> referring to things as 20, 21, 22, etc.  This minimizes confusion, since
> there's no significant renaming, it gets us around the marketing issue
> of still being pre-1.0, and it keeps us from putting too much importance
> into version numbers.

I guess this might work too.

Cos


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Doug Cutting
On 11/15/2011 06:06 PM, Konstantin Boudnik wrote:
> Are you suggesting to drop 0.22 out of the picture altogether? Any
> reason for that? 

By no means.  I thought that we might, as Scott Carey said, treat 0.22
as a minor release in the 1.x series.  I'd prefer that we consistently
rename branches (0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc.).

We're rapidly falling into the trap of putting too much significance in
a version number, seeking some sort of marketing boost by declaring 1.0.
 We can sidestep this by simply dropping the leading 0. and henceforth
referring to things as 20, 21, 22, etc.  This minimizes confusion, since
there's no significant renaming, it gets us around the marketing issue
of still being pre-1.0, and it keeps us from putting too much importance
into version numbers.
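
The two renaming options above differ only in how the old label is
rewritten. A small sketch, with hypothetical function names, assuming that
"0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc." means the old second
component minus 20 becomes the new minor number:

```python
def rename_into_1x(version):
    """Option 1: '0.20.x' -> '1.0.x', '0.21.x' -> '1.1.x', and so on."""
    first, second, *rest = version.split(".")
    if first != "0" or int(second) < 20:
        raise ValueError(f"not a 0.20-or-later version: {version}")
    return ".".join(["1", str(int(second) - 20), *rest])


def drop_leading_zero(version):
    """Option 2: simply drop the leading '0.' ('0.22.0' -> '22.0')."""
    if not version.startswith("0."):
        raise ValueError(f"no leading '0.' to drop: {version}")
    return version[len("0."):]


print(rename_into_1x("0.22.0"), drop_leading_zero("0.22.0"))
```

Option 2 keeps every existing number recognisable, which is the "no
significant renaming" property mentioned above.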

Doug


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-16 Thread Konstantin Shvachko
A little wider perspective on where the renaming takes us and why it
is happening. My opinion.

Last year around this same time the Hadoop project was on the verge of
splitting.
We had three "commercial" versions of Hadoop competing to be the
"real" Hadoop, while the officially released Apache version was
outdated.
The ASF did an [amazingly] good job fencing off the claims for external
ownership of the Hadoop name, which effectively stopped the split the
way it was evolving. The danger of the External Project Split has
passed: now the others can call their stuff XYZ-DH7 and be done with
it.

This fall a danger of Internal Project Split has emerged, because
three versions were brewing independently. I call it a danger because
more versions of Hadoop means splitting and spreading resources of the
community including the (rapidly growing) software stack above. It
also means a stronger story for competing technologies. Which could be
good, or bad, or both.

The question is why does the project fall into Splitting danger every fall.
My answer is it's the "Forever-20" syndrome.
In the last several years there was always a "reason" to continue with
0.20. Mostly because businesses need to commit to a version for the
next year in fall. This is irrelevant to an open source project
development and contradicts its natural straight forward motion.

Like many of you, last week I was at Hadoop World and ApacheCon
and saw a lot (I mean thousands)
of people, enthusiastic about the technology, but majorly confused
about the versions.
My concern is that the rename of 0.20.205 to 1.0 means the community
will be stuck with it even longer,
leading to the "Occupy Hadoop" movement camping in the Apache Extras park.
I would have expected the RM of 0.23 to advocate calling it 1.0, but it
didn't happen.
Renaming branches is not a big deal. The problem is that there is no
consolidating version on the horizon.

I'll be glad to be wrong.
--Konstantin


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Konstantin Shvachko
Consistency of naming the releases is a very valid point and should be
the main concern in the decision making.

If 0.20.205 is called Hadoop 1, and 0.23 called Hadoop 2, then
releasing 0.22 under 0.22 will be confusing.
If we vote only on renaming 0.20.205 to 1.0 then the 0.23 release
becomes confusing, as well as the upcoming 0.22 release.

I think there are values in all three branches. I also think the three
have substantial differences so treating them as separate bases makes
sense to me. Presumably they will evolve more or less independently
for some time at least.

So I'd support the proposition (I think it was Doug's) to
1. Call the next release off 0.20-security branch as 1.0.0
2. Call the next release off 0.22 branch as 2.0.0
3. Call the next release off 0.23 branch as 3.0.0

We do not need to decide if 0.20.206 will be 1.1.0 or 1.0.1. It should
be decided when the subsequent release of 1.0.0 is voted in based on
the amount of changes introduced.

Since 0.23 has just been released a rename of 0.23 to 3.0.0 would work
for me as well.

Thanks,
--Konstantin

On Tue, Nov 15, 2011 at 7:35 PM, Joe Stein wrote:
> Consistency between supported branches and releases from trunk in some 
> logical order would be helpful for those outside of the community coming in, 
> labeled however works best for the active community.  My 0.235689 cents.
>
> /*
> Joe Stein
> http://www.medialets.com
> Twitter: @allthingshadoop
> */
>
> On Nov 15, 2011, at 9:47 PM, Konstantin Boudnik  wrote:
>
>> And once again - 0.22 seems to be forgotten for an unexplained reason.
>>
>> I urge us to stick to Arun's original proposal and use 0.22 as 2.0
>> With the correction I like the following proposal.
>>
>> Cos
>>
>> On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
>>> I agree with some prior posters that renaming the 0.20-security sustaining
>>> branch could be confusing.
>>> How about the following (pseudo-code)?
>>>
>>> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
>>> svn copy branch-0.20-security-205 branch-1.0
>>> ## and actually release it from branch-1.0 as release 1.0.0
>>>
>>> ## Then, after the 1.0.0 release vote ends successfully, do:
>>> svn copy branch-0.20-security branch-1.1
>>> ## This will pick up the remaining changes done to date, which would
>>> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
>>> ## sometime in the future
>>>
>>> ## However, since branch-0.23 was just recently split from trunk, it
>>> ## should be upgraded to 2.0 in the usual way, with a rename:
>>> svn mv branch-0.23 branch-2.0
>>> ## and also rename the actual release:
>>> svn mv tags/release-0.23.0 tags/release-2.0.0
>>> ## The work currently going into the future 0.23.1 will become 2.0.1,
>>> ## not 2.1.0.
>>> ## Work going into trunk will become 2.1 or higher in the future.
>>>
>>> This is a concrete, actionable proposal.  In an effort to establish
>>> consensus, would it be appropriate to call a vote on it?
>>> --Matt
>>>
>>>
>>> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:
>>>
>>>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>>>> Are you suggesting a two part version scheme?  Ie
>>>>>
>>>>> 0.23.0 -> 2.0
>>>>> 0.23.1 -> 2.1
>>>>
>>>> I didn't specify.  We could either do that or:
>>>>
>>>> 0.23.0 -> 2.0.0
>>>> 0.23.1 -> 2.0.1
>>>>   ...
>>>> 0.24.0 -> 2.1.0
>>>>   ...
>>>>
>>>> I don't care which much.  Do you?
>>>>
>>>>> fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
>>>>> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>>>>> 22 either since it both has features that are in 20x, and 20x has
>>>>> features not in 22, and is not yet released or stable. Seems hard to
>>>>> come up with a reasonable version number for it.
>>>>
>>>> This is about the fourth or fifth different proposal around these.  I'm
>>>> not sure things are congealing around a consensus.  I don't want to
>>>> stand in the way of that, but I think we might first settle the part
>>>> that we're nearer consensus on.
>>>>
>>>> Doug
>>>>
>>>>>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Arun Murthy
Thanks Eli. In keeping with the theme of 'looking ahead' I was
thinking of upcoming 0.20.205.1 as 1.0.0. I'll clarify in the voting
thread too.

Sent from my iPhone

On Nov 15, 2011, at 10:13 PM, Eli Collins  wrote:

> On Tue, Nov 15, 2011 at 9:53 PM, Arun Murthy  wrote:
>> Eli,
>>
>> Seems to me that trying to 'carry over' numbers from 0.20.2xx would,
>> at best, lead to confusion... similar to folks asking for non-existent
>> 0.20.201/202.
>>
>> I propose we look forward with hadoop-1.0.0 as the supported release
>> with security+append to keep things simple.
>>
>> Thoughts?
>>
>
> Are you proposing 205.1 be called 1.0.1 and 206 will be 1.1.0?  Or
> that we rename 0.205.0 to be 1.0.0 and 205.1 will be 1.0.1 and 206.0
> will be 1.1.0?
>
> That seems just as confusing IMO but I don't feel strongly either way.
>
> Thanks,
> Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Scott Carey

On 11/15/11 6:47 PM, "Konstantin Boudnik"  wrote:

>And once again - 0.22 seems to be forgotten for an unexplained reason.
>
>I urge us to stick to Arun's original proposal and use 0.22 as 2.0
>With the correction I like the following proposal.

If 0.20.20x ends up in the 1.0.x line, then 0.22.x should end up in the
1.1.x line, IMO.

0.22 is not a radical incompatible overhaul from 0.20.20x.  So IMO it
should not change the major version number, but only the minor one.

However, 0.23 IS a major change, and could justify a 2.0.x.

This all assumes the version numbers are going to start meaning something
along the lines of

major.minor.patch

where 
- changes to major denote big, incompatible changes,
- changes to minor denote large changes/additions/improvements, but
backwards compatible
- changes to patch denote bugfixes or minor additions/improvements with no
compatibility impact.
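
The scheme above can be sketched as a small check, purely as an illustration. The function names `parse` and `change_kind` are hypothetical and not part of any Hadoop tooling; the sketch assumes every version string has exactly three numeric parts.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def change_kind(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "major"  # big, incompatible changes
    if n[1] != o[1]:
        return "minor"  # large changes/additions, backwards compatible
    if n[2] != o[2]:
        return "patch"  # bugfixes, no compatibility impact
    return "none"
```

Under this reading, going from 1.0.x to 1.1.x would be a "minor" change and going from 1.x to 2.0.x a "major" one, which matches the 0.22 and 0.23 placement argued for above.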

-Scott



>
>Cos
>
>On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
>> I agree with some prior posters that renaming the 0.20-security sustaining
>> branch could be confusing.
>> How about the following (pseudo-code)?
>> 
>> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
>> svn copy branch-0.20-security-205 branch-1.0
>> ## and actually release it from branch-1.0 as release 1.0.0
>> 
>> ## Then, after the 1.0.0 release vote ends successfully, do:
>> svn copy branch-0.20-security branch-1.1
>> ## This will pick up the remaining changes done to date, which would
>> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
>> ## sometime in the future
>> 
>> ## However, since branch-0.23 was just recently split from trunk, it
>> ## should be upgraded to 2.0 in the usual way, with a rename:
>> svn mv branch-0.23 branch-2.0
>> ## and also rename the actual release:
>> svn mv tags/release-0.23.0 tags/release-2.0.0
>> ## The work currently going into the future 0.23.1 will become 2.0.1,
>> ## not 2.1.0.
>> ## Work going into trunk will become 2.1 or higher in the future.
>> 
>> This is a concrete, actionable proposal.  In an effort to establish
>> consensus, would it be appropriate to call a vote on it?
>> --Matt
>> 
>> 
>> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting wrote:
>> 
>> > On 11/15/2011 05:49 PM, Eli Collins wrote:
>> > > Are you suggesting a two part version scheme?  Ie
>> > >
>> > > 0.23.0 -> 2.0
>> > > 0.23.1 -> 2.1
>> >
>> > I didn't specify.  We could either do that or:
>> >
>> >  0.23.0 -> 2.0.0
>> >  0.23.1 -> 2.0.1
>> >...
>> >  0.24.0 -> 2.1.0
>> >...
>> >
>> > I don't care which much.  Do you?
>> >
>> > > fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
>> > > 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>> > > 22 either since it both has features that are in 20x, and 20x has
>> > > features not in 22, and is not yet released or stable. Seems hard to
>> > > come up with a reasonable version number for it.
>> >
>> > This is about the fourth or fifth different proposal around these.  I'm
>> > not sure things are congealing around a consensus.  I don't want to
>> > stand in the way of that, but I think we might first settle the part
>> > that we're nearer consensus on.
>> >
>> > Doug
>> >



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Eli Collins
On Tue, Nov 15, 2011 at 9:53 PM, Arun Murthy  wrote:
> Eli,
>
> Seems to me that trying to 'carry over' numbers from 0.20.2xx would,
> at best, lead to confusion... similar to folks asking for non-existent
> 0.20.201/202.
>
> I propose we look forward with hadoop-1.0.0 as the supported release
> with security+append to keep things simple.
>
> Thoughts?
>

Are you proposing 205.1 be called 1.0.1 and 206 will be 1.1.0?  Or
that we rename 0.20.205.0 to be 1.0.0 and 205.1 will be 1.0.1 and 206.0
will be 1.1.0?

That seems just as confusing IMO but I don't feel strongly either way.

Thanks,
Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Arun Murthy
Eli,

Seems to me that trying to 'carry over' numbers from 0.20.2xx would,
at best, lead to confusion... similar to folks asking for non-existent
0.20.201/202.

I propose we look forward with hadoop-1.0.0 as the supported release
with security+append to keep things simple.

Thoughts?

thanks,
Arun

Sent from my iPhone

On Nov 15, 2011, at 8:52 PM, Eli Collins  wrote:

> On Tue, Nov 15, 2011 at 8:14 PM, Arun Murthy  wrote:
>> I think this discussion is getting too wide, can we tease them apart?
>>
>> Do we agree we should call the forthcoming releases off
>> branch-0.20-security as 1.x.x?
>>
>> Let me start a vote for just that.
>
> +1
>
> IMO the values of x.x should match the current dot versions eg
> 0.20.206.0 would be 1.6.0.
>
> Thanks,
> Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Joe Stein
If trunk releases would then mean 2.x.x, then the branch being 1.x.x
(0.20.206.0 being 1.6.0) makes total sense

+1 (not binding)

so the current trunk release = 2.0.0 and the branch release 0.20.206.0 =
1.6.0

speaking from those of us that have < 4,000 nodes in our cluster and want
to proliferate the technology

- one love, Hadoop

not sure, though, about Todd's comment on HBase & Hive and how sister/brother
projects have to deal with it.  It is important not to orphan them more than
may already have been done (cough cough, append).

On Tue, Nov 15, 2011 at 11:51 PM, Eli Collins  wrote:

> On Tue, Nov 15, 2011 at 8:14 PM, Arun Murthy  wrote:
> > I think this discussion is getting too wide, can we tease them apart?
> >
> > Do we agree we should call the forthcoming releases off
> > branch-0.20-security as 1.x.x?
> >
> > Let me start a vote for just that.
>
> +1
>
> IMO the values of x.x should match the current dot versions eg
> 0.20.206.0 would be 1.6.0.
>
> Thanks,
> Eli
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop 
*/


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Eli Collins
On Tue, Nov 15, 2011 at 8:14 PM, Arun Murthy  wrote:
> I think this discussion is getting too wide, can we tease them apart?
>
> Do we agree we should call the forthcoming releases off
> branch-0.20-security as 1.x.x?
>
> Let me start a vote for just that.

+1

IMO the values of x.x should match the current dot versions eg
0.20.206.0 would be 1.6.0.

Thanks,
Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Arun Murthy
On Nov 15, 2011, at 6:03 PM, Eli Collins  wrote:

> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:
>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>> Are you suggesting a two part version scheme?  Ie
>>>
>>> 0.23.0 -> 2.0
>>> 0.23.1 -> 2.1
>>
>> I didn't specify.  We could either do that or:
>>
>>  0.23.0 -> 2.0.0
>>  0.23.1 -> 2.0.1
>>...
>>  0.24.0 -> 2.1.0
>>...
>>
>> I don't care which much.  Do you?
>>
>
> Nope. Sticking with the three part scheme seems reasonable since we'll
> eventually do sustaining releases of 23.
>
> +1 to your scheme above.

+1 for three part scheme.

Arun


>
> Thanks,
> Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Arun Murthy
I think this discussion is getting too wide, can we tease them apart?

Do we agree we should call the forthcoming releases off
branch-0.20-security as 1.x.x?

Let me start a vote for just that.

Arun

Sent from my iPhone

On Nov 15, 2011, at 6:43 PM, Matt Foley  wrote:

> I agree with some prior posters that renaming the 0.20-security sustaining
> branch could be confusing.
> How about the following (pseudo-code)?
>
> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
> svn copy branch-0.20-security-205 branch-1.0
> ## and actually release it from branch-1.0 as release 1.0.0
>
> ## Then, after the 1.0.0 release vote ends successfully, do:
> svn copy branch-0.20-security branch-1.1
> ## This will pick up the remaining changes done to date, which would
> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
> ## sometime in the future
>
> ## However, since branch-0.23 was just recently split from trunk, it should
> ## be upgraded to 2.0 in the usual way, with a rename:
> svn mv branch-0.23 branch-2.0
> ## and also rename the actual release:
> svn mv tags/release-0.23.0 tags/release-2.0.0
> ## The work currently going into the future 0.23.1 will become 2.0.1, not
> ## 2.1.0.
> ## Work going into trunk will become 2.1 or higher in the future.
>
> This is a concrete, actionable proposal.  In an effort to establish
> consensus, would it be appropriate to call a vote on it?
> --Matt
>
>
> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:
>
>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>> Are you suggesting a two part version scheme?  Ie
>>>
>>> 0.23.0 -> 2.0
>>> 0.23.1 -> 2.1
>>
>> I didn't specify.  We could either do that or:
>>
>> 0.23.0 -> 2.0.0
>> 0.23.1 -> 2.0.1
>>   ...
>> 0.24.0 -> 2.1.0
>>   ...
>>
>> I don't care which much.  Do you?
>>
>>> fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
>>> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>>> 22 either since it both has features that are in 20x, and 20x has
>>> features not in 22, and is not yet released or stable. Seems hard to
>>> come up with a reasonable version number for it.
>>
>> This is about the fourth or fifth different proposal around these.  I'm
>> not sure things are congealing around a consensus.  I don't want to
>> stand in the way of that, but I think we might first settle the part
>> that we're nearer consensus on.
>>
>> Doug
>>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Joe Stein
Consistency between supported branches and releases from trunk in some logical 
order would be helpful for those outside of the community coming in, labeled 
however works best for the active community.  My 0.235689 cents.

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/

On Nov 15, 2011, at 9:47 PM, Konstantin Boudnik  wrote:

> And once again - 0.22 seems to be forgotten for an unexplained reason.
> 
> I urge to stick to original Arun's proposal and use 0.22 as 2.0
> With the correction I like the following proposal.
> 
> Cos
> 
> On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
>> I agree with some prior posters that renaming the 0.20-security sustaining
>> branch could be confusing.
>> How about the following (pseudo-code)?
>> 
>> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
>> svn copy branch-0.20-security-205 branch-1.0
>> ## and actually release it from branch-1.0 as release 1.0.0
>> 
>> ## Then, after the 1.0.0 release vote ends successfully, do:
>> svn copy branch-0.20-security branch-1.1
>> ## This will pick up the remaining changes done to date, which would
>> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
>> ## sometime in the future
>> 
>> ## However, since branch-0.23 was just recently split from trunk, it should
>> ## be upgraded to 2.0 in the usual way, with a rename:
>> svn mv branch-0.23 branch-2.0
>> ## and also rename the actual release:
>> svn mv tags/release-0.23.0 tags/release-2.0.0
>> ## The work currently going into the future 0.23.1 will become 2.0.1, not
>> ## 2.1.0.
>> ## Work going into trunk will become 2.1 or higher in the future.
>> 
>> This is a concrete, actionable proposal.  In an effort to establish
>> consensus, would it be appropriate to call a vote on it?
>> --Matt
>> 
>> 
>> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:
>> 
>>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>>> Are you suggesting a two part version scheme?  Ie
>>>>
>>>> 0.23.0 -> 2.0
>>>> 0.23.1 -> 2.1
>>> 
>>> I didn't specify.  We could either do that or:
>>> 
>>> 0.23.0 -> 2.0.0
>>> 0.23.1 -> 2.0.1
>>>   ...
>>> 0.24.0 -> 2.1.0
>>>   ...
>>> 
>>> I don't care which much.  Do you?
>>> 
>>>> fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
>>>> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>>>> 22 either since it both has features that are in 20x, and 20x has
>>>> features not in 22, and is not yet released or stable. Seems hard to
>>>> come up with a reasonable version number for it.
>>> 
>>> This is about the fourth or fifth different proposal around these.  I'm
>>> not sure things are congealing around a consensus.  I don't want to
>>> stand in the way of that, but I think we might first settle the part
>>> that we're nearer consensus on.
>>> 
>>> Doug
>>> 


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Konstantin Boudnik
And once again - 0.22 seems to be forgotten for an unexplained reason.

I urge us to stick to Arun's original proposal and use 0.22 as 2.0.
With that correction, I like the following proposal.

Cos

On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
> I agree with some prior posters that renaming the 0.20-security sustaining
> branch could be confusing.
> How about the following (pseudo-code)?
> 
> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
> svn copy branch-0.20-security-205 branch-1.0
> ## and actually release it from branch-1.0 as release 1.0.0
> 
> ## Then, after the 1.0.0 release vote ends successfully, do:
> svn copy branch-0.20-security branch-1.1
> ## This will pick up the remaining changes done to date, which would
> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
> ## sometime in the future
> 
> ## However, since branch-0.23 was just recently split from trunk, it should
> ## be upgraded to 2.0 in the usual way, with a rename:
> svn mv branch-0.23 branch-2.0
> ## and also rename the actual release:
> svn mv tags/release-0.23.0 tags/release-2.0.0
> ## The work currently going into the future 0.23.1 will become 2.0.1, not
> ## 2.1.0.
> ## Work going into trunk will become 2.1 or higher in the future.
> 
> This is a concrete, actionable proposal.  In an effort to establish
> consensus, would it be appropriate to call a vote on it?
> --Matt
> 
> 
> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:
> 
> > On 11/15/2011 05:49 PM, Eli Collins wrote:
> > > Are you suggesting a two part version scheme?  Ie
> > >
> > > 0.23.0 -> 2.0
> > > 0.23.1 -> 2.1
> >
> > I didn't specify.  We could either do that or:
> >
> >  0.23.0 -> 2.0.0
> >  0.23.1 -> 2.0.1
> >...
> >  0.24.0 -> 2.1.0
> >...
> >
> > I don't care which much.  Do you?
> >
> > > fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
> > > 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
> > > 22 either since it both has features that are in 20x, and 20x has
> > > features not in 22, and is not yet released or stable. Seems hard to
> > > come up with a reasonable version number for it.
> >
> > This is about the fourth or fifth different proposal around these.  I'm
> > not sure things are congealing around a consensus.  I don't want to
> > stand in the way of that, but I think we might first settle the part
> > that we're nearer consensus on.
> >
> > Doug
> >


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Matt Foley
I agree with some prior posters that renaming the 0.20-security sustaining
branch could be confusing.
How about the following (pseudo-code)?

## Just before we are ready to make rc0 for release 0.20.205.1, do:
svn copy branch-0.20-security-205 branch-1.0
## and actually release it from branch-1.0 as release 1.0.0

## Then, after the 1.0.0 release vote ends successfully, do:
svn copy branch-0.20-security branch-1.1
## This will pick up the remaining changes done to date, which would
## have gone into 0.20.206.0, and will instead go into release 1.1.0,
## sometime in the future

## However, since branch-0.23 was just recently split from trunk, it should
## be upgraded to 2.0 in the usual way, with a rename:
svn mv branch-0.23 branch-2.0
## and also rename the actual release:
svn mv tags/release-0.23.0 tags/release-2.0.0
## The work currently going into the future 0.23.1 will become 2.0.1, not
## 2.1.0.
## Work going into trunk will become 2.1 or higher in the future.

This is a concrete, actionable proposal.  In an effort to establish
consensus, would it be appropriate to call a vote on it?
--Matt


On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:

> On 11/15/2011 05:49 PM, Eli Collins wrote:
> > Are you suggesting a two part version scheme?  Ie
> >
> > 0.23.0 -> 2.0
> > 0.23.1 -> 2.1
>
> I didn't specify.  We could either do that or:
>
>  0.23.0 -> 2.0.0
>  0.23.1 -> 2.0.1
>...
>  0.24.0 -> 2.1.0
>...
>
> I don't care which much.  Do you?
>
> > fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
> > 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
> > 22 either since it both has features that are in 20x, and 20x has
> > features not in 22, and is not yet released or stable. Seems hard to
> > come up with a reasonable version number for it.
>
> This is about the fourth or fifth different proposal around these.  I'm
> not sure things are congealing around a consensus.  I don't want to
> stand in the way of that, but I think we might first settle the part
> that we're nearer consensus on.
>
> Doug
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Konstantin Boudnik
I believe it has been advocated a number of times in that thread to release
0.22 as 2.0.

Are you suggesting dropping 0.22 out of the picture altogether? Any
reason for that? 

Thanks,
  Cos

On Tue, Nov 15, 2011 at 05:37PM, Doug Cutting wrote:
> On 11/15/2011 01:43 PM, Todd Lipcon wrote:
> > +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
> 
> Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0.
>  There are a number of different views about what to do with 0.20, 0.21
> and 0.22.  So maybe we should proceed where there's consensus and not
> argue extensively where there's disagreement?
> 
> Since 0.23 has little install base yet it should be easy to rename.  If
> we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems
> inappropriate.
> 
> Can we agree to 0.23 -> 2.0?  That's consistent with the MR2 nomenclature.
> 
> Doug


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Eli Collins
On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting  wrote:
> On 11/15/2011 05:49 PM, Eli Collins wrote:
>> Are you suggesting a two part version scheme?  Ie
>>
>> 0.23.0 -> 2.0
>> 0.23.1 -> 2.1
>
> I didn't specify.  We could either do that or:
>
>  0.23.0 -> 2.0.0
>  0.23.1 -> 2.0.1
>    ...
>  0.24.0 -> 2.1.0
>    ...
>
> I don't care which much.  Do you?
>

Nope. Sticking with the three part scheme seems reasonable since we'll
eventually do sustaining releases of 23.

+1 to your scheme above.

Thanks,
Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Doug Cutting
On 11/15/2011 05:49 PM, Eli Collins wrote:
> Are you suggesting a two part version scheme?  Ie
> 
> 0.23.0 -> 2.0
> 0.23.1 -> 2.1

I didn't specify.  We could either do that or:

  0.23.0 -> 2.0.0
  0.23.1 -> 2.0.1
...
  0.24.0 -> 2.1.0
...

I don't care which much.  Do you?
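
For comparison, the two candidate schemes can be sketched side by side. This is only an illustration: `two_part` and `three_part` are hypothetical names, and extending the three-part rule beyond the three mappings listed above (so that 0.N.y becomes 2.(N-23).y) is an assumption.

```python
def _split(old: str):
    """Split '0.minor.patch' into integers, rejecting anything else."""
    zero, minor, patch = (int(part) for part in old.split("."))
    if zero != 0:
        raise ValueError(f"expected a 0.x.y version, got {old!r}")
    return minor, patch


def two_part(old: str) -> str:
    """Two-part scheme: 0.23.y -> 2.y. Only the 0.23 line has a target."""
    minor, patch = _split(old)
    if minor != 23:
        raise ValueError(f"no two-part target defined for {old!r}")
    return f"2.{patch}"


def three_part(old: str) -> str:
    """Three-part scheme: 0.23.y -> 2.0.y, 0.24.y -> 2.1.y (assumed to continue)."""
    minor, patch = _split(old)
    if minor < 23:
        raise ValueError(f"no three-part target defined for {old!r}")
    return f"2.{minor - 23}.{patch}"
```

In this sketch the two-part form has nowhere obvious to put a later 0.24.0, while the three-part form keeps a separate slot for sustaining releases of each line.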

> fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
> 22 either since it both has features that are in 20x, and 20x has
> features not in 22, and is not yet released or stable. Seems hard to
> come up with a reasonable version number for it.

This is about the fourth or fifth different proposal around these.  I'm
not sure things are congealing around a consensus.  I don't want to
stand in the way of that, but I think we might first settle the part
that we're nearer consensus on.

Doug


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Eli Collins
On Tue, Nov 15, 2011 at 5:37 PM, Doug Cutting  wrote:
> On 11/15/2011 01:43 PM, Todd Lipcon wrote:
>> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
>
> Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0.
>  There are a number of different views about what to do with 0.20, 0.21
> and 0.22.  So maybe we should proceed where there's consensus and not
> argue extensively where there's disagreement?
>
> Since 0.23 has little install base yet it should be easy to rename.  If
> we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems
> inappropriate.
>
> Can we agree to 0.23 -> 2.0?  That's consistent with the MR2 nomenclature.
>

Are you suggesting a two part version scheme?  Ie

0.23.0 -> 2.0
0.23.1 -> 2.1

I'm +1 to that.

fwiw I'd map 0.20.200.0 to 1.0,  203.0 would be 1.3, 205.0, would be
1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
22 either since it both has features that are in 20x, and 20x has
features not in 22, and is not yet released or stable. Seems hard to
come up with a reasonable version number for it.

Thanks,
Eli


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Ahmed Radwan
+1

> Can we agree to 0.23 -> 2.0?  That's consistent with the MR2 nomenclature.

Best Regards
Ahmed

On Tue, Nov 15, 2011 at 5:37 PM, Doug Cutting  wrote:
> On 11/15/2011 01:43 PM, Todd Lipcon wrote:
>> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
>
> Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0.
>  There are a number of different views about what to do with 0.20, 0.21
> and 0.22.  So maybe we should proceed where there's consensus and not
> argue extensively where there's disagreement?
>
> Since 0.23 has little install base yet it should be easy to rename.  If
> we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems
> inappropriate.
>
> Can we agree to 0.23 -> 2.0?  That's consistent with the MR2 nomenclature.
>
> Doug
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Doug Cutting
On 11/15/2011 01:43 PM, Todd Lipcon wrote:
> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.

Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0.
 There are a number of different views about what to do with 0.20, 0.21
and 0.22.  So maybe we should proceed where there's consensus and not
argue extensively where there's disagreement?

Since 0.23 has little install base yet it should be easy to rename.  If
we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems
inappropriate.

Can we agree to 0.23 -> 2.0?  That's consistent with the MR2 nomenclature.

Doug


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Luke Lu
+1 on *new* releases from 0.20.2xx branches as 1.x; 0.22 branch as 2.x
and 0.23/24 branches as 3.x.

On Tue, Nov 15, 2011 at 2:32 PM, Arun C Murthy  wrote:
> I don't see this as 'renaming', I propose we just look forward and make the 
> next release from branch-0.20-security as 1.0 to keep things simple.
>
> IMHO, going back to rename existing releases (0.21 etc.) isn't productive.
>
> Arun
>
> On Nov 15, 2011, at 1:43 PM, Todd Lipcon wrote:
>
>> On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran  wrote:
>>> On 15/11/11 06:07, Dhruba Borthakur wrote:

>>>> +1 to making the upcoming 0.23 release as 2.0.

>>>
>>> +1
>>>
>>> And leave the 0.20.20x chain as is, just because people are used to it
>>>
>>
>> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
>> Though it's weird to never have a 1.0, the "0.20" name is well
>> ingrained, and I think renaming it at this point will cause a lot of
>> confusion (plus cause problems for downstream projects like Hive and
>> HBase which use regexes against the version string in various shim
>> layers)
>>
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Arun C Murthy
I don't see this as 'renaming', I propose we just look forward and make the 
next release from branch-0.20-security as 1.0 to keep things simple.

IMHO, going back to rename existing releases (0.21 etc.) isn't productive.

Arun

On Nov 15, 2011, at 1:43 PM, Todd Lipcon wrote:

> On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran  wrote:
>> On 15/11/11 06:07, Dhruba Borthakur wrote:
>>> 
>>> +1 to making the upcoming 0.23 release as 2.0.
>>> 
>> 
>> +1
>> 
>> And leave the 0.20.20x chain as is, just because people are used to it
>> 
> 
> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
> Though it's weird to never have a 1.0, the "0.20" name is well
> ingrained, and I think renaming it at this point will cause a lot of
> confusion (plus cause problems for downstream projects like Hive and
> HBase which use regexes against the version string in various shim
> layers)
> 
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Ted Dunning
On Tue, Nov 15, 2011 at 2:17 PM, Owen O'Malley  wrote:

> On Tue, Nov 15, 2011 at 1:43 PM, Todd Lipcon  wrote:
>
> > On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran 
> wrote:
> > > On 15/11/11 06:07, Dhruba Borthakur wrote:
> > >>
> > >> +1 to making the upcoming 0.23 release as 2.0.
> > >>
> > >
> > > +1
> > >
> > > And leave the 0.20.20x chain as is, just because people are used to it
> > >
> >
> > +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
> >
>
> I really don't see it that way. I'm continuing (up to and including last
> week) to have to explain the version numbering for 0.20, 0.20.2xx, 0.21,
> 0.22, and 0.23. Obviously the people who are willing to do the work don't
> feel that it is a waste of time or they wouldn't be signing up to do the
> work.


This smells like Java 1.4 versus Java 6 all over again.

Explaining why 0.20 became 1.0 when 0.21 didn't become anything is a pretty
strange exercise.

If a marketing person somewhere is totally set on making 0.23 be 2.0, just
do that and be done.  There doesn't have to be a 1.0 version.


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Owen O'Malley
On Tue, Nov 15, 2011 at 1:43 PM, Todd Lipcon  wrote:

> On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran  wrote:
> > On 15/11/11 06:07, Dhruba Borthakur wrote:
> >>
> >> +1 to making the upcoming 0.23 release as 2.0.
> >>
> >
> > +1
> >
> > And leave the 0.20.20x chain as is, just because people are used to it
> >
>
> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
>

I really don't see it that way. I'm continuing (up to and including last
week) to have to explain the version numbering for 0.20, 0.20.2xx, 0.21,
0.22, and 0.23. Obviously the people who are willing to do the work don't
feel that it is a waste of time or they wouldn't be signing up to do the
work.

-- Owen


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Todd Lipcon
On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran  wrote:
> On 15/11/11 06:07, Dhruba Borthakur wrote:
>>
>> +1 to making the upcoming 0.23 release as 2.0.
>>
>
> +1
>
> And leave the 0.20.20x chain as is, just because people are used to it
>

+1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
Though it's weird to never have a 1.0, the "0.20" name is well
ingrained, and I think renaming it at this point will cause a lot of
confusion (plus cause problems for downstream projects like Hive and
HBase which use regexes against the version string in various shim
layers)
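
A minimal sketch of the kind of breakage meant here, assuming a shim that picks its implementation from a "0.x" prefix. The pattern and the function `legacy_shim_key` are hypothetical illustrations, not code taken from Hive or HBase.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes every Hadoop version string starts with "0."
LEGACY_PATTERN = re.compile(r"^0\.(\d+)")


def legacy_shim_key(version: str) -> Optional[str]:
    """Return a key such as '0.20' for selecting a shim, or None if unmatched."""
    match = LEGACY_PATTERN.match(version)
    return f"0.{match.group(1)}" if match else None
```

With a pattern like this, "0.20.205.0" resolves to the "0.20" shim, but the same code released as "1.0.0" matches nothing, so the downstream project would need a change before it could run against the renamed release.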

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-15 Thread Steve Loughran

On 15/11/11 06:07, Dhruba Borthakur wrote:

> +1 to making the upcoming 0.23 release as 2.0.



+1

And leave the 0.20.20x chain as is, just because people are used to it



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Mahadev Konar
+1 for 0.20.2xx as 1.0.

mahadev

On Mon, Nov 14, 2011 at 9:37 PM, Sharad Agarwal  wrote:
> +1
>
> remembering and understanding current release numbering and attributing it
> to stable/compatible etc. is really painful.
>
>
> On Tue, Nov 15, 2011 at 3:41 AM, Arun C Murthy  wrote:
>
>> Folks,
>>
>> Apache Hadoop has come a long way since our humble beginnings. As a
>> community we've made significant progress, even in 2011 - we've had 3
>> releases off the branch-0.20-security (0.20.205 being the latest) and we
>> just released 0.23.0 last week, our first major release off trunk in a
>> while.
>>
>> With hadoop-0.20.205 we finally have an Apache release with both security
>> and HBase support, both critical for the growing ecosystem.
>>
>> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
>> something we've wanted for a while and I think it's time for us to just
>> ship it. Linus did something similar with GNU/Linux 3.0.
>>
>> Yes, we could add more features or better it along many dimensions (ala
>> hadoop-0.23), but right now we have a pretty decent piece of software i.e.
>>  the feature set in hadoop-0.20.205 is compelling and widely used. We could
>> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can
>> support compatibility in the hadoop-1.x series, which is the essential
>> ingredient. This isn't a brand new idea, Doug suggested this a long while
>> ago.
>>
>> Thoughts?
>>
>> thanks,
>> Arun
>>
>>
>>
>>
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Owen O'Malley
On Mon, Nov 14, 2011 at 9:56 PM, Andreas Neumann  wrote:

> +1 for not renaming past releases, that would really start confusion.
>
> If .20.20x.y corresponds to 1.z.y, then z=x-5 and:
>
> 0.20.205.1 -> 1.0.1
> 0.20.206.0 -> 1.1.0
>

Eli had said previously (and I agree) that 205.1 seems overly aggressive
for a point release and would be better named as a minor release. But
0.20.205.1 going to either 1.0.0 or 1.0.1 is much much better than the
current state.

-- Owen


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Dhruba Borthakur
+1 to making the upcoming 0.23 release as 2.0.

-dhruba


On Mon, Nov 14, 2011 at 9:47 PM, Owen O'Malley  wrote:

> I think this is great. Thanks, Arun.
>
> Since the 2xx line is clearly a major branch, we should designate it as
> 1.0. I don't think there is any need to rename current releases, so let's
> just rename the upcoming ones:
>
> 0.20.205.1 -> 1.0.0
> 0.20.206.0 -> 1.1.0
>
> 0.21 is dead and we should just leave it as 0.21.
>
> If we want to leave space for a 0.22 release, it should be 2.0.0:
>
> 0.22.0 -> 2.0.0
>
> And that would make the 0.23.x releases 3.x.y.
>
> 0.23.0 -> 3.0.0
>
> -- Owen
>



-- 
Subscribe to my posts at http://www.facebook.com/dhruba


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Andreas Neumann
+1 for not renaming past releases, that would really start confusion.

If .20.20x.y corresponds to 1.z.y, then z=x-5 and:

0.20.205.1 -> 1.0.1
0.20.206.0 -> 1.1.0
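
The rule above can be written out as a small sketch. The function name `to_one_x` is hypothetical, and the sketch assumes the input always has the four-part form 0.20.20x.y with x >= 5.

```python
def to_one_x(old: str) -> str:
    """Map 0.20.20x.y to 1.z.y with z = x - 5, as proposed in this message."""
    parts = old.split(".")
    if len(parts) != 4 or parts[0] != "0" or parts[1] != "20":
        raise ValueError(f"expected a 0.20.20x.y version, got {old!r}")
    x = int(parts[2]) - 200  # 205 -> 5, 206 -> 6
    y = int(parts[3])
    z = x - 5
    if z < 0:
        raise ValueError(f"{old!r} predates 0.20.205 and has no 1.x target")
    return f"1.{z}.{y}"
```

Releases before 0.20.205 are rejected rather than renamed, in line with not renaming past releases.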


-Andreas.

On 11/14/11 9:47 PM, "Owen O'Malley"  wrote:

>I think this is great. Thanks, Arun.
>
>Since the 2xx line is clearly a major branch, we should designate it as
>1.0. I don't think there is any need to rename current releases, so let's
>just rename the upcoming ones:
>
>0.20.205.1 -> 1.0.0
>0.20.206.0 -> 1.1.0
>
>0.21 is dead and we should just leave it as 0.21.
>
>If we want to leave space for a 0.22 release, it should be 2.0.0:
>
>0.22.0 -> 2.0.0
>
>And that would make the 0.23.x releases 3.x.y.
>
>0.23.0 -> 3.0.0
>
>-- Owen




Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Owen O'Malley
I think this is great. Thanks, Arun.

Since the 2xx line is clearly a major branch, we should designate it as
1.0. I don't think there is any need to rename current releases, so let's
just rename the upcoming ones:

0.20.205.1 -> 1.0.0
0.20.206.0 -> 1.1.0

0.21 is dead and we should just leave it as 0.21.

If we want to leave space for a 0.22 release, it should be 2.0.0:

0.22.0 -> 2.0.0

And that would make the 0.23.x releases 3.x.y.

0.23.0 -> 3.0.0

-- Owen


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Sharad Agarwal
+1

remembering and understanding current release numbering and attributing it
to stable/compatible etc. is really painful.


On Tue, Nov 15, 2011 at 3:41 AM, Arun C Murthy  wrote:

> Folks,
>
> Apache Hadoop has come a long way since our humble beginnings. As a
> community we've made significant progress, even in 2011 - we've had 3
> releases off the branch-0.20-security (0.20.205 being the latest) and we
> just released 0.23.0 last week, our first major release off trunk in a
> while.
>
> With hadoop-0.20.205 we finally have an Apache release with both security
> and HBase support, both critical for the growing ecosystem.
>
> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
> something we've wanted for a while and I think it's time for us to just
> ship it. Linus did something similar with GNU/Linux 3.0.
>
> Yes, we could add more features or better it along many dimensions (ala
> hadoop-0.23), but right now we have a pretty decent piece of software i.e.
>  the feature set in hadoop-0.20.205 is compelling and widely used. We could
> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can
> support compatibility in the hadoop-1.x series, which is the essential
> ingredient. This isn't a brand new idea, Doug suggested this a long while
> ago.
>
> Thoughts?
>
> thanks,
> Arun
>
>
>
>


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Konstantin Boudnik
+1 on graduating .205 as 1.0. It is a very mature and widely used
version of Hadoop and really gives significant bang for the buck!

Making 0.22 into 2.0 seems to make a lot of sense, because its upcoming
release carries a number of significant changes that qualify it as a major
release. .23 seems to be a good candidate for 3.0 for exactly the same reasons,
with the MR2 framework and all.

Seems like a great time for the move!
  Cos

On Mon, Nov 14, 2011 at 02:11PM, Arun C Murthy wrote:
> Folks,
> 
> Apache Hadoop has come a long way since our humble beginnings. As a
> community we've made significant progress, even in 2011 - we've had 3
> releases off the branch-0.20-security (0.20.205 being the latest) and we
> just released 0.23.0 last week, our first major release off trunk in a
> while.
> 
> With hadoop-0.20.205 we finally have an Apache release with both security
> and HBase support, both critical for the growing ecosystem.
> 
> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
> something we've wanted for a while and I think it's time for us to just ship
> it. Linus did something similar with GNU/Linux 3.0. 
> 
> Yes, we could add more features or better it along many dimensions (ala
> hadoop-0.23), but right now we have a pretty decent piece of software i.e.
> the feature set in hadoop-0.20.205 is compelling and widely used. We could
> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can
> support compatibility in the hadoop-1.x series, which is the essential
> ingredient. This isn't a brand new idea, Doug suggested this a long while
> ago. 
> 
> Thoughts?
> 
> thanks,
> Arun
> 
> 
> 


Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Todd Papaioannou
A) is MUCH better from a product branding standpoint, which is what this is
mostly about. I would go for something along those lines.

ToddP

On Nov 14, 2011, at 2:41 PM, Doug Cutting wrote:

> To be specific, I think one of the following could be sensible:
> 
> A. Rename as follows:
> 
>  0.20 -> 1.0
>  0.21 -> 1.1
>  0.22 -> 1.2
>  0.23 -> 2.0
>  0.24 -> 2.1
> 
> B. Just drop the leading zero, e.g., 0.23.0 becomes 23.0.
> 
> Doug
> 
> On 11/14/2011 02:11 PM, Arun C Murthy wrote:
>> Folks,
>> 
>> Apache Hadoop has come a long way since our humble beginnings. As a 
>> community we've made significant progress, even in 2011 - we've had 3 
>> releases off the branch-0.20-security (0.20.205 being the latest) and we 
>> just released 0.23.0 last week, our first major release off trunk in a while.
>> 
>> With hadoop-0.20.205 we finally have an Apache release with both security 
>> and HBase support, both critical for the growing ecosystem.
>> 
>> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
>> something we've wanted for a while and I think it's time for us to just ship 
>> it. Linus did something similar with GNU/Linux 3.0. 
>> 
>> Yes, we could add more features or better it along many dimensions (ala 
>> hadoop-0.23), but right now we have a pretty decent piece of software i.e.  
>> the feature set in hadoop-0.20.205 is compelling and widely used. We could 
>> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can 
>> support compatibility in the hadoop-1.x series, which is the essential 
>> ingredient. This isn't a brand new idea, Doug suggested this a long while 
>> ago. 
>> 
>> Thoughts?
>> 
>> thanks,
>> Arun
>> 
>> 
>> 



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Mattmann, Chris A (388J)
On Nov 14, 2011, at 2:41 PM, Doug Cutting wrote:

> To be specific, I think one of the following could be sensible:
> 
> A. Rename as follows:
> 
>  0.20 -> 1.0
>  0.21 -> 1.1
>  0.22 -> 1.2
>  0.23 -> 2.0
>  0.24 -> 2.1

I like this one, Doug. +1.

Cheers,
Chris

> 
> B. Just drop the leading zero, e.g., 0.23.0 becomes 23.0.
> 
> Doug
> 
> On 11/14/2011 02:11 PM, Arun C Murthy wrote:
>> Folks,
>> 
>> Apache Hadoop has come a long way since our humble beginnings. As a 
>> community we've made significant progress, even in 2011 - we've had 3 
>> releases off the branch-0.20-security (0.20.205 being the latest) and we 
>> just released 0.23.0 last week, our first major release off trunk in a while.
>> 
>> With hadoop-0.20.205 we finally have an Apache release with both security 
>> and HBase support, both critical for the growing ecosystem.
>> 
>> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
>> something we've wanted for a while and I think it's time for us to just ship 
>> it. Linus did something similar with GNU/Linux 3.0. 
>> 
>> Yes, we could add more features or better it along many dimensions (ala 
>> hadoop-0.23), but right now we have a pretty decent piece of software i.e.  
>> the feature set in hadoop-0.20.205 is compelling and widely used. We could 
>> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can 
>> support compatibility in the hadoop-1.x series, which is the essential 
>> ingredient. This isn't a brand new idea, Doug suggested this a long while 
>> ago. 
>> 
>> Thoughts?
>> 
>> thanks,
>> Arun
>> 
>> 
>> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Mattmann, Chris A (388J)
Hey Guys,

My super +1 for calling 0.20.205 1.0.

1 point oh!

Cheers,
Chris

On Nov 14, 2011, at 2:11 PM, Arun C Murthy wrote:

> Folks,
> 
> Apache Hadoop has come a long way since our humble beginnings. As a community 
> we've made significant progress, even in 2011 - we've had 3 releases off the 
> branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 
> last week, our first major release off trunk in a while.
> 
> With hadoop-0.20.205 we finally have an Apache release with both security and 
> HBase support, both critical for the growing ecosystem.
> 
> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
> something we've wanted for a while and I think it's time for us to just ship 
> it. Linus did something similar with GNU/Linux 3.0. 
> 
> Yes, we could add more features or better it along many dimensions (ala 
> hadoop-0.23), but right now we have a pretty decent piece of software i.e.  
> the feature set in hadoop-0.20.205 is compelling and widely used. We could 
> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can 
> support compatibility in the hadoop-1.x series, which is the essential 
> ingredient. This isn't a brand new idea, Doug suggested this a long while 
> ago. 
> 
> Thoughts?
> 
> thanks,
> Arun
> 
> 
> 





Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Milind.Bhandarkar
Arun,

You beat me to start this discussion :-)

I was at Apachecon recently, and based on the questions and comments from
several attendees for the hadoop sessions, as well as the hadoop meetup
afterwards, it was clear that users are perplexed about our versioning
strategies.

In addition, Doug and Owen also have publicly stated (in #hw2011 and
#apachecon11 respectively) that 0.20.2xx should be considered a 1.0.

There is a perception (no doubt caused by 0.19 and 0.21 *abandonment*)
that releases ending in odd numbers are unstable releases. So, some users
were confused when some speakers urged folks to try out 0.23.

I second your proposal that 0.20.2xx should be called 1.x.

Based on some encouraging results reported on 0.22, I propose that it
should be called 2.0.

Which makes 0.23 the 3.0 release.

So, +1!

- milind


On 11/14/11 2:11 PM, "Arun C Murthy"  wrote:

>Folks,
>
>Apache Hadoop has come a long way since our humble beginnings. As a
>community we've made significant progress, even in 2011 - we've had 3
>releases off the branch-0.20-security (0.20.205 being the latest) and we
>just released 0.23.0 last week, our first major release off trunk in a
>while.
>
>With hadoop-0.20.205 we finally have an Apache release with both security
>and HBase support, both critical for the growing ecosystem.
>
>With that, I think it's time to call it hadoop-1.0. The 1.0 moniker
>is something we've wanted for a while and I think it's time for us to
>just ship it. Linus did something similar with GNU/Linux 3.0.
>
>Yes, we could add more features or better it along many dimensions (ala
>hadoop-0.23), but right now we have a pretty decent piece of software
>i.e.  the feature set in hadoop-0.20.205 is compelling and widely used.
>We could call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a
>community, can support compatibility in the hadoop-1.x series, which is
>the essential ingredient. This isn't a brand new idea, Doug suggested
>this a long while ago.
>
>Thoughts?
>
>thanks,
>Arun
>
>
>
>



Re: [DISCUSS] Apache Hadoop 1.0?

2011-11-14 Thread Doug Cutting
To be specific, I think one of the following could be sensible:

A. Rename as follows:

  0.20 -> 1.0
  0.21 -> 1.1
  0.22 -> 1.2
  0.23 -> 2.0
  0.24 -> 2.1

B. Just drop the leading zero, e.g., 0.23.0 becomes 23.0.

Doug
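
To make the two options above concrete, here is a minimal sketch of how each would rewrite a version string. The function names `option_a` and `option_b` are hypothetical. How the trailing components (for example the `.205` in `0.20.205`) carry over under option A is not specified in the message, so the sketch simply assumes they are appended unchanged.

```python
# Option A: the explicit prefix mapping listed in the message.
OPTION_A = {
    "0.20": "1.0",
    "0.21": "1.1",
    "0.22": "1.2",
    "0.23": "2.0",
    "0.24": "2.1",
}


def option_a(version: str) -> str:
    """Rewrite the leading 'major.minor' pair via OPTION_A.

    Assumption: any remaining components are kept as-is; versions whose
    prefix is not in the table are returned unchanged.
    """
    parts = version.split(".")
    prefix = ".".join(parts[:2])
    if prefix not in OPTION_A:
        return version
    return ".".join([OPTION_A[prefix]] + parts[2:])


def option_b(version: str) -> str:
    """Option B: just drop the leading zero, e.g. 0.23.0 becomes 23.0."""
    return version[2:] if version.startswith("0.") else version


if __name__ == "__main__":
    for old in ("0.20.205", "0.22.0", "0.23.0"):
        print(old, "A:", option_a(old), "B:", option_b(old))
```

Option B needs no table at all, which is its main appeal; option A encodes a judgment about which releases count as major.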

On 11/14/2011 02:11 PM, Arun C Murthy wrote:
> Folks,
> 
> Apache Hadoop has come a long way since our humble beginnings. As a community 
> we've made significant progress, even in 2011 - we've had 3 releases off the 
> branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 
> last week, our first major release off trunk in a while.
> 
> With hadoop-0.20.205 we finally have an Apache release with both security and 
> HBase support, both critical for the growing ecosystem.
> 
> With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is
> something we've wanted for a while and I think it's time for us to just ship 
> it. Linus did something similar with GNU/Linux 3.0. 
> 
> Yes, we could add more features or better it along many dimensions (ala 
> hadoop-0.23), but right now we have a pretty decent piece of software i.e.  
> the feature set in hadoop-0.20.205 is compelling and widely used. We could 
> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can 
> support compatibility in the hadoop-1.x series, which is the essential 
> ingredient. This isn't a brand new idea, Doug suggested this a long while 
> ago. 
> 
> Thoughts?
> 
> thanks,
> Arun
> 
> 
>