Re: Compatibility guidelines for toString overrides

2016-05-12 Thread Raymie Stata
How about this idea:

Define a new annotation "StableImplUnstableInterface" which means consumers 
can't assume stability but producers can't change things. Mark all toStrings 
with this annotation. 

Then, in a lazy fashion, as the need arises to change various toString methods, 
diligence can be done first to see whether any legacy code depends on a method in 
a compatibility-breaking manner; those dependencies can be fixed, and then the 
method can be changed and re-marked as unstable.  Conversely, there might be 
circumstances where a toString method should be marked as stable.  (Certainly 
it's reasonable to assume that Integer.toString returns a parsable result, for 
example; the point being that for some classes it makes sense to have a stable 
spec for toString.)  
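The Integer.toString example can be made concrete: unlike most toString methods, its output format is actually specified, so a round-trip through Integer.parseInt is guaranteed. A minimal check (the class name here is just for illustration):

```java
// Integer.toString is one toString whose output *is* a stable, specified
// format: the result always round-trips through Integer.parseInt.
public class ToStringRoundTrip {
    static boolean roundTrips(int v) {
        return Integer.parseInt(Integer.toString(v)) == v;
    }

    public static void main(String[] args) {
        int[] samples = {0, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            if (!roundTrips(v)) {
                throw new AssertionError("round-trip failed for " + v);
            }
        }
        System.out.println("round-trip OK");
    }
}
```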

Over the years one would hope that the StableImplUnstableInterface annotations 
would disappear. 
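A sketch of what such an annotation might look like. Note that StableImplUnstableInterface is only a proposal, not an existing Hadoop annotation; the retention policy and the reason() element are assumptions added here for illustration:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation for the proposal above: producers promise not to
// change the output, while consumers must not rely on it. This is not part
// of Hadoop's real classification annotations.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface StableImplUnstableInterface {
    String reason() default "";  // e.g. a note on why the output is frozen
}

public class AnnotationSketch {
    static class Example {
        @StableImplUnstableInterface(reason = "legacy log parsers may scrape this")
        @Override
        public String toString() {
            return "Example{}";
        }
    }

    public static void main(String[] args) throws Exception {
        // Runtime retention would let an audit tool enumerate such methods.
        StableImplUnstableInterface ann = Example.class
            .getMethod("toString")
            .getAnnotation(StableImplUnstableInterface.class);
        System.out.println(ann != null ? "annotated: " + ann.reason()
                                       : "not annotated");
    }
}
```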

Sent from my iPhone

> On May 12, 2016, at 1:40 PM, Sean Busbey  wrote:
> 
> As a downstream user of Hadoop, it would be much clearer if the
> toString functions included the appropriate annotations to say they're
> non-public, evolving, or whatever.
> 
> Most downstream users of Hadoop aren't going to remember detailed
> exceptions to the Java API compatibility rules; once they see that a
> class is labeled Public/Stable, they're going to presume that applies
> to all non-private members.
> 
>> On Thu, May 12, 2016 at 9:32 AM, Colin McCabe  wrote:
>> Hi all,
>> 
>> Recently a discussion came up on HADOOP-13028 about the wisdom of
>> overriding S3AInputStream#toString to output statistics information.
>> It's a difficult judgement for me to make, since I'm not aware of any
>> compatibility guidelines for InputStream#toString.  Do we have
>> compatibility guidelines for toString functions?
>> 
>> It seems like the output of toString functions is usually used as a
>> debugging aid, rather than as a stable format suitable for UI display or
>> object serialization.  Clearly, there are a few cases where we might
>> want to specifically declare that a toString method is a stable API.
>> However, I think if we attempt to treat the toString output of all
>> public classes as stable, we will have greatly increased the API
>> surface.  Should we formalize this and declare that toString functions
>> are @Unstable, Evolving unless declared otherwise?
>> 
>> best,
>> Colin
>> 
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 
> 
> 
> -- 
> busbey
> 
> 




[jira] [Created] (HADOOP-11806) Test issue for JIRA automation scripts

2015-04-03 Thread Raymie Stata (JIRA)
Raymie Stata created HADOOP-11806:
-

 Summary: Test issue for JIRA automation scripts
 Key: HADOOP-11806
 URL: https://issues.apache.org/jira/browse/HADOOP-11806
 Project: Hadoop Common
  Issue Type: Test
Reporter: Raymie Stata
Assignee: Raymie Stata
Priority: Trivial


I'm writing some scripts to automate some JIRA clean-up activities.  I've 
created this issue for testing these scripts.  Please ignore...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Looking to a Hadoop 3 release

2015-03-09 Thread Raymie Stata
Avoiding the use of JDK8 language features (and, presumably, APIs)
means you've abandoned #1, i.e., you haven't (really) bumped the JDK
source version to JDK8.

Also, note that releasing from trunk is a way of achieving #3; it's
not a way of abandoning it.



On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Hi Raymie,

 Konst proposed just releasing off of trunk rather than cutting a branch-2,
 and there was general agreement there. So, consider #3 abandoned. #1 and #2
 can be achieved at the same time; we just need to avoid using JDK8 language
 features in trunk so things can be backported.

 Best,
 Andrew

 On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote:

 In this (and the related threads), I see the following three requirements:

 1. Bump the source JDK version to JDK8 (ie, drop JDK7 support).

 2. We'll still be releasing 2.x releases for a while, with similar
 feature sets as 3.x.

 3. Avoid the risk of split-brain behavior by minimizing backporting
 headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious.
 Adding a branch-3/branch-3.x would be obnoxious.

 These three cannot be achieved at the same time.  Which do we abandon?


 On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com
 wrote:
 
  On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:
 
  2) Simplification of configs - potentially separating client side
 configs
  and those used by daemons. This is another source of perpetual confusion
  for users.
  + 1 on this.
 
  sanjay



Re: Looking to a Hadoop 3 release

2015-03-09 Thread Raymie Stata
In this (and the related threads), I see the following three requirements:

1. Bump the source JDK version to JDK8 (ie, drop JDK7 support).

2. We'll still be releasing 2.x releases for a while, with similar
feature sets as 3.x.

3. Avoid the risk of split-brain behavior by minimizing backporting
headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious.
Adding a branch-3/branch-3.x would be obnoxious.

These three cannot be achieved at the same time.  Which do we abandon?


On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote:

 On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote:

 2) Simplification of configs - potentially separating client side configs
 and those used by daemons. This is another source of perpetual confusion
 for users.
 + 1 on this.

 sanjay


Re: Plans of moving towards JDK7 in trunk

2014-04-13 Thread Raymie Stata
There's an outstanding question addressed to me: "Are there particular
features or new dependencies that you would like to contribute (or see
contributed) that require using the Java 1.7 APIs?"  The question
misses the point: we'd figure out how to write something we wanted to
contribute to Hadoop against the APIs of Java4 if that's what it took
to get it into a stable release.  And at current course and speed,
that's how ridiculous things could get.

To summarize, it seems like there's a vague consensus that it might be
okay to eventually allow the use of Java7 in trunk, but there's no
decision.  And there's been no answer to the concern that even if
Java7 dependencies were allowed in trunk, the only people using them
would be people who are uninterested in getting their patches into a
stable release of Hadoop on any knowable timeframe, which doesn't bode
well for the ability to stabilize that Java7 code when it comes time
to attempt to.

I don't have more to add, so I'll go back to lurking.  It'll be
interesting to see where we'll be standing a year from now.

On Sun, Apr 13, 2014 at 2:09 AM, Tsuyoshi OZAWA
ozawa.tsuyo...@gmail.com wrote:
 Hi,

 +1 for Karthik's idea(non-binding).

 IMO, we should keep compatibility between JDK 6 and JDK 7 on both branch-1
 and branch-2, because users may be using them. For future releases where we
 can declare breaking compatibility (e.g. a 3.0.0 release), we can use JDK 7
 features if we can get benefits from them. However, that can increase
 maintenance costs and dilute the effort of contributors maintaining
 branches. So I think it is a reasonable approach to use a limited, minimal
 set of JDK 7 APIs when we have concrete reasons to use those features.
 By the way, if we start to use JDK 7 APIs, we should document on the Wiki
 the basis for when to use them, so as not to confuse contributors.

 Thanks,
 - Tsuyoshi

 On Wed, Apr 9, 2014 at 11:44 AM, Raymie Stata rst...@altiscale.com wrote:
 It might make sense to try to enumerate the benefits of switching to
 Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 It might make sense to try to enumerate the benefits of switching to Java7
 APIs and dependencies.  IMO, the ones listed so far on this thread don't
 make a compelling enough case to drop Java6 in branch-2 on any time frame,
 even if this means supporting Java6 through 2015.  For example, the change
 in RawLocalFileSystem semantics might be an incompatible change for
 branch-2 any way.


 On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com wrote:

 +1 to NOT breaking compatibility in branch-2.

 I think it is reasonable to require JDK7 for trunk, if we limit use of
 JDK7-only API to security fixes etc. If we make other optimizations (like
 IO), it would be a pain to backport things to branch-2. I guess this all
 depends on when we see ourselves shipping Hadoop-3. Any ideas on that?


 On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:

  On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
  davi.ottenhei...@emc.com wrote:
   From: Eli Collins [mailto:e...@cloudera.com]
   Sent: Monday, April 07, 2014 11:54 AM
  
  
   IMO we should not drop support for Java 6 in a minor update of a
 stable
   release (v2).  I don't think the larger Hadoop user base would find it
   acceptable that upgrading to a minor update caused their systems to
 stop
   working because they didn't upgrade Java. There are people still
 getting
   support for Java 6. ...
  
   Thanks,
   Eli
  
   Hi Eli,
  
   Technically you are correct those with extended support get critical
  security fixes for 6 until the end of 2016. I am curious whether many of
  those are in the Hadoop user base. Do you know? My guess is the vast
  majority are within Oracle's official public end of life, which was over
 12
  months ago. Even Premier support ended Dec 2013:
  
   http://www.oracle.com/technetwork/java/eol-135779.html
  
   The end of Java 6 support carries much risk. It has to be considered in
  terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS
  score 10.0.
  
   http://www.cvedetails.com/cve/CVE-2013-2465/
  
   Since you mentioned caused systems

Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Raymie Stata
I think the problem to be solved here is to define a point in time
when the average Hadoop contributor can start using Java7 dependencies
in their code.

The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
not solve this problem.  The average Hadoop contributor wants to see
their contributions make it into a stable release in a predictable
amount of time.  Putting code with a Java7 dependency into trunk means
the exact opposite: there is no timeline to a stable release.  So most
contributors will stay away from Java7 dependencies, despite the
nominal policy that they're allowed in trunk.  (And the few that do
use Java7 dependencies are people who do not value releasing code into
stable releases, which arguably could lead to a situation that the
Java7-dependent code in trunk is, on average, on the buggy side.)

I'm not saying the branch2-in-the-future plan is the only way to
solve the problem of putting Java7 dependencies on a known time-table,
but at least it solves it.  Is there another solution?

On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:
 On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:



 For the sake of this discussion we should separate the runtime from
 the programming APIs. Users are already migrating to the java7 runtime
 for most of the reasons listed below (support, performance, bugs,
 etc), and the various distributions cert their Hadoop 2 based
 distributions on java7.  This gives users many of the benefits of
 java7, without forcing users off java6. Ie Hadoop does not need to
 switch to the java7 programming APIs to make sure everyone has a
 supported runtime.


 +1: you can use Java 7 today; I'm not sure how tested Java 8 is


 The question here is really about when Hadoop, and the Hadoop
 ecosystem (since adjacent projects often end up in the same classpath)
 start using the java7 programming APIs and therefore break
 compatibility with java6 runtimes. I think our java6 runtime users
 would consider dropping support for their java runtime in an update of
 a major release to be an incompatible change (the binaries stop
 working on their current jvm).


 do you mean major (2.x -> 3.y) or minor (2.x -> 2.(x+1)) here?


 That may be worth it if we can
 articulate sufficient value to offset the cost (they have to upgrade
 their environment, might make rolling upgrades stop working, etc), but
 I've not yet heard an argument that articulates the value relative to
 the cost.  Eg upgrading to the java7 APIs allows us to pull in
 dependencies with new major versions, but only if those dependencies
 don't break compatibility (which is likely given that our classpaths
 aren't so isolated), and, realistically, only if the entire Hadoop
 stack moves to java7 as well




 (eg we have to recompile HBase to
 generate v1.7 binaries even if they stick on API v1.6). I'm not aware
 of a feature, bug etc that really motivates this.

 I don't see that being needed unless we move up to new java7+ only
 libraries and HBase needs to track this.

  The big "recompile to work" issue is Google Guava, which is troublesome
 enough I'd be tempted to say: can we drop it entirely?



 An alternate approach is to keep the current stable release series
 (v2.x) as is, and start using new APIs in trunk (for v3). This will be
 a major upgrade for Hadoop and therefore an incompatible change like
 this is to be expected (it would be great if this came with additional
 changes to better isolate classpaths and dependencies from each
 other). It allows us to continue to support multiple types of users
 with different branches, vs forcing all users onto a new version. It
 of course means that 2.x users will not get the benefits of the new
 API, but it's unclear what those benefits are given they can already
 get the benefits of adopting the newer java runtimes today.



 I'm (personally) +1 to this, I also think we should plan to do the switch
 some time this year to not only get the benefits, but discover the costs



Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Raymie Stata
Is there broad consensus that, by end of 3Q2014 at the latest, the
average contributor to Hadoop should be free to use Java7 features?
And start pulling in libraries that have a Java7 dependency?  And
start doing the janitorial work of taking advantage of the Java7
APIs?  Or do we think that the bulk of Hadoop work will be done
against Java6 APIs (and avoiding Java7-dependent libraries) through
the end of the year?

If the consensus is that we introduce Java7 into the bulk of Hadoop
coding, what's the plan for getting there?  The answer can't be "right
now, in trunk."  Even if we agreed to start allowing Java7
dependencies into trunk, as a practical matter this isn't enough.
Right now, if I'm a random Hadoop contributor, I'd be stupid to
contribute to trunk: I know that any stable release in the near term
will be from branch2, so if I want a prayer of seeing my change in a
stable release, I'd better contribute to branch2.

If we want a path to allowing Java7 dependencies by Q4, then we need
one of the following:

1) branch3 plan: The major Hadoop vendors (you know who you are)
commit to shipping a v3 of Hadoop in Q4 that allows Java7
dependencies and show signs of living up to that commitment (e.g., a
branch3 is created sometime soon).  This puts us all on a path towards
a real release of Hadoop that allows Java7 dependencies.

2) branch2 plan: deprecate Java6 as a runtime environment now,
publicly declare a time frame (e.g., 4Q2014) when _future development_
stops supporting Java6 runtime, and work with our customers in the
meantime to get them off a crazy-old version of Java (that's what
we're doing right now).

I don't see another path to allowing Java7 dependencies.  In the
current state of indecision, the smart programmer would be assuming no
Java7 dependencies into 2015.

On the one hand, I don't see the branch3 plan actually happening.
This is a big decision involving marketing, engineering, customer
support.  Plus it creates a problem for sales: Come summertime,
they'll have a hard time selling 2.x-based releases because they've
pre-announced support for 3.x.  It's just not going to happen.

On the other hand, I don't see the problem with the branch2 plan.  The
branch2 plan also requires the commitment from the major vendors, but
this decision is not nearly as galactic.  By the time 3Q2014 comes
along, this problem will be very rarified.  Also, don't forget that it
typically takes a customer 3-6 months to upgrade their Hadoop -- and a
customer who's afraid to shift off Java6 in 3Q2014 will probably take
a year to upgrade.  The branch2 plan implies a last Java6 release of
Hadoop in 3Q2014.  If we assume a Java7-averse customer will take a
year to upgrade to this release -- and then will take another year to
upgrade their cluster after that -- then they can be happily using
Java6 all the way into 2016.  (Another point, if 3Q2014 comes along
and vendors find they have so many customers still on Java6 that they
can't afford the discontinuity, then they can shift their MAJOR
version number of their product to communicate the discontinuity --
there's nothing that says that a vendor's versioning scheme must agree
exactly with Hadoop's.)

In short, we don't currently have a realistic path for introducing
Java7 dependencies into Hadoop.  Simply allowing them into trunk will
NOT solve this problem: any contributor who wants to see their code in
a stable release knows it'll have to flow through branch2 -- and thus
they'll have to avoid Java7 dependencies.  The branch2 plan is the
only plan proposed so far that gets us to Java7 dependencies by Q4.
And the important part of the branch2 plan is we make the decision
soon -- so we have time to notify folks and otherwise work that
decision out into the field.

  Raymie



On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:
 On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
 davi.ottenhei...@emc.com wrote:
 From: Eli Collins [mailto:e...@cloudera.com]
 Sent: Monday, April 07, 2014 11:54 AM


 IMO we should not drop support for Java 6 in a minor update of a stable
 release (v2).  I don't think the larger Hadoop user base would find it
 acceptable that upgrading to a minor update caused their systems to stop
 working because they didn't upgrade Java. There are people still getting
 support for Java 6. ...

 Thanks,
 Eli

 Hi Eli,

 Technically you are correct those with extended support get critical 
 security fixes for 6 until the end of 2016. I am curious whether many of 
 those are in the Hadoop user base. Do you know? My guess is the vast 
 majority are within Oracle's official public end of life, which was over 12 
 months ago. Even Premier support ended Dec 2013:

 http://www.oracle.com/technetwork/java/eol-135779.html

 The end of Java 6 support carries much risk. It has to be considered in 
 terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS 
 score 10.0.

 http://www.cvedetails.com/cve/CVE-2013-2465/

 Since you 

Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Raymie Stata
 It might make sense to try to enumerate the benefits of switching to
 Java7 APIs and dependencies.

  - Java7 introduced a huge number of language, byte-code, API, and
tooling enhancements!  Just to name a few: try-with-resources, newer
and stronger encryption methods, more scalable concurrency primitives.
 See http://www.slideshare.net/boulderjug/55-things-in-java-7

  - We can't update current dependencies, and we can't add cool new ones.

  - Putting language/APIs aside, don't forget that a huge amount of effort
goes into qualifying for Java6 (at least, I hope the folks claiming to
support Java6 are putting in such an effort :-).  Wouldn't Hadoop
users/customers be better served if qualification effort went into
Java7/8 versus Java6/7?

Getting to Java7 as a development env (and Java8 as a runtime env)
seems like a no-brainer.  Question is: How?
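One of the Java7 features listed above, try-with-resources, in a minimal self-contained sketch (class and method names are illustrative only):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// try-with-resources (Java 7) closes the writer and reader automatically,
// replacing the nested try/finally blocks that Java 6 code needs.
public class TryWithResources {
    static String writeAndReadBack(String text) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        tmp.toFile().deleteOnExit();
        try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            w.write(text);
        } // w.close() runs here, even if write() throws
        try (BufferedReader r =
                Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadBack("hello"));
    }
}
```

Note that java.nio.file.Files and StandardCharsets are themselves Java 7 additions, so this block compiles only against a 1.7+ source level.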

On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 It might make sense to try to enumerate the benefits of switching to Java7
 APIs and dependencies.  IMO, the ones listed so far on this thread don't
 make a compelling enough case to drop Java6 in branch-2 on any time frame,
 even if this means supporting Java6 through 2015.  For example, the change
 in RawLocalFileSystem semantics might be an incompatible change for
 branch-2 any way.


 On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com wrote:

 +1 to NOT breaking compatibility in branch-2.

 I think it is reasonable to require JDK7 for trunk, if we limit use of
 JDK7-only API to security fixes etc. If we make other optimizations (like
 IO), it would be a pain to backport things to branch-2. I guess this all
 depends on when we see ourselves shipping Hadoop-3. Any ideas on that?


 On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:

  On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
  davi.ottenhei...@emc.com wrote:
   From: Eli Collins [mailto:e...@cloudera.com]
   Sent: Monday, April 07, 2014 11:54 AM
  
  
   IMO we should not drop support for Java 6 in a minor update of a
 stable
   release (v2).  I don't think the larger Hadoop user base would find it
   acceptable that upgrading to a minor update caused their systems to
 stop
   working because they didn't upgrade Java. There are people still
 getting
   support for Java 6. ...
  
   Thanks,
   Eli
  
   Hi Eli,
  
   Technically you are correct those with extended support get critical
  security fixes for 6 until the end of 2016. I am curious whether many of
  those are in the Hadoop user base. Do you know? My guess is the vast
  majority are within Oracle's official public end of life, which was over
 12
  months ago. Even Premier support ended Dec 2013:
  
   http://www.oracle.com/technetwork/java/eol-135779.html
  
   The end of Java 6 support carries much risk. It has to be considered in
  terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS
  score 10.0.
  
   http://www.cvedetails.com/cve/CVE-2013-2465/
  
    Since you mentioned "caused systems to stop" as an example of what
  would
  be a concern to Hadoop users, please note the CVE-2013-2465 availability
  impact:
  
   Complete (There is a total shutdown of the affected resource. The
  attacker can render the resource completely unavailable.)
  
   This vulnerability was patched in Java 6 Update 51, but post end of
  life. Apple pushed out the update specifically because of this
  vulnerability (http://support.apple.com/kb/HT5717) as did some other
  vendors privately, but for the majority of people using Java 6 means they
  have a ticking time bomb.
  
   Allowing it to stay should be considered in terms of accepting the
 whole
  risk posture.
  
 
  There are some who get extended support, but I suspect many just have
  a if-it's-not-broke mentality when it comes to production deployments.
  The current code supports both java6 and java7 and so allows these
  people to remain compatible, while enabling others to upgrade to the
  java7 runtime. This seems like the right compromise for a stable
  release series. Again, absolutely makes sense for trunk (ie v3) to
  require java7 or greater.
 



Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Raymie Stata
To summarize the thread so far:

a) Java7 is already a supported compile- and runtime environment for
Hadoop branch2 and trunk
b) Java6 must remain a supported compile- and runtime environment for
Hadoop branch2
c) (b) implies that branch2 must stick to Java6 APIs

I wonder if point (b) should be revised.  We could immediately
deprecate Java6 as a runtime (and thus compile-time) environment for
Hadoop, and end support in some published time frame
(perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
some date would not be guaranteed to run on Java6.  This would set us
up for using Java7 APIs in branch2.

An alternative might be to keep branch2 on Java6 APIs forever, and to
start using Java7 APIs in trunk relatively soon.  The concern here
would be that trunk isn't getting the kind of production torture
testing that branch2 is subjected to, and won't be for a while.  If
trunk and branch2 diverge too much too quickly, trunk could become a
nest of bugs, endangering the timeline and quality of Hadoop 3.  This
would argue for keeping trunk and branch2 in closer sync (maybe until
a branch3 is created and starts getting used by bleeding-edge users).
However, as just suggested, keeping them in closer sync need _not_
imply that Java7 features be avoided indefinitely: again, with
sufficient warning, Java6 support could be sunset within branch2.

On a related note, Steve points out that we need to start thinking
about Java8.  YES!!  Lambdas are a Really Big Deal!  If we sunset
Java6 in a few quarters, maybe we can add Java8 compile and runtime
(but not API) support about the same time.  This does NOT imply
bringing Java8 APIs into branch2: Even if we do allow Java7 APIs into
branch2 in the future, I doubt that bringing Java8 APIs into it will
ever make sense.  However, if Java8 is a supported runtime environment
for Hadoop 2, that sets us up for using Java8 APIs for the eventual
branch3 sometime in 2015.
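As a taste of why lambdas matter, here is a small Java 8 stream pipeline; the host names are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// A Java 8 lambda/stream pipeline that would require an anonymous inner
// class and an explicit loop under Java 6/7.
public class LambdaSketch {
    static List<String> dataNodes(List<String> hosts) {
        return hosts.stream()
            .filter(h -> h.startsWith("dn"))   // lambda predicate
            .map(String::toUpperCase)          // method reference
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(dataNodes(Arrays.asList("nn1", "dn1", "dn2", "rm1")));
        // prints [DN1, DN2]
    }
}
```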


On Sat, Apr 5, 2014 at 10:52 AM, Steve Loughran ste...@hortonworks.com wrote:
 On 5 April 2014 11:53, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 I've been using JDK7 for Hadoop development for a while now, and I
 know a lot of other folks have as well.  Correct me if I'm wrong, but
 what we're talking about here is not moving towards JDK7 but
 breaking compatibility with JDK6.


 +1


 There are a lot of good reasons to ditch JDK6.  It would let us use
 new APIs in JDK7, especially the new file APIs.  It would let us
 update a few dependencies to newer versions.


 +1



 I don't like the idea of breaking compatibility with JDK6 in trunk,
 but not in branch-2.  The traditional reason for putting something in
 trunk but not in branch-2 is that it is new code that needs some time
 to prove itself.


 +1. branch-2 must continue to run on JDK6


 This doesn't really apply to incrementing min.jdk--
 we could do that easily whenever we like.  Meanwhile, if trunk starts
 accumulating jdk7-only code and dependencies, backports from trunk to
 branch-2 will become harder and harder over time.



 I agree, but note that trunk diverges from branch-2 over time anyway -it's
 happening.



 Since we make stable releases off of branch-2, and not trunk, I don't
 see any upside to this.  To be honest, I see only negatives here.
 More time backporting, more issues that show up only in production
 (branch-2) and not on dev machines (trunk).


 Maybe it's time to start thinking about what version of branch-2 will
 drop jdk6 support.  But until there is such a version, I don't think
 trunk should do it.




1. Let's assume that branch-2 will never drop JDK6: clusters are
committed to it, and saying "JDK update needed" will simply stop updates.
2. By the time Hadoop 3.0 ships (2015?), JDK6 will be EOL, Java 8 will be
in common use, and even JDK7 will be seen as trailing edge.
3. JDK7 improves JVM performance: NUMA, native IO, etc., which you get for
free; as we're confident it's stable, there's no reason not to move to it
in production.
4. As we update the dependencies on hadoop 3, we'll end up upgrading to
libraries that are JDK7+ only (jetty!), so JDK6 is implicitly abandoned.
5. There are new packages and APIs in Java7 which we can adopt to make
our lives better and development more productive -as well as improving the
user experience.

 as a case in point, java.io.File.mkdirs() says "true if and only if the
 directory was created; false otherwise", and returns false in either of
 these two cases:
  -the path resolves to a directory that exists
  -the path resolves to a file
 Think about that: anyone using the local filesystem could write code that
 assumes a false return from mkdirs() is harmless, because if you apply it
 more than once to a directory it is. But call it on a file, and you don't
 get told it's only a file until you try to do something under it, and then
 things stop behaving.

 In comparison, java.nio.file.Files differentiates this case by declaring
 FileAlreadyExistsException - 
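The mkdirs() contrast described above can be demonstrated directly; this is a self-contained sketch using only standard-library calls, with invented class and method names:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// java.io.File.mkdirs() returns false both when the directory already
// exists (harmless) and when the path names an existing regular file
// (broken); java.nio.file.Files.createDirectory separates the two cases
// by throwing FileAlreadyExistsException.
public class MkdirsContrast {
    static boolean mkdirsOnExistingFile(Path file) {
        // false -- but the caller can't tell which failure this was
        return new File(file.toString()).mkdirs();
    }

    static boolean createDirectoryReportsConflict(Path file) throws IOException {
        try {
            Files.createDirectory(file);
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;  // the failure mode is explicit
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt");
        file.toFile().deleteOnExit();
        System.out.println("mkdirs on an existing file: "
            + mkdirsOnExistingFile(file));
        System.out.println("createDirectory threw FileAlreadyExistsException: "
            + createDirectoryReportsConflict(file));
    }
}
```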

Re: Will there be a 2.2 patch releases?

2014-01-04 Thread Raymie Stata
I took a look at items in 2.3 and 2.4, as well as CDH5 and HDP2 (and also 
looked at a few of the patches to assess their risk levels), and came 
up with the following strawman proposal of bug-fix patches to be included 
in a 2.2.1 release:

HADOOP-10029 [major] - Specifying har file to MR job fails in secure cluster

HDFS-5089 [major] - When a LayoutVersion supports SNAPSHOT, it must
support FSIMAGE_NAME_OPTIMIZATION
HDFS-5403 [major] - WebHdfs client cannot communicate with older
WebHdfs servers post HDFS-5306
HDFS-5433 [critical] - When reloading fsimage during checkpointing, we
should clear existing snapshottable directories

MAPREDUCE-5028 [critical] - Maps fail when io.sort.mb is set to high value

YARN-1295 [major] - In UnixLocalWrapperScriptBuilder, using bash -c
can cause Text file busy errors
YARN-1374 [blocker] - Resource Manager fails to start due to
ConcurrentModificationException
YARN-1176 [critical] - RM web services ClusterMetricsInfo total nodes
doesn't include unhealthy nodes

There are lots of outstanding bug fixes, so this list is definitely a
bit arbitrary, but it seemed like a good list to me.  Any thoughts?


On Fri, Jan 3, 2014 at 5:26 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
 Re-reading the thread, it seems what I said about 2.2.1 never happening was
 incorrect.  My impression is still that nobody has plans to drive a 2.2.1
 release on any particular timeline.

 The changes that are now in 2.3 have been moved out of the branch-2.2.1.  I
 suppose the idea is that changes slated for 2.2.1 should be committed both
 to branch-2.2 and branch-2.2.1.

 -Sandy


 On Fri, Jan 3, 2014 at 4:57 PM, Raymie Stata rst...@altiscale.com wrote:

 Yes, that thread is part of what's confusing me.  Arun's initial 11/8
 message suggests that there would be room for blocker fixes leading to
 a 2.2.1 patch release (...and then be very careful about including
 only *blocker* fixes in branch-2.2).  And nothing else in that thread
 suggests that there wouldn't be a patch release.  And yet, Sandy seems
 to think that 2.2.1 isn't happening at all (YARN-1295), a view
 that's consistent with the currently confused state of the repo
 (branch-2.2.1 exists but not released, branch-2.2 version is
 2.2.2-SNAPSHOT).

 Seems to me that we should be planning for a 2.2.1 patch release at
 some point...

   Raymie



 On Fri, Jan 3, 2014 at 1:17 AM, Steve Loughran ste...@hortonworks.com
 wrote:
  the last discussion on this was in November - I presume that's still the
 plan
 
 
 http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201311.mbox/%3CA31E1430-33BE-437C-A61E-050F9A67C109%40hortonworks.com%3E
 
 
  On 3 January 2014 04:10, Raymie Stata rst...@altiscale.com wrote:
 
  Nudge, any thoughts?
 
  On Sun, Dec 29, 2013 at 1:25 AM, Raymie Stata rst...@altiscale.com
  wrote:
   In discussing YARN-1295 it's become clear that I'm confused about the
   outcome of the "Next releases" thread.  I had assumed there would be
   patch releases to 2.2, and indeed one would be coming out early Q1.
   Is this correct?
  
   If so, then things seem a little messed-up right now in 2.2-land.
   There already is a branch-2.2.1, but there hasn't been a release.  And
   branch-2.2 has Maven version 2.2.2-SNAPSHOT.  Due to the 2.3 rename
   a few weeks ago, it might be that the first patch release for 2.2
   needs to be 2.2.2.  But if so, notice these lists of fixes for 2.2.1:
  
 https://issues.apache.org/jira/browse/YARN/fixforversion/12325667
 https://issues.apache.org/jira/browse/HDFS/fixforversion/12325666
  
   Do these need to have their fix-versions updated?
  
 Raymie
  
  
   P.S. While we're on the subject of point releases, let me check my
  assumptions.
  
   I assumed that, for release x.y.z, fixes deemed to be critical bug
   fixes would be put into branch-x.y as a matter of course.  The Maven
   release-number in branch-x.y would be x.y.(z+1)-SNAPSHOT, and JIRAs
   (to be) committed to branch-x.y would have x.y.(z+1) as one of their
   fix-versions.
  
   When enough fixes have accumulated to warrant a release, or when a fix
   comes up that is critical enough to warrant an immediate release, then
   branch-x.y is branched to branch-x.y.(z+1), and a release is made.
  
   (As Hadoop itself moves from x.y to x.(y+1) and then x.(y+2), the
   threshold for what is considered to be a critical bug would
   naturally start to rise, as the effort of back-porting goes up.)
  
   Do I have it right?
 
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the reader
  of this message is not the intended recipient, you are hereby notified
 that
  any printing, copying, dissemination, distribution, disclosure or
  forwarding of this communication is strictly prohibited. If you have
  received this communication in error, please

Re: Will there be a 2.2 patch releases?

2014-01-03 Thread Raymie Stata
Yes, that thread is part of what's confusing me.  Arun's initial 11/8
message suggests that there would be room for blocker fixes leading to
a 2.2.1 patch release (...and then be very careful about including
only *blocker* fixes in branch-2.2).  And nothing else in that thread
suggests that there wouldn't be a patch release.  And yet, Sandy seems
to think that 2.2.1 isn't happening at all (YARN-1295), a view
that's consistent with the currently confused state of the repo
(branch-2.2.1 exists but not released, branch-2.2 version is
2.2.2-SNAPSHOT).

Seems to me that we should be planning for a 2.2.1 patch release at
some point...

  Raymie



On Fri, Jan 3, 2014 at 1:17 AM, Steve Loughran ste...@hortonworks.com wrote:
 the last discussion on this was in November - I presume that's still the plan

 http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201311.mbox/%3CA31E1430-33BE-437C-A61E-050F9A67C109%40hortonworks.com%3E


 On 3 January 2014 04:10, Raymie Stata rst...@altiscale.com wrote:

 Nudge, any thoughts?

 On Sun, Dec 29, 2013 at 1:25 AM, Raymie Stata rst...@altiscale.com
 wrote:
  In discussing YARN-1295 it's become clear that I'm confused about the
  outcome of the "Next releases" thread.  I had assumed there would be
  patch releases to 2.2, and indeed one would be coming out early Q1.
  Is this correct?
 
  If so, then things seem a little messed-up right now in 2.2-land.
  There already is a branch-2.2.1, but there hasn't been a release.  And
  branch-2.2 has Maven version 2.2.2-SNAPSHOT.  Due to the 2.3 rename
  a few weeks ago, it might be that the first patch release for 2.2
  needs to be 2.2.2.  But if so, notice these lists of fixes for 2.2.1:
 
https://issues.apache.org/jira/browse/YARN/fixforversion/12325667
https://issues.apache.org/jira/browse/HDFS/fixforversion/12325666
 
  Do these need to have their fix-versions updated?
 
Raymie
 
 
  P.S. While we're on the subject of point releases, let me check my
 assumptions.
 
  I assumed that, for release x.y.z, fixes deemed to be critical bug
  fixes would be put into branch-x.y as a matter of course.  The Maven
  release-number in branch-x.y would be x.y.(z+1)-SNAPSHOT, and JIRAs
  (to be) committed to branch-x.y would have x.y.(z+1) as one of their
  fix-versions.
 
  When enough fixes have accumulated to warrant a release, or when a fix
  comes up that is critical enough to warrant an immediate release, then
  branch-x.y is branched to branch-x.y.(z+1), and a release is made.
 
  (As Hadoop itself moves from x.y to x.(y+1) and then x.(y+2), the
  threshold for what is considered to be a critical bug would
  naturally start to rise, as the effort of back-porting goes up.)
 
  Do I have it right?




Will there be a 2.2 patch releases?

2013-12-29 Thread Raymie Stata
In discussing YARN-1295 it's become clear that I'm confused about the
outcome of the "Next releases" thread.  I had assumed there would be
patch releases to 2.2, and indeed one would be coming out early Q1.
Is this correct?

If so, then things seem a little messed-up right now in 2.2-land.
There already is a branch-2.2.1, but there hasn't been a release.  And
branch-2.2 has Maven version 2.2.2-SNAPSHOT.  Due to the 2.3 rename
a few weeks ago, it might be that the first patch release for 2.2
needs to be 2.2.2.  But if so, notice these lists of fixes for 2.2.1:

  https://issues.apache.org/jira/browse/YARN/fixforversion/12325667
  https://issues.apache.org/jira/browse/HDFS/fixforversion/12325666

Do these need to have their fix-versions updated?

  Raymie


P.S. While we're on the subject of point releases, let me check my assumptions.

I assumed that, for release x.y.z, fixes deemed to be critical bug
fixes would be put into branch-x.y as a matter of course.  The Maven
release-number in branch-x.y would be x.y.(z+1)-SNAPSHOT, and JIRAs
(to be) committed to branch-x.y would have x.y.(z+1) as one of their
fix-versions.

When enough fixes have accumulated to warrant a release, or when a fix
comes up that is critical enough to warrant an immediate release, then
branch-x.y is branched to branch-x.y.(z+1), and a release is made.

(As Hadoop itself moves from x.y to x.(y+1) and then x.(y+2), the
threshold for what is considered to be a critical bug would
naturally start to rise, as the effort of back-porting goes up.)

Do I have it right?
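The versioning convention described above (branch-x.y carries x.y.(z+1)-SNAPSHOT, which ships as x.y.(z+1) and then moves the branch to x.y.(z+2)-SNAPSHOT) can be sketched as a small helper. This is only an illustration of the convention; the function name is invented here and is not part of any Hadoop or Maven tooling:

```python
import re

def split_snapshot(version):
    """Given a branch's Maven version like '2.2.2-SNAPSHOT', return the
    release version it would ship as and the branch's next dev version."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-SNAPSHOT", version)
    if not m:
        raise ValueError("expected an x.y.z-SNAPSHOT version: %r" % version)
    x, y, z = map(int, m.groups())
    release = "%d.%d.%d" % (x, y, z)
    next_dev = "%d.%d.%d-SNAPSHOT" % (x, y, z + 1)
    return release, next_dev

# branch-2.2 currently carries 2.2.2-SNAPSHOT: under this convention it
# would release as 2.2.2, after which the branch moves to 2.2.3-SNAPSHOT.
print(split_snapshot("2.2.2-SNAPSHOT"))  # → ('2.2.2', '2.2.3-SNAPSHOT')
```

This mirrors what the maven-release-plugin does at release time (drop -SNAPSHOT for the tag, bump the last digit for continued development on the branch).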