Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Raymie Stata
To summarize the thread so far:

a) Java7 is already a supported compile- and runtime environment for
Hadoop branch2 and trunk
b) Java6 must remain a supported compile- and runtime environment for
Hadoop branch2
c) (b) implies that branch2 must stick to Java6 APIs

I wonder if point (b) should be revised.  We could immediately
deprecate Java6 as a runtime (and thus compile-time) environment for
Hadoop.  We could end support for it in some published time frame
(perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
some date would not be guaranteed to run on Java6.  This would set us
up for using Java7 APIs in branch2.

An alternative might be to keep branch2 on Java6 APIs forever, and to
start using Java7 APIs in trunk relatively soon.  The concern here
would be that trunk isn't getting the kind of production torture
testing that branch2 is subjected to, and won't be for a while.  If
trunk and branch2 diverge too much too quickly, trunk could become a
nest of bugs, endangering the timeline and quality of Hadoop 3.  This
would argue for keeping trunk and branch2 in closer sync (maybe until
a branch3 is created and starts getting used by bleeding-edge users).
However, as just suggested, keeping them in closer sync need _not_
imply that Java7 features be avoided indefinitely: again, with
sufficient warning, Java6 support could be sunset within branch2.

On a related note, Steve points out that we need to start thinking
about Java8.  YES!!  Lambdas are a Really Big Deal!  If we sunset
Java6 in a few quarters, maybe we can add Java8 compile and runtime
(but not API) support about the same time.  This does NOT imply
bringing Java8 APIs into branch2: Even if we do allow Java7 APIs into
branch2 in the future, I doubt that bringing Java8 APIs into it will
ever make sense.  However, if Java8 is a supported runtime environment
for Hadoop 2, that sets us up for using Java8 APIs for the eventual
branch3 sometime in 2015.



Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Steve Loughran
On 5 April 2014 11:53, Colin McCabe  wrote:

> I've been using JDK7 for Hadoop development for a while now, and I
> know a lot of other folks have as well.  Correct me if I'm wrong, but
> what we're talking about here is not "moving towards JDK7" but
> "breaking compatibility with JDK6."
>

+1

>
> There are a lot of good reasons to ditch JDK6.  It would let us use
> new APIs in JDK7, especially the new file APIs.  It would let us
> update a few dependencies to newer versions.
>
>
+1



> I don't like the idea of breaking compatibility with JDK6 in trunk,
> but not in branch-2.  The traditional reason for putting something in
> trunk but not in branch-2 is that it is new code that needs some time
> to prove itself.


+1. branch-2 must continue to run on JDK6


> This doesn't really apply to incrementing min.jdk--
> we could do that easily whenever we like.  Meanwhile, if trunk starts
> accumulating jdk7-only code and dependencies, backports from trunk to
> branch-2 will become harder and harder over time.
>


I agree, but note that trunk diverges from branch-2 over time anyway -it's
happening.


>
> Since we make stable releases off of branch-2, and not trunk, I don't
> see any upside to this.  To be honest, I see only negatives here.
> More time backporting, more issues that show up only in production
> (branch-2) and not on dev machines (trunk).
>

> Maybe it's time to start thinking about what version of branch-2 will
> drop jdk6 support.  But until there is such a version, I don't think
> trunk should do it.
>



   1. Let's assume that branch-2 will never drop JDK6: clusters are
   committed to it, and saying "JDK update needed" will simply stop updates.
   2. By the time Hadoop 3.0 ships (2015?), JDK6 will be EOL, Java 8 will be
   in common use, and even JDK7 will be seen as trailing edge.
   3. JDK7 improves JVM performance (NUMA, nativeIO, etc.), which you get
   for free. As we're confident it's stable, there's no reason not to move
   to it in production.
   4. As we update the dependencies in Hadoop 3, we'll end up upgrading to
   libraries that are JDK7+ only (jetty!), so JDK6 is implicitly abandoned.
   5. There are new packages and APIs in Java7 which we can adopt to make
   our lives better and development more productive, as well as improving
   the user experience.

As a case in point, java.io.File.mkdirs() says "true if and only if the
directory was created; false otherwise", and returns false in either of
these two cases:
 -the path resolves to a directory that exists
 -the path resolves to a file
Think about that: anyone using local filesystems could write code that
assumes a false return from mkdirs() is harmless, because if you apply it
more than once on a directory it is. But call it on a file and you don't
get told it's only a file until you try to do something under it, and then
things stop behaving.
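
To make the ambiguity concrete, here is a minimal sketch using only Java 6
APIs. The class and method names are made up for illustration and are not
Hadoop code; the point is only that the two cases are indistinguishable
from the return value alone.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class, not part of Hadoop.
public class MkdirsAmbiguity {

    // Creates an empty scratch directory using only Java 6 APIs.
    static File newScratchDir() {
        try {
            File scratch = File.createTempFile("mkdirs-demo", ".dir");
            if (!scratch.delete() || !scratch.mkdir()) {
                throw new IllegalStateException("could not create " + scratch);
            }
            return scratch;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // mkdirs() on a directory that already exists: nothing is created,
    // so the call reports false, and that is harmless.
    static boolean mkdirsOnExistingDirectory() {
        File dir = new File(newScratchDir(), "dir");
        dir.mkdirs();        // first call creates the directory
        return dir.mkdirs(); // second call creates nothing
    }

    // mkdirs() on a path that is a regular file: also reports false,
    // with no exception to say that the path is unusable as a directory.
    static boolean mkdirsOnRegularFile() {
        File file = new File(newScratchDir(), "plain-file");
        try {
            file.createNewFile();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return file.mkdirs();
    }

    public static void main(String[] args) {
        System.out.println("existing directory -> " + mkdirsOnExistingDirectory());
        System.out.println("regular file       -> " + mkdirsOnRegularFile());
    }
}
```

Both lines print false, so a caller that ignores a false return cannot tell
the benign case from the broken one without a separate isDirectory() check.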

In comparison, java.nio.file.Files.createDirectories() differentiates this
case by declaring "FileAlreadyExistsException - if dir exists but is not a
directory". That is the kind of thing that would make RawLocalFS behave a
lot more like HDFS. Similarly, if we could switch to Files.move(), then the
destination file would stop being overwritten if it existed, so RawLocalFS's
rename() semantics would come closer to HDFS.
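
A rough sketch of the Java 7 behaviour, assuming the default local file
system provider. The class and method names are invented for illustration,
and this is not how RawLocalFS is implemented; it only shows the calls in
the released API (Files.createDirectories() and Files.move()) reporting the
conflict instead of silently returning false or overwriting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Hadoop.
public class NioFileSemantics {

    // createDirectories() on a path that is a regular file raises
    // FileAlreadyExistsException rather than returning a bare false.
    static String createDirectoriesOnRegularFile() {
        try {
            Path scratch = Files.createTempDirectory("nio-demo");
            Path file = Files.createFile(scratch.resolve("plain-file"));
            Files.createDirectories(file);
            return "no error";
        } catch (FileAlreadyExistsException e) {
            return "FileAlreadyExistsException";
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // move() without the REPLACE_EXISTING option refuses to overwrite an
    // existing destination, leaving its contents untouched.
    static String moveOntoExistingFile() {
        try {
            Path scratch = Files.createTempDirectory("nio-demo");
            Path src = Files.write(scratch.resolve("src"),
                    "new".getBytes(StandardCharsets.UTF_8));
            Path dst = Files.write(scratch.resolve("dst"),
                    "old".getBytes(StandardCharsets.UTF_8));
            try {
                Files.move(src, dst);
                return "moved";
            } catch (FileAlreadyExistsException e) {
                byte[] kept = Files.readAllBytes(dst);
                return "refused; dst=" + new String(kept, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(createDirectoriesOnRegularFile());
        System.out.println(moveOntoExistingFile());
    }
}
```

Passing StandardCopyOption.REPLACE_EXISTING to Files.move() would restore
the overwrite behaviour, so the choice becomes explicit in the code.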

These are things we just can't do while retaining Java 6 compatibility,
and why I am looking forward to the time when we can stop caring about
Java 6.

Now, assuming that Hadoop 3.x will be Java7+ only, we have the option
between now and its future ship date to move to those Java7 APIs. So when
to make the move?

   1. It can be done late -in which case few changes will happen, nobody
   sees much benefit.
   2. We can do it now, and have 12+ months to adopt the new features, make
   the move -and be set up for Java 8 migration in later versions.

Yes, code that uses the new APIs won't work on Java6, but that doesn't mean
it shouldn't happen. Hadoop made the jump from Java 5 to Java 6, after all.

-Steve

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Colin McCabe
I've been using JDK7 for Hadoop development for a while now, and I
know a lot of other folks have as well.  Correct me if I'm wrong, but
what we're talking about here is not "moving towards JDK7" but
"breaking compatibility with JDK6."

There are a lot of good reasons to ditch JDK6.  It would let us use
new APIs in JDK7, especially the new file APIs.  It would let us
update a few dependencies to newer versions.

I don't like the idea of breaking compatibility with JDK6 in trunk,
but not in branch-2.  The traditional reason for putting something in
trunk but not in branch-2 is that it is new code that needs some time
to prove itself.  This doesn't really apply to incrementing min.jdk--
we could do that easily whenever we like.  Meanwhile, if trunk starts
accumulating jdk7-only code and dependencies, backports from trunk to
branch-2 will become harder and harder over time.

Since we make stable releases off of branch-2, and not trunk, I don't
see any upside to this.  To be honest, I see only negatives here.
More time backporting, more issues that show up only in production
(branch-2) and not on dev machines (trunk).

Maybe it's time to start thinking about what version of branch-2 will
drop jdk6 support.  But until there is such a version, I don't think
trunk should do it.

best,
Colin


On Fri, Apr 4, 2014 at 3:15 PM, Haohui Mai  wrote:
> I'm referring to the latter case. Indeed, migrating branch-2 to JDK7 is
> more difficult.
>
> I think one reasonable approach is to put the hdfs / yarn clients into
> separate jars. The client-side jars can only use JDK6 APIs, so that
> downstream projects running on top of JDK6 continue to work.
>
> The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
> use JDK7 APIs inside them. Given that there is far more code on the
> server side than on the client side, having the ability to use JDK7 on
> the server side only might still be a win.
>
> The downside I can think of is that it might complicate the effort of
> publishing maven jars, but this should be a one-time issue.
>
> ~Haohui
>
>
> On Fri, Apr 4, 2014 at 2:37 PM, Alejandro Abdelnur wrote:
>
>> Haohui,
>>
>> Is the idea to compile/test with JDK7 and recommend it for runtime and stop
>> there? Or to start using JDK7 API stuff as well? If the latter is the case,
>> then backporting stuff to branch-2 may break and patches may have to be
>> refactored for JDK6. Given that branch-2 got GA status not so long ago, I
>> assume it will be active for a while.
>>
>> What are your thoughts in this regard?
>>
>> Thanks
>>
>>
>> On Fri, Apr 4, 2014 at 2:29 PM, Haohui Mai  wrote:
>>
>> > Hi,
>> >
>> > There have been multiple discussions on deprecating support for JDK6 and
>> > moving towards JDK7. It looks to me that the consensus is that Hadoop is
>> > now ready to drop support for JDK6 and to move towards JDK7. Based on
>> > that consensus, I wonder whether it is a good time to start the migration.
>> >
>> > Here is my understanding of the current status:
>> >
>> > 1. There have been no public updates of JDK6 since Feb 2013. Users no
>> > longer get fixes for security vulnerabilities through official public
>> > updates.
>> > 2. Hadoop core is stuck with out-of-date dependencies unless it moves to
>> > JDK7. (see
>> > http://hadoop.6.n7.nabble.com/very-old-dependencies-td71486.html)
>> > The implementation can also benefit from the new functionality in JDK7.
>> > 3. The code is ready for JDK7. Cloudera and Hortonworks have success
>> > stories of supporting Hadoop on JDK7.
>> >
>> >
>> > It seems that the real work of moving to JDK7 is minimal. We only need to
>> > (1) make sure the Jenkins builds are running on top of JDK7, and (2) update
>> > the minimum required Java version from 6 to 7. Therefore I propose that we
>> > move towards JDK7 in trunk in the short term.
>> >
>> > Your feedback is appreciated.
>> >
>> > Regards,
>> > Haohui
>> >
>> >
>>
>>
>>
>> --
>> Alejandro
>>
>

Build failed in Jenkins: Hadoop-Common-trunk #1090

2014-04-05 Thread Apache Jenkins Server
See 

Changes:

[acmurthy] YARN-1898. Addendum patch to ensure /jmx and /metrics are 
re-directed to Active RM.

[kihwal] HDFS-6159. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails 
if there is block missing after balancer success. Contributed by Chen He.

[jianhe] YARN-1837. Fixed TestMoveApplication#testMoveRejectedByScheduler 
failure. Contributed by Hong Zhiguo

[szetszwo] Commit the hadoop-common part of HDFS-6189.

[szetszwo] HDFS-6189. Multiple HDFS tests fail on Windows attempting to use a 
test root path containing a colon.  Contributed by cnauroth

[cnauroth] HADOOP-10456. Bug in Configuration.java exposed by Spark 
(ConcurrentModificationException). Contributed by Nishkam Ravi.

[umamahesh] HADOOP-10462. DF#getFilesystem is not parsing the command output. 
Contributed by Akira AJISAKA.

--
[...truncated 61840 lines...]
Adding reference: maven.local.repository
[DEBUG] Initialize Maven Ant Tasks
parsing buildfile 
jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml
 with URI = 
jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml
 from a zip file
parsing buildfile 
jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml
 with URI = 
jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml
 from a zip file
Class org.apache.maven.ant.tasks.AttachArtifactTask loaded from parent loader 
(parentFirst)
 +Datatype attachartifact org.apache.maven.ant.tasks.AttachArtifactTask
Class org.apache.maven.ant.tasks.DependencyFilesetsTask loaded from parent 
loader (parentFirst)
 +Datatype dependencyfilesets org.apache.maven.ant.tasks.DependencyFilesetsTask
Setting project property: test.build.dir -> 

Setting project property: test.exclude.pattern -> _
Setting project property: hadoop.assemblies.version -> 3.0.0-SNAPSHOT
Setting project property: test.exclude -> _
Setting project property: distMgmtSnapshotsId -> apache.snapshots.https
Setting project property: project.build.sourceEncoding -> UTF-8
Setting project property: java.security.egd -> file:///dev/urandom
Setting project property: distMgmtSnapshotsUrl -> 
https://repository.apache.org/content/repositories/snapshots
Setting project property: distMgmtStagingUrl -> 
https://repository.apache.org/service/local/staging/deploy/maven2
Setting project property: avro.version -> 1.7.4
Setting project property: test.build.data -> 

Setting project property: commons-daemon.version -> 1.0.13
Setting project property: hadoop.common.build.dir -> 

Setting project property: testsThreadCount -> 4
Setting project property: maven.test.redirectTestOutputToFile -> true
Setting project property: jdiff.version -> 1.0.9
Setting project property: build.platform -> Linux-i386-32
Setting project property: project.reporting.outputEncoding -> UTF-8
Setting project property: distMgmtStagingName -> Apache Release Distribution 
Repository
Setting project property: protobuf.version -> 2.5.0
Setting project property: failIfNoTests -> false
Setting project property: protoc.path -> ${env.HADOOP_PROTOC_PATH}
Setting project property: jersey.version -> 1.9
Setting project property: distMgmtStagingId -> apache.staging.https
Setting project property: distMgmtSnapshotsName -> Apache Development Snapshot 
Repository
Setting project property: ant.file -> 

[DEBUG] Setting properties with prefix: 
Setting project property: project.groupId -> org.apache.hadoop
Setting project property: project.artifactId -> hadoop-common-project
Setting project property: project.name -> Apache Hadoop Common Project
Setting project property: project.description -> Apache Hadoop Common Project
Setting project property: project.version -> 3.0.0-SNAPSHOT
Setting project property: project.packaging -> pom
Setting project property: project.build.directory -> 

Setting project property: project.build.outputDirectory -> 

Setting project property: project.build.testOutputDirectory -> 

Setting project property: proje