Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Steve Loughran
On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:



 For the sake of this discussion we should separate the runtime from
 the programming APIs. Users are already migrating to the java7 runtime
 for most of the reasons listed below (support, performance, bugs,
 etc), and the various distributions cert their Hadoop 2 based
 distributions on java7.  This gives users many of the benefits of
 java7, without forcing users off java6. Ie Hadoop does not need to
 switch to the java7 programming APIs to make sure everyone has a
 supported runtime.


+1: you can use Java 7 today; I'm not sure how tested Java 8 is


 The question here is really about when Hadoop, and the Hadoop
 ecosystem (since adjacent projects often end up in the same classpath)
 start using the java7 programming APIs and therefore break
 compatibility with java6 runtimes. I think our java6 runtime users
 would consider dropping support for their java runtime in an update of
 a major release to be an incompatible change (the binaries stop
 working on their current jvm).


do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?


 That may be worth it if we can
 articulate sufficient value to offset the cost (they have to upgrade
 their environment, might make rolling upgrades stop working, etc), but
 I've not yet heard an argument that articulates the value relative to
 the cost.  Eg upgrading to the java7 APIs allows us to pull in
 dependencies with new major versions, but only if those dependencies
 don't break compatibility (which is likely given that our classpaths
 aren't so isolated), and, realistically, only if the entire Hadoop
 stack moves to java7 as well




 (eg we have to recompile HBase to
 generate v1.7 binaries even if they stick on API v1.6). I'm not aware
 of a feature, bug etc that really motivates this.

 I don't see that being needed unless we move up to new Java7+-only
libraries and HBase needs to track this.

 The big "recompile to work" issue is Google Guava, which is troublesome
enough that I'd be tempted to say: can we drop it entirely?



 An alternate approach is to keep the current stable release series
 (v2.x) as is, and start using new APIs in trunk (for v3). This will be
 a major upgrade for Hadoop and therefore an incompatible change like
 this is to be expected (it would be great if this came with additional
 changes to better isolate classpaths and dependencies from each
 other). It allows us to continue to support multiple types of users
 with different branches, vs forcing all users onto a new version. It
 of course means that 2.x users will not get the benefits of the new
 API, but it's unclear what those benefits are given they can already
 get the benefits of adopting the newer java runtimes today.



I'm (personally) +1 on this; I also think we should plan to do the switch
some time this year, not only to get the benefits but also to discover the costs



DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Steve Loughran
If we're thinking of future progress, here's a little low-level one: adopt
SLF4J as the API for logging


   1. it's the new de facto standard of logging APIs
   2. it's a lot better than commons-logging, with on-demand inline string
   expansion of varargs arguments.
   3. we already ship it, as Jetty uses it
   4. we already depend on it, client-side and server-side, in the
   hadoop-auth package
   5. it lets people log via logback if they want to. That's client-side,
   even if the server stays on log4j
   6. It's way faster than using String.format()


The best initial thing about SLF4J is how it only expands its arguments to
string values if needed:

  LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
keytab);

not logging at debug? No need to test first. That alone saves code and
improves readability.
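
To make the contrast concrete, here is a minimal sketch (class and method
names are made up for illustration) of the guarded commons-logging call
versus the parameterized SLF4J call:

  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class LoggingStyles {
    private static final Log CLOG = LogFactory.getLog(LoggingStyles.class);
    private static final Logger SLOG = LoggerFactory.getLogger(LoggingStyles.class);

    void logInit(String principal, String keytab) {
      // commons-logging: guard the call, or pay for the string concatenation
      // even when DEBUG is disabled
      if (CLOG.isDebugEnabled()) {
        CLOG.debug("Initialized, principal [" + principal + "] from keytab [" + keytab + "]");
      }

      // SLF4J: the {} placeholders are only expanded when DEBUG is enabled,
      // so no guard is needed
      SLOG.debug("Initialized, principal [{}] from keytab [{}]", principal, keytab);
    }
  }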

The slf4j expansion code handles null values as well as calling toString()
on non-null arguments. Oh, and it does arrays too:

 int[] array = {1, 2, 3};
 String undef = null;

 LOG.info("a = {}, u = {}", array, undef);  // -> a = [1, 2, 3], u = null

Switching to SLF4J from commons-logging is as trivial as changing the type
of the logger created, but with one logger per class that does get
expensive in terms of change. Moving to SLF4J across the board would be a
big piece of work -but doable.

Rather than push for a dramatic change, why not adopt a policy of requiring
it in new Maven subprojects? hadoop-auth shows we permit it, so why not say
you MUST?

Once people have experience in using it, and are happy, then we could think
about switching to the new APIs in the core modules. The only trouble spot
there is where code calls getLogger() on the commons log to get at the
log4j appender - there are ~3 places in production code that do this, and 200+
in tests - tests that do it to turn back log levels. Those tests can stay
with commons-logging, same for the production uses. Mixing commons-logging
and slf4j isn't drastic - they both route to log4j or a.n.other back end.
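
For anyone who hasn't seen that getLogger() pattern, a rough sketch of what
such a call site looks like (class name hypothetical; it assumes the
commons-logging binding in use is Log4JLogger, which is exactly why those
spots would stay on commons-logging for now):

  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;
  import org.apache.commons.logging.impl.Log4JLogger;
  import org.apache.log4j.Level;

  class LogLevelTweak {
    private static final Log LOG = LogFactory.getLog(LogLevelTweak.class);

    // Reach through the commons-logging wrapper to the underlying log4j
    // logger so a test can change the log level at runtime.
    static void enableDebug() {
      ((Log4JLogger) LOG).getLogger().setLevel(Level.ALL);
    }
  }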

-Steve



Build failed in Jenkins: Hadoop-Common-trunk #1095

2014-04-10 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-trunk/1095/changes

Changes:

[tucu] HADOOP-10428. JavaKeyStoreProvider should accept keystore password via 
configuration falling back to ENV VAR. (tucu)

[vinodkv] YARN-1910. Fixed a race condition in TestAMRMTokens that causes the 
test to fail more often on Windows. Contributed by Xuan Gong.

[wheat9] HDFS-6225. Remove the o.a.h.hdfs.server.common.UpgradeStatusReport. 
Contributed by Haohui Mai.

[cnauroth] HDFS-6208. DataNode caching can leak file descriptors. Contributed 
by Chris Nauroth.

[wheat9] HDFS-6170. Support GETFILESTATUS operation in WebImageViewer. 
Contributed by Akira Ajisaka.

[tucu] HADOOP-10429. KeyStores should have methods to generate the materials 
themselves, KeyShell should use them. (tucu)

[tucu] HADOOP-10427. KeyProvider implementations should be thread safe. (tucu)

[tucu] HADOOP-10432. Refactor SSLFactory to expose static method to determine 
HostnameVerifier. (tucu)

[szetszwo] HDFS-6209. TestValidateConfigurationSettings should use random 
ports.  Contributed by Arpit Agarwal

[wheat9] HADOOP-10485. Remove dead classes in hadoop-streaming. Contributed by 
Haohui Mai.

[szetszwo] HDFS-6204. Fix TestRBWBlockInvalidation: change the last sleep to a 
loop.

[szetszwo] HDFS-6206. Fix NullPointerException in 
DFSUtil.substituteForWildcardAddress.

[szetszwo] HADOOP-10473. TestCallQueueManager should interrupt before counting 
calls.

[jeagles] HDFS-6215. Wrong error message for upgrade. (Kihwal Lee via jeagles)

[arp] HDFS-6160. TestSafeMode occasionally fails. (Contributed by Arpit Agarwal)

[kihwal] YARN-1907. TestRMApplicationHistoryWriter#testRMWritingMassiveHistory 
intermittently fails. Contributed by Mit Desai.

[stevel] HADOOP-10104. Update jackson to 1.9.13 (Akira Ajisaka via stevel)

--
[...truncated 60536 lines...]
Adding reference: maven.local.repository
[DEBUG] Initialize Maven Ant Tasks
parsing buildfile 
jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml
 with URI = 
jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml
 from a zip file
parsing buildfile 
jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml
 with URI = 
jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml
 from a zip file
Class org.apache.maven.ant.tasks.AttachArtifactTask loaded from parent loader 
(parentFirst)
 +Datatype attachartifact org.apache.maven.ant.tasks.AttachArtifactTask
Class org.apache.maven.ant.tasks.DependencyFilesetsTask loaded from parent 
loader (parentFirst)
 +Datatype dependencyfilesets org.apache.maven.ant.tasks.DependencyFilesetsTask
Setting project property: test.build.dir - 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/target/test-dir
Setting project property: test.exclude.pattern - _
Setting project property: hadoop.assemblies.version - 3.0.0-SNAPSHOT
Setting project property: test.exclude - _
Setting project property: distMgmtSnapshotsId - apache.snapshots.https
Setting project property: project.build.sourceEncoding - UTF-8
Setting project property: java.security.egd - file:///dev/urandom
Setting project property: distMgmtSnapshotsUrl - 
https://repository.apache.org/content/repositories/snapshots
Setting project property: distMgmtStagingUrl - 
https://repository.apache.org/service/local/staging/deploy/maven2
Setting project property: avro.version - 1.7.4
Setting project property: test.build.data - 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/target/test-dir
Setting project property: commons-daemon.version - 1.0.13
Setting project property: hadoop.common.build.dir - 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/../../hadoop-common-project/hadoop-common/target
Setting project property: testsThreadCount - 4
Setting project property: maven.test.redirectTestOutputToFile - true
Setting project property: jdiff.version - 1.0.9
Setting project property: build.platform - Linux-i386-32
Setting project property: project.reporting.outputEncoding - UTF-8
Setting project property: distMgmtStagingName - Apache Release Distribution 
Repository
Setting project property: protobuf.version - 2.5.0
Setting project property: failIfNoTests - false
Setting project property: protoc.path - ${env.HADOOP_PROTOC_PATH}
Setting project property: jersey.version - 1.9
Setting project property: distMgmtStagingId - apache.staging.https
Setting project property: distMgmtSnapshotsName - Apache Development Snapshot 
Repository
Setting project property: ant.file - 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml
[DEBUG] Setting properties with prefix: 
Setting 

Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Jay Vyas
SLF4J is definitely a great step forward.  Log4j is restrictive for complex and 
multi-tenant apps like Hadoop.

Also the fact that slf4j doesn't use any magic when binding to its log provider 
makes it way easier to swap out its implementation than tools of the past.

 On Apr 10, 2014, at 2:16 AM, Steve Loughran ste...@hortonworks.com wrote:
 
 If we're thinking of future progress, here's a little low-level one: adopt
 SLF4J as the API for logging
 
 
   1. it's the new de facto standard of logging APIs
   2. it's a lot better than commons-logging, with on-demand inline string
   expansion of varargs arguments.
   3. we already ship it, as jetty uses it
   4. we already depend on it, client-side and server-side in the
   hadoop-auth package
   5. it lets people log via logback if they want to. That's client-side,
   even if the server stays on log4j
   6. It's way faster than using String.format()
 
 
  The best initial thing about SLF4J is how it only expands its arguments
 string values if needed
 
   LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
  keytab);
 
 not logging at debug? No need to test first. That alone saves code and
 improves readability.
 
  The slf4j expansion code handles null values as well as calling toString()
 on non-null arguments. Oh and it does arrays too.
 
  int[] array = {1, 2, 3};
  String undef = null;

  LOG.info("a = {}, u = {}", array, undef);  // -> a = [1, 2, 3], u = null
 
 Switching to SLF4J from commons-logging is as trivial as changing the type
 of the logger created, but with one logger per class that does get
 expensive in terms of change. Moving to SLF4J across the board would be a
 big piece of work -but doable.
 
 Rather than push for a dramatic change why not adopt a policy of demanding
 it in new maven subprojects? hadoop-auth shows we permit it, so why not say
 you MUST?
 
 Once people have experience in using it, and are happy, then we could think
 about switching to the new APIs in the core modules. The only troublespot
 there is where code calls getLogger() on the commons log to get at the
 log4j appender -there's ~3 places in production code that does this, 200+
 in tests -tests that do it to turn back log levels. Those tests can stay
 with commons-logging, same for the production uses. Mixing commons-logging
 and slf4j isn't drastic -they both route to log4j or a.n.other back end.
 
  -Steve
 


Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Raymie Stata
I think the problem to be solved here is to define a point in time
when the average Hadoop contributor can start using Java7 dependencies
in their code.

The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
not solve this problem.  The average Hadoop contributor wants to see
their contributions make it into a stable release in a predictable
amount of time.  Putting code with a Java7 dependency into trunk means
the exact opposite: there is no timeline to a stable release.  So most
contributors will stay away from Java7 dependencies, despite the
nominal policy that they're allowed in trunk.  (And the few that do
use Java7 dependencies are people who do not value releasing code into
stable releases, which arguably could lead to a situation that the
Java7-dependent code in trunk is, on average, on the buggy side.)

I'm not saying the "branch2-in-the-future" plan is the only way to
solve the problem of putting Java7 dependencies on a known time-table,
but at least it solves it.  Is there another solution?

On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:
 On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:



 For the sake of this discussion we should separate the runtime from
 the programming APIs. Users are already migrating to the java7 runtime
 for most of the reasons listed below (support, performance, bugs,
 etc), and the various distributions cert their Hadoop 2 based
 distributions on java7.  This gives users many of the benefits of
 java7, without forcing users off java6. Ie Hadoop does not need to
 switch to the java7 programming APIs to make sure everyone has a
 supported runtime.


 +1: you can use Java 7 today; I'm not sure how tested Java 8 is


 The question here is really about when Hadoop, and the Hadoop
 ecosystem (since adjacent projects often end up in the same classpath)
 start using the java7 programming APIs and therefore break
 compatibility with java6 runtimes. I think our java6 runtime users
 would consider dropping support for their java runtime in an update of
 a major release to be an incompatible change (the binaries stop
 working on their current jvm).


 do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?


 That may be worth it if we can
 articulate sufficient value to offset the cost (they have to upgrade
 their environment, might make rolling upgrades stop working, etc), but
 I've not yet heard an argument that articulates the value relative to
 the cost.  Eg upgrading to the java7 APIs allows us to pull in
 dependencies with new major versions, but only if those dependencies
 don't break compatibility (which is likely given that our classpaths
 aren't so isolated), and, realistically, only if the entire Hadoop
 stack moves to java7 as well




 (eg we have to recompile HBase to
 generate v1.7 binaries even if they stick on API v1.6). I'm not aware
 of a feature, bug etc that really motivates this.

 I don't see that being needed unless we move up to new java7+ only
 libraries and HBase needs to track this.

  The big "recompile to work" issue is google guava, which is troublesome
 enough I'd be tempted to say can we drop it entirely



 An alternate approach is to keep the current stable release series
 (v2.x) as is, and start using new APIs in trunk (for v3). This will be
 a major upgrade for Hadoop and therefore an incompatible change like
 this is to be expected (it would be great if this came with additional
 changes to better isolate classpaths and dependencies from each
 other). It allows us to continue to support multiple types of users
 with different branches, vs forcing all users onto a new version. It
 of course means that 2.x users will not get the benefits of the new
  API, but it's unclear what those benefits are given they can already
 get the benefits of adopting the newer java runtimes today.



 I'm (personally) +1 to this, I also think we should plan to do the switch
 some time this year to not only get the benefits, but discover the costs



[jira] [Resolved] (HADOOP-10382) Add Apache Tez to the Hadoop homepage as a related project

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved HADOOP-10382.


Resolution: Fixed

I just committed this.

 Add Apache Tez to the Hadoop homepage as a related project
 --

 Key: HADOOP-10382
 URL: https://issues.apache.org/jira/browse/HADOOP-10382
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: HADOOP-10382.patch, HADOOP-10382.patch


 Add Apache Tez to the Hadoop homepage as a related project



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Andrew Wang
+1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.


On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas jayunit...@gmail.com wrote:

 Slf4j is definitely a great step forward.  Log4j is restrictive for complex
 and multi tenant apps like hadoop.

 Also the fact that slf4j doesn't use any magic when binding to its log
 provider makes it way easier to swap out its implementation than tools of
 the past.

  On Apr 10, 2014, at 2:16 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
  If we're thinking of future progress, here's a little low-level one:
 adopt
  SLF4J as the API for logging
 
 
    1. it's the new de facto standard of logging APIs
    2. it's a lot better than commons-logging, with on-demand inline string
    expansion of varargs arguments.
3. we already ship it, as jetty uses it
4. we already depend on it, client-side and server-side in the
hadoop-auth package
5. it lets people log via logback if they want to. That's client-side,
even if the server stays on log4j
6. It's way faster than using String.format()
 
 
  The best initial thing about SLF4J is how it only expands its arguments
  string values if needed
 
   LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
  keytab);
 
  not logging at debug? No need to test first. That alone saves code and
  improves readability.
 
  The slf4j expansion code handles null values as well as calling toString()
  on non-null arguments. Oh and it does arrays too.
 
  int[] array = {1, 2, 3};
  String undef = null;

  LOG.info("a = {}, u = {}", array, undef);  // -> a = [1, 2, 3], u = null
 
  Switching to SLF4J from commons-logging is as trivial as changing the
 type
  of the logger created, but with one logger per class that does get
  expensive in terms of change. Moving to SLF4J across the board would be a
  big piece of work -but doable.
 
  Rather than push for a dramatic change why not adopt a policy of
 demanding
  it in new maven subprojects? hadoop-auth shows we permit it, so why not
 say
  you MUST?
 
  Once people have experience in using it, and are happy, then we could
 think
  about switching to the new APIs in the core modules. The only troublespot
  there is where code calls getLogger() on the commons log to get at the
  log4j appender -there's ~3 places in production code that does this, 200+
  in tests -tests that do it to turn back log levels. Those tests can stay
  with commons-logging, same for the production uses. Mixing
 commons-logging
  and slf4j isn't drastic -they both route to log4j or a.n.other back end.
 
  -Steve
 



Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Alejandro Abdelnur
+1 on slf4j.

One thing, Jay: the issues with log4j will still be there, as log4j will still be 
under the hood.

thx

Alejandro
(phone typing)

 On Apr 10, 2014, at 7:35, Andrew Wang andrew.w...@cloudera.com wrote:
 
 +1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.
 
 
 On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas jayunit...@gmail.com wrote:
 
 Slf4j is definitely a great step forward.  Log4j is restrictive for complex
 and multi tenant apps like hadoop.
 
 Also the fact that slf4j doesn't use any magic when binding to its log
 provider makes it way easier to swap out its implementation than tools of
 the past.
 
 On Apr 10, 2014, at 2:16 AM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 If we're thinking of future progress, here's a little low-level one:
 adopt
 SLF4J as the API for logging
 
 
  1. it's the new de facto standard of logging APIs
  2. it's a lot better than commons-logging, with on-demand inline string
  expansion of varargs arguments.
  3. we already ship it, as jetty uses it
  4. we already depend on it, client-side and server-side in the
  hadoop-auth package
  5. it lets people log via logback if they want to. That's client-side,
  even if the server stays on log4j
  6. It's way faster than using String.format()
 
 
 The best initial thing about SLF4J is how it only expands its arguments
 string values if needed
 
 LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
 keytab);
 
 not logging at debug? No need to test first. That alone saves code and
 improves readability.
 
 The slf4j expansion code handles null values as well as calling toString()
 on non-null arguments. Oh and it does arrays too.
 
 int[] array = {1, 2, 3};
 String undef = null;

 LOG.info("a = {}, u = {}", array, undef);  // -> a = [1, 2, 3], u = null
 
 Switching to SLF4J from commons-logging is as trivial as changing the
 type
 of the logger created, but with one logger per class that does get
 expensive in terms of change. Moving to SLF4J across the board would be a
 big piece of work -but doable.
 
 Rather than push for a dramatic change why not adopt a policy of
 demanding
 it in new maven subprojects? hadoop-auth shows we permit it, so why not
 say
 you MUST?
 
 Once people have experience in using it, and are happy, then we could
 think
 about switching to the new APIs in the core modules. The only troublespot
 there is where code calls getLogger() on the commons log to get at the
 log4j appender -there's ~3 places in production code that does this, 200+
 in tests -tests that do it to turn back log levels. Those tests can stay
 with commons-logging, same for the production uses. Mixing
 commons-logging
 and slf4j isn't drastic -they both route to log4j or a.n.other back end.
 
 -Steve
 
 


Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Karthik Kambatla
+1 to using slf4j. I would actually vote for (1) new modules must use it, (2)
new classes in existing modules are strongly recommended to use it, (3)
existing classes can switch to it. That would take us closer to using slf4j
everywhere, faster.


On Thu, Apr 10, 2014 at 8:17 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 +1 on slf4j.

 one thing Jay, the issues with log4j will still be there as log4j will
 still be under the hood.

 thx

 Alejandro
 (phone typing)

  On Apr 10, 2014, at 7:35, Andrew Wang andrew.w...@cloudera.com wrote:
 
  +1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.
 
 
  On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas jayunit...@gmail.com wrote:
 
  Slf4j is definitely a great step forward.  Log4j is restrictive for
 complex
  and multi tenant apps like hadoop.
 
  Also the fact that slf4j doesn't use any magic when binding to its log
  provider makes it way easier to swap out its implementation than tools
 of
  the past.
 
  On Apr 10, 2014, at 2:16 AM, Steve Loughran ste...@hortonworks.com
  wrote:
 
  If we're thinking of future progress, here's a little low-level one:
  adopt
  SLF4J as the API for logging
 
 
   1. it's the new de facto standard of logging APIs
   2. it's a lot better than commons-logging, with on-demand inline string
   expansion of varargs arguments.
   3. we already ship it, as jetty uses it
   4. we already depend on it, client-side and server-side in the
   hadoop-auth package
   5. it lets people log via logback if they want to. That's client-side,
   even if the server stays on log4j
   6. It's way faster than using String.format()
 
 
  The best initial thing about SLF4J is how it only expands its arguments
  string values if needed
 
  LOG.debug("Initialized, principal [{}] from keytab [{}]",
 principal,
  keytab);
 
  not logging at debug? No need to test first. That alone saves code and
  improves readability.
 
  The slf4j expansion code handles null values as well as calling
 toString()
  on non-null arguments. Oh and it does arrays too.
 
  int[] array = {1, 2, 3};
  String undef = null;

  LOG.info("a = {}, u = {}", array, undef);  // -> a = [1, 2, 3], u = null
 
  Switching to SLF4J from commons-logging is as trivial as changing the
  type
  of the logger created, but with one logger per class that does get
  expensive in terms of change. Moving to SLF4J across the board would
 be a
  big piece of work -but doable.
 
  Rather than push for a dramatic change why not adopt a policy of
  demanding
  it in new maven subprojects? hadoop-auth shows we permit it, so why not
  say
  you MUST?
 
  Once people have experience in using it, and are happy, then we could
  think
  about switching to the new APIs in the core modules. The only
 troublespot
  there is where code calls getLogger() on the commons log to get at the
  log4j appender -there's ~3 places in production code that does this,
 200+
  in tests -tests that do it to turn back log levels. Those tests can
 stay
  with commons-logging, same for the production uses. Mixing
  commons-logging
  and slf4j isn't drastic -they both route to log4j or a.n.other back
 end.
 
  -Steve
 
 



Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:

 
 
  For the sake of this discussion we should separate the runtime from
  the programming APIs. Users are already migrating to the java7 runtime
  for most of the reasons listed below (support, performance, bugs,
  etc), and the various distributions cert their Hadoop 2 based
  distributions on java7.  This gives users many of the benefits of
  java7, without forcing users off java6. Ie Hadoop does not need to
  switch to the java7 programming APIs to make sure everyone has a
  supported runtime.
 
 
 +1: you can use Java 7 today; I'm not sure how tested Java 8 is


  The question here is really about when Hadoop, and the Hadoop
  ecosystem (since adjacent projects often end up in the same classpath)
  start using the java7 programming APIs and therefore break
  compatibility with java6 runtimes. I think our java6 runtime users
  would consider dropping support for their java runtime in an update of
  a major release to be an incompatible change (the binaries stop
  working on their current jvm).


 do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?


I mean 2.x -> 2.(x+1), i.e. I'm running the 2.4 stable and upgrading to 2.5.




  That may be worth it if we can
  articulate sufficient value to offset the cost (they have to upgrade
  their environment, might make rolling upgrades stop working, etc), but
  I've not yet heard an argument that articulates the value relative to
  the cost.  Eg upgrading to the java7 APIs allows us to pull in
  dependencies with new major versions, but only if those dependencies
  don't break compatibility (which is likely given that our classpaths
  aren't so isolated), and, realistically, only if the entire Hadoop
  stack moves to java7 as well




  (eg we have to recompile HBase to
  generate v1.7 binaries even if they stick on API v1.6). I'm not aware
  of a feature, bug etc that really motivates this.
 
  I don't see that being needed unless we move up to new java7+ only
 libraries and HBase needs to track this.

   The big "recompile to work" issue is google guava, which is troublesome
 enough I'd be tempted to say can we drop it entirely



  An alternate approach is to keep the current stable release series
  (v2.x) as is, and start using new APIs in trunk (for v3). This will be
  a major upgrade for Hadoop and therefore an incompatible change like
  this is to be expected (it would be great if this came with additional
  changes to better isolate classpaths and dependencies from each
  other). It allows us to continue to support multiple types of users
  with different branches, vs forcing all users onto a new version. It
  of course means that 2.x users will not get the benefits of the new
   API, but it's unclear what those benefits are given they can already
  get the benefits of adopting the newer java runtimes today.
 
 
 
 I'm (personally) +1 to this, I also think we should plan to do the switch
 some time this year to not only get the benefits, but discover the costs



Agree






Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 6:49 AM, Raymie Stata rst...@altiscale.com wrote:

 I think the problem to be solved here is to define a point in time
 when the average Hadoop contributor can start using Java7 dependencies
 in their code.

  The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
 not solve this problem.  The average Hadoop contributor wants to see
 their contributions make it into a stable release in a predictable
 amount of time.  Putting code with a Java7 dependency into trunk means
 the exact opposite: there is no timeline to a stable release.  So most
 contributors will stay away from Java7 dependencies, despite the
 nominal policy that they're allowed in trunk.  (And the few that do
 use Java7 dependencies are people who do not value releasing code into
 stable releases, which arguably could lead to a situation that the
 Java7-dependent code in trunk is, on average, on the buggy side.)

  I'm not saying the "branch2-in-the-future" plan is the only way to
 solve the problem of putting Java7 dependencies on a known time-table,
 but at least it solves it.  Is there another solution?


All good reasons for why we should start thinking about a plan for v3. The
points above pertain to any features for trunk that break compatibility,
not just ones that use new Java APIs.  We shouldn't permit incompatible
changes to merge to v2 just because we don't yet have a timeline for v3; we
should figure out the latter. This also motivates finishing the work to isolate
dependencies between Hadoop code, other framework code, and user code.

Let's speak less abstractly: are there particular features or new
dependencies that you would like to contribute (or see contributed) that
require using the Java 1.7 APIs?  Breaking compat in v2 and rolling a v3
release are both non-trivial, not something I suspect we'd want to do just
because it would be, for example, nicer to have a newer version of Jetty.

Thanks,
Eli







 On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com
 wrote:
  On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:
 
 
 
  For the sake of this discussion we should separate the runtime from
  the programming APIs. Users are already migrating to the java7 runtime
  for most of the reasons listed below (support, performance, bugs,
  etc), and the various distributions cert their Hadoop 2 based
  distributions on java7.  This gives users many of the benefits of
  java7, without forcing users off java6. Ie Hadoop does not need to
  switch to the java7 programming APIs to make sure everyone has a
  supported runtime.
 
 
  +1: you can use Java 7 today; I'm not sure how tested Java 8 is
 
 
  The question here is really about when Hadoop, and the Hadoop
  ecosystem (since adjacent projects often end up in the same classpath)
  start using the java7 programming APIs and therefore break
  compatibility with java6 runtimes. I think our java6 runtime users
  would consider dropping support for their java runtime in an update of
  a major release to be an incompatible change (the binaries stop
  working on their current jvm).
 
 
   do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?
 
 
  That may be worth it if we can
  articulate sufficient value to offset the cost (they have to upgrade
  their environment, might make rolling upgrades stop working, etc), but
  I've not yet heard an argument that articulates the value relative to
  the cost.  Eg upgrading to the java7 APIs allows us to pull in
  dependencies with new major versions, but only if those dependencies
  don't break compatibility (which is likely given that our classpaths
  aren't so isolated), and, realistically, only if the entire Hadoop
  stack moves to java7 as well
 
 
 
 
  (eg we have to recompile HBase to
  generate v1.7 binaries even if they stick on API v1.6). I'm not aware
  of a feature, bug etc that really motivates this.
 
  I don't see that being needed unless we move up to new java7+ only
  libraries and HBase needs to track this.
 
    The big "recompile to work" issue is google guava, which is troublesome
  enough I'd be tempted to say can we drop it entirely
 
 
 
  An alternate approach is to keep the current stable release series
  (v2.x) as is, and start using new APIs in trunk (for v3). This will be
  a major upgrade for Hadoop and therefore an incompatible change like
  this is to be expected (it would be great if this came with additional
  changes to better isolate classpaths and dependencies from each
  other). It allows us to continue to support multiple types of users
  with different branches, vs forcing all users onto a new version. It
  of course means that 2.x users will not get the benefits of the new
   API, but it's unclear what those benefits are given they can already
  get the benefits of adopting the newer java runtimes today.
 
 
 
  I'm (personally) +1 to this, I also think we should plan to do the switch
  some time this year to not only get the benefits, but discover the 

Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Alejandro Abdelnur
A bit of a different angle.

As the bottom of the stack, Hadoop has to be conservative in adopting
things, but it should not preclude consumers of Hadoop (downstream projects
and Hadoop application developers) from having additional requirements, such
as a higher JDK API level than JDK6.

Hadoop 2.x should stick to using the JDK6 API.
Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7, and
eventually JDK8.
Downstream projects and Hadoop application developers are free to require
any JDK6+ version for development and runtime.

Hadoop 3.x should allow using the JDK7 API, bumping the minimum runtime
requirement to JDK7, and should be tested with JDK7 and JDK8 runtimes.
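
To make concrete what "using the JDK7 API" means in practice, here is a small
sketch (the file-handling example and names are made up for illustration) of
code that compiles on JDK7 but not on JDK6:

  import java.io.IOException;
  import java.io.InputStream;
  import java.io.OutputStream;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import java.util.List;

  class Java7OnlyExamples {

    // java.nio.file and StandardCharsets are JDK7-only APIs
    static List<String> readLines(String name) throws IOException {
      Path p = Paths.get(name);
      return Files.readAllLines(p, StandardCharsets.UTF_8);
    }

    // try-with-resources is a JDK7-only language feature
    static void copy(Path src, Path dst) throws IOException {
      try (InputStream in = Files.newInputStream(src);
           OutputStream out = Files.newOutputStream(dst)) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) {
          out.write(buf, 0, n);
        }
      }
    }
  }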

Thanks.



On Thu, Apr 10, 2014 at 10:04 AM, Eli Collins e...@cloudera.com wrote:

 On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com
 wrote:

  On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:
 
  
  
   For the sake of this discussion we should separate the runtime from
   the programming APIs. Users are already migrating to the java7 runtime
   for most of the reasons listed below (support, performance, bugs,
   etc), and the various distributions cert their Hadoop 2 based
   distributions on java7.  This gives users many of the benefits of
   java7, without forcing users off java6. Ie Hadoop does not need to
   switch to the java7 programming APIs to make sure everyone has a
   supported runtime.
  
  
  +1: you can use Java 7 today; I'm not sure how tested Java 8 is
 
 
   The question here is really about when Hadoop, and the Hadoop
   ecosystem (since adjacent projects often end up in the same classpath)
   start using the java7 programming APIs and therefore break
   compatibility with java6 runtimes. I think our java6 runtime users
   would consider dropping support for their java runtime in an update of
   a major release to be an incompatible change (the binaries stop
   working on their current jvm).
 
 
   do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?
 

  I mean 2.x -> 2.(x+1), i.e. I'm running the 2.4 stable and upgrading to
 2.5.


 
 
   That may be worth it if we can
   articulate sufficient value to offset the cost (they have to upgrade
   their environment, might make rolling upgrades stop working, etc), but
   I've not yet heard an argument that articulates the value relative to
   the cost.  Eg upgrading to the java7 APIs allows us to pull in
   dependencies with new major versions, but only if those dependencies
   don't break compatibility (which is likely given that our classpaths
   aren't so isolated), and, realistically, only if the entire Hadoop
   stack moves to java7 as well
 
 
 
 
   (eg we have to recompile HBase to
   generate v1.7 binaries even if they stick on API v1.6). I'm not aware
   of a feature, bug etc that really motivates this.
  
   I don't see that being needed unless we move up to new java7+ only
  libraries and HBase needs to track this.
 
    The big "recompile to work" issue is google guava, which is troublesome
  enough I'd be tempted to say can we drop it entirely
 
 
 
   An alternate approach is to keep the current stable release series
   (v2.x) as is, and start using new APIs in trunk (for v3). This will be
   a major upgrade for Hadoop and therefore an incompatible change like
   this is to be expected (it would be great if this came with additional
   changes to better isolate classpaths and dependencies from each
   other). It allows us to continue to support multiple types of users
   with different branches, vs forcing all users onto a new version. It
   of course means that 2.x users will not get the benefits of the new
    API, but it's unclear what those benefits are given they can already
   get the benefits of adopting the newer java runtimes today.
  
  
  
  I'm (personally) +1 to this, I also think we should plan to do the switch
  some time this year to not only get the benefits, but discover the costs
 


 Agree



 




-- 
Alejandro


Re: [VOTE] Release Apache Hadoop 2.4.0

2014-04-10 Thread Chen He
+1 (non-binding)
downloaded the source code
compiled successfully
ran wordcount and loadgen without problems


On Tue, Apr 8, 2014 at 11:11 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote:

 Hi Arun,

 I apologize for the late response.
 If the problems are recognized correctly, +1 for the release (non-binding).

 * Ran examples on pseudo distributed cluster.
 * Ran tests.
 * Built from source.

 Let's fix the problems at the target version(2.4.1).

 Thanks,
 - Tsuyoshi


 On Wed, Apr 9, 2014 at 4:45 AM, sanjay Radia san...@hortonworks.com
 wrote:
 
 
  +1 binding
  Verified binaries, ran from binary on single node cluster. Tested some
 HDFS clis and wordcount.
 
  sanjay
  On Apr 7, 2014, at 9:52 AM, Suresh Srinivas sur...@hortonworks.com
 wrote:
 
  +1 (binding)
 
  Verified the signatures and hashes for both src and binary tars. Built
 from
  the source, the binary distribution and the documentation. Started a
 single
  node cluster and tested the following:
  # Started HDFS cluster, verified the hdfs CLI commands such ls, copying
  data back and forth, verified namenode webUI etc.
  # Ran some tests such as sleep job, TestDFSIO, NNBench etc.
 
  I agree with Arun's analysis. At this time, the bar for blockers
 should be
  quite high. We can do a dot release if people want some more bug fixes.
 
 
  On Mon, Mar 31, 2014 at 2:22 AM, Arun C Murthy a...@hortonworks.com
 wrote:
 
  Folks,
 
  I've created a release candidate (rc0) for hadoop-2.4.0 that I would
 like
  to get released.
 
  The RC is available at:
  http://people.apache.org/~acmurthy/hadoop-2.4.0-rc0
  The RC tag in svn is here:
  https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0-rc0
 
  The maven artifacts are available via repository.apache.org.
 
  Please try the release and vote; the vote will run for the usual 7
 days.
 
  thanks,
  Arun
 
  --
  Arun C. Murthy
  Hortonworks Inc.
  http://hortonworks.com/
 
 
 
 
 
 
 
  --
  http://hortonworks.com/download/
 
 
 



 --
 - Tsuyoshi



[jira] [Created] (HADOOP-10489) UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to ConcurrentModificationException

2014-04-10 Thread Jing Zhao (JIRA)
Jing Zhao created HADOOP-10489:
--

 Summary: UserGroupInformation#getTokens and 
UserGroupInformation#addToken can lead to ConcurrentModificationException
 Key: HADOOP-10489
 URL: https://issues.apache.org/jira/browse/HADOOP-10489
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao


Currently UserGroupInformation#getTokens and UserGroupInformation#addToken use 
UGI's monitor to protect the iteration and modification of 
Credentials#tokenMap. Per 
[discussion|https://issues.apache.org/jira/browse/HADOOP-10475?focusedCommentId=13965851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13965851]
 in HADOOP-10475, this can still lead to ConcurrentModificationException.
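
A simplified, hypothetical sketch of the failure mode (the real classes are
UserGroupInformation and Credentials; this only illustrates how iterating a
shared map under the wrong lock, or no lock, can throw):

  import java.util.HashMap;
  import java.util.Map;

  class TokenMapRace {
    private final Map<String, String> tokenMap = new HashMap<String, String>();

    synchronized void addToken(String alias, String token) {
      tokenMap.put(alias, token);
    }

    // If this iteration is guarded by a different lock than addToken (or by no
    // lock at all), a concurrent put() can trigger ConcurrentModificationException.
    void listTokens() {
      for (Map.Entry<String, String> e : tokenMap.entrySet()) {
        System.out.println(e.getKey() + " -> " + e.getValue());
      }
    }
  }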



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10490) TestMapFile and TestBloomMapFile leak file descriptors.

2014-04-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-10490:
--

 Summary: TestMapFile and TestBloomMapFile leak file descriptors.
 Key: HADOOP-10490
 URL: https://issues.apache.org/jira/browse/HADOOP-10490
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


Multiple tests in {{TestMapFile}} and {{TestBloomMapFile}} open files but don't 
close them.  On Windows, the leaked file descriptors cause subsequent tests to 
fail, because file locks are still held while trying to delete the test data 
directory.
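
A hypothetical illustration (not the actual test code) of the kind of leak
described and the usual fix, closing the reader even when the test body
throws:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.io.IOException;

  class DescriptorLeakExample {

    // Leaky: if readLine() or a later assertion throws, close() is never
    // called, the descriptor stays open, and on Windows the file remains
    // locked, breaking later cleanup of the test data directory.
    static String firstLineLeaky(String path) throws IOException {
      BufferedReader reader = new BufferedReader(new FileReader(path));
      return reader.readLine();
    }

    // Fixed: close in a finally block (or use try-with-resources on JDK7+).
    static String firstLine(String path) throws IOException {
      BufferedReader reader = new BufferedReader(new FileReader(path));
      try {
        return reader.readLine();
      } finally {
        reader.close();
      }
    }
  }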



--
This message was sent by Atlassian JIRA
(v6.2#6252)