Re: Plans of moving towards JDK7 in trunk
On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:

> For the sake of this discussion we should separate the runtime from the programming APIs. Users are already migrating to the java7 runtime for most of the reasons listed below (support, performance, bugs, etc), and the various distributions cert their Hadoop 2 based distributions on java7. This gives users many of the benefits of java7, without forcing users off java6. Ie Hadoop does not need to switch to the java7 programming APIs to make sure everyone has a supported runtime.

+1: you can use Java 7 today; I'm not sure how well tested Java 8 is.

> The question here is really about when Hadoop, and the Hadoop ecosystem (since adjacent projects often end up in the same classpath) start using the java7 programming APIs and therefore break compatibility with java6 runtimes. I think our java6 runtime users would consider dropping support for their java runtime in an update of a major release to be an incompatible change (the binaries stop working on their current jvm).

Do you mean major 2.x -> 3.y, or minor 2.x -> 2.(x+1) here?

> That may be worth it if we can articulate sufficient value to offset the cost (they have to upgrade their environment, might make rolling upgrades stop working, etc), but I've not yet heard an argument that articulates the value relative to the cost. Eg upgrading to the java7 APIs allows us to pull in dependencies with new major versions, but only if those dependencies don't break compatibility (which is likely given that our classpaths aren't so isolated), and, realistically, only if the entire Hadoop stack moves to java7 as well (eg we have to recompile HBase to generate v1.7 binaries even if they stick on API v1.6). I'm not aware of a feature, bug etc that really motivates this.

I don't see that being needed unless we move up to new java7+-only libraries and HBase needs to track this. The big recompile-to-work issue is Google Guava, which is troublesome enough that I'd be tempted to say "can we drop it entirely?"

> An alternate approach is to keep the current stable release series (v2.x) as is, and start using new APIs in trunk (for v3). This will be a major upgrade for Hadoop and therefore an incompatible change like this is to be expected (it would be great if this came with additional changes to better isolate classpaths and dependencies from each other). It allows us to continue to support multiple types of users with different branches, vs forcing all users onto a new version. It of course means that 2.x users will not get the benefits of the new API, but it's unclear what those benefits are given they can already get the benefits of adopting the newer java runtimes today.

I'm (personally) +1 to this. I also think we should plan to do the switch some time this year, not only to get the benefits but to discover the costs.

-- 
CONFIDENTIALITY NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
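[Editorial note: to make the "java7 programming APIs" in the thread above concrete, the language and library features contributors would start using include try-with-resources, the diamond operator, and java.nio.file. A minimal, self-contained illustration (not code from Hadoop itself):]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Java7Features {
    public static List<String> readLines(Path file) throws IOException {
        // Diamond operator: no need to repeat <String> on the right-hand side.
        List<String> lines = new ArrayList<>();
        // try-with-resources: the reader is closed automatically, even on error.
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp)); // prints [hello, world]
        Files.delete(tmp);
    }
}
```

Compiling code like this produces class files that a Java 6 JVM cannot load, which is exactly the runtime-compatibility break under discussion.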
DISCUSS: use SLF4J APIs in new modules?
If we're thinking of future progress, here's a little low-level one: adopt SLF4J as the API for logging.

1. It's the new de facto standard of logging APIs.
2. It's a lot better than commons-logging, with on-demand inline string expansion of varargs arguments.
3. We already ship it, as Jetty uses it.
4. We already depend on it, client-side and server-side, in the hadoop-auth package.
5. It lets people log via Logback if they want to. That's client-side, even if the server stays on log4j.
6. It's way faster than using String.format().

The best initial thing about SLF4J is how it only expands its arguments' string values if needed:

    LOG.debug("Initialized, principal [{}] from keytab [{}]", principal, keytab);

Not logging at debug? No need to test first. That alone saves code and improves readability. The slf4j expansion code handles null values as well as calling toString() on non-null arguments. Oh, and it does arrays too:

    int[] array = {1, 2, 3};
    String undef = null;
    LOG.info("a = {}, u = {}", array, undef);   // -> a = [1, 2, 3], u = null

Switching to SLF4J from commons-logging is as trivial as changing the type of the logger created, but with one logger per class that does get expensive in terms of change. Moving to SLF4J across the board would be a big piece of work -but doable. Rather than push for a dramatic change, why not adopt a policy of demanding it in new maven subprojects? hadoop-auth shows we permit it, so why not say you MUST? Once people have experience in using it, and are happy, then we could think about switching to the new APIs in the core modules. The only troublespot there is where code calls getLogger() on the commons log to get at the log4j appender -there are ~3 places in production code that do this, 200+ in tests -tests that do it to turn back log levels. Those tests can stay with commons-logging, same for the production uses. Mixing commons-logging and slf4j isn't drastic -they both route to log4j or a.n.other back end.
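[Editorial note: the on-demand `{}` expansion described above can be sketched in plain Java. This is a hypothetical re-implementation for illustration, not SLF4J's actual MessageFormatter code:]

```java
import java.util.Arrays;

public class BraceFormat {
    // Minimal sketch of SLF4J-style {} substitution: each {} is replaced by
    // the next argument; nulls print as "null" and arrays are expanded.
    public static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIndex = 0;
        int i = 0;
        while (i < pattern.length()) {
            if (argIndex < args.length
                    && i + 1 < pattern.length()
                    && pattern.charAt(i) == '{' && pattern.charAt(i + 1) == '}') {
                out.append(render(args[argIndex++]));
                i += 2;
            } else {
                out.append(pattern.charAt(i++));
            }
        }
        return out.toString();
    }

    private static String render(Object arg) {
        if (arg == null) {
            return "null";
        }
        if (arg instanceof int[]) {
            return Arrays.toString((int[]) arg);
        }
        if (arg instanceof Object[]) {
            return Arrays.deepToString((Object[]) arg);
        }
        // toString() is only called when a message is actually being built,
        // which is where the "no isDebugEnabled() needed" saving comes from.
        return arg.toString();
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3};
        String undef = null;
        System.out.println(format("a = {}, u = {}", array, undef));
        // prints: a = [1, 2, 3], u = null
    }
}
```

The real library defers this work until it has checked the log level, so disabled debug statements never stringify their arguments at all.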
-Steve
Build failed in Jenkins: Hadoop-Common-trunk #1095
See https://builds.apache.org/job/Hadoop-Common-trunk/1095/changes Changes: [tucu] HADOOP-10428. JavaKeyStoreProvider should accept keystore password via configuration falling back to ENV VAR. (tucu) [vinodkv] YARN-1910. Fixed a race condition in TestAMRMTokens that causes the test to fail more often on Windows. Contributed by Xuan Gong. [wheat9] HDFS-6225. Remove the o.a.h.hdfs.server.common.UpgradeStatusReport. Contributed by Haohui Mai. [cnauroth] HDFS-6208. DataNode caching can leak file descriptors. Contributed by Chris Nauroth. [wheat9] HDFS-6170. Support GETFILESTATUS operation in WebImageViewer. Contributed by Akira Ajisaka. [tucu] HADOOP-10429. KeyStores should have methods to generate the materials themselves, KeyShell should use them. (tucu) [tucu] HADOOP-10427. KeyProvider implementations should be thread safe. (tucu) [tucu] HADOOP-10432. Refactor SSLFactory to expose static method to determine HostnameVerifier. (tucu) [szetszwo] HDFS-6209. TestValidateConfigurationSettings should use random ports. Contributed by Arpit Agarwal [wheat9] HADOOP-10485. Remove dead classes in hadoop-streaming. Contributed by Haohui Mai. [szetszwo] HDFS-6204. Fix TestRBWBlockInvalidation: change the last sleep to a loop. [szetszwo] HDFS-6206. Fix NullPointerException in DFSUtil.substituteForWildcardAddress. [szetszwo] HADOOP-10473. TestCallQueueManager should interrupt before counting calls. [jeagles] HDFS-6215. Wrong error message for upgrade. (Kihwal Lee via jeagles) [arp] HDFS-6160. TestSafeMode occasionally fails. (Contributed by Arpit Agarwal) [kihwal] YARN-1907. TestRMApplicationHistoryWriter#testRMWritingMassiveHistory intermittently fails. Contributed by Mit Desai. [stevel] HADOOP-10104. Update jackson to 1.9.13 (Akira Ajisaka via stevel) -- [...truncated 60536 lines...] 
Adding reference: maven.local.repository
[DEBUG] Initialize Maven Ant Tasks
parsing buildfile jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml with URI = jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml from a zip file
parsing buildfile jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml with URI = jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml from a zip file
Class org.apache.maven.ant.tasks.AttachArtifactTask loaded from parent loader (parentFirst)
 +Datatype attachartifact org.apache.maven.ant.tasks.AttachArtifactTask
Class org.apache.maven.ant.tasks.DependencyFilesetsTask loaded from parent loader (parentFirst)
 +Datatype dependencyfilesets org.apache.maven.ant.tasks.DependencyFilesetsTask
Setting project property: test.build.dir - https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/target/test-dir
Setting project property: test.exclude.pattern - _
Setting project property: hadoop.assemblies.version - 3.0.0-SNAPSHOT
Setting project property: test.exclude - _
Setting project property: distMgmtSnapshotsId - apache.snapshots.https
Setting project property: project.build.sourceEncoding - UTF-8
Setting project property: java.security.egd - file:///dev/urandom
Setting project property: distMgmtSnapshotsUrl - https://repository.apache.org/content/repositories/snapshots
Setting project property: distMgmtStagingUrl - https://repository.apache.org/service/local/staging/deploy/maven2
Setting project property: avro.version - 1.7.4
Setting project property: test.build.data - https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/target/test-dir
Setting project property: commons-daemon.version - 1.0.13
Setting project property: hadoop.common.build.dir - https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/../../hadoop-common-project/hadoop-common/target
Setting project property: testsThreadCount - 4
Setting project property: maven.test.redirectTestOutputToFile - true
Setting project property: jdiff.version - 1.0.9
Setting project property: build.platform - Linux-i386-32
Setting project property: project.reporting.outputEncoding - UTF-8
Setting project property: distMgmtStagingName - Apache Release Distribution Repository
Setting project property: protobuf.version - 2.5.0
Setting project property: failIfNoTests - false
Setting project property: protoc.path - ${env.HADOOP_PROTOC_PATH}
Setting project property: jersey.version - 1.9
Setting project property: distMgmtStagingId - apache.staging.https
Setting project property: distMgmtSnapshotsName - Apache Development Snapshot Repository
Setting project property: ant.file - https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml
[DEBUG] Setting properties with prefix:
Setting
Re: DISCUSS: use SLF4J APIs in new modules?
Slf4j is definitely a great step forward. Log4j is restrictive for complex and multi-tenant apps like hadoop. Also, the fact that slf4j doesn't use any magic when binding to its log provider makes it way easier to swap out its implementation than tools of the past.

On Apr 10, 2014, at 2:16 AM, Steve Loughran ste...@hortonworks.com wrote:

> If we're thinking of future progress, here's a little low-level one: adopt SLF4J as the API for logging [...]
Re: Plans of moving towards JDK7 in trunk
I think the problem to be solved here is to define a point in time when the average Hadoop contributor can start using Java7 dependencies in their code. The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does not solve this problem. The average Hadoop contributor wants to see their contributions make it into a stable release in a predictable amount of time. Putting code with a Java7 dependency into trunk means the exact opposite: there is no timeline to a stable release. So most contributors will stay away from Java7 dependencies, despite the nominal policy that they're allowed in trunk. (And the few that do use Java7 dependencies are people who do not value releasing code into stable releases, which arguably could lead to a situation that the Java7-dependent code in trunk is, on average, on the buggy side.) I'm not saying the branch2-in-the-future plan is the only way to solve the problem of putting Java7 dependencies on a known timetable, but at least it solves it. Is there another solution?

On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:

> On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote: For the sake of this discussion we should separate the runtime from the programming APIs. [...]
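[Editorial note: the cost discussed in this thread of moving the stack to v1.7 binaries comes down to the compiler's source/target settings. A hypothetical maven-compiler-plugin fragment, shown for illustration only and not taken from Hadoop's actual pom:]

```xml
<!-- Hypothetical configuration: bumping both values to 1.7 produces
     class files (major version 51) that a Java 6 JVM refuses to load
     with UnsupportedClassVersionError. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.7</source>  <!-- language level: enables java7 syntax -->
    <target>1.7</target>  <!-- bytecode level: breaks java6 runtimes -->
  </configuration>
</plugin>
```

This is why "the entire Hadoop stack moves to java7 as well": any downstream project compiled with these settings ships bytecode that java6 users cannot run, independent of which APIs its source actually calls.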
[jira] [Resolved] (HADOOP-10382) Add Apache Tez to the Hadoop homepage as a related project
[ https://issues.apache.org/jira/browse/HADOOP-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved HADOOP-10382.
    Resolution: Fixed

I just committed this.

> Add Apache Tez to the Hadoop homepage as a related project
> ----------------------------------------------------------
>
>          Key: HADOOP-10382
>          URL: https://issues.apache.org/jira/browse/HADOOP-10382
>      Project: Hadoop Common
>   Issue Type: Bug
>   Components: documentation
>     Reporter: Arun C Murthy
>     Assignee: Arun C Murthy
>  Attachments: HADOOP-10382.patch, HADOOP-10382.patch
>
> Add Apache Tez to the Hadoop homepage as a related project

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: DISCUSS: use SLF4J APIs in new modules?
+1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.

On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas jayunit...@gmail.com wrote:

> Slf4j is definitely a great step forward. Log4j is restrictive for complex and multi-tenant apps like hadoop. [...]
Re: DISCUSS: use SLF4J APIs in new modules?
+1 on slf4j. One thing, Jay: the issues with log4j will still be there, as log4j will still be under the hood.

thx
Alejandro (phone typing)

On Apr 10, 2014, at 7:35, Andrew Wang andrew.w...@cloudera.com wrote:

> +1 from me, it'd be lovely to get rid of all those isDebugEnabled checks. [...]
Re: DISCUSS: use SLF4J APIs in new modules?
+1 to use slf4j. I would actually vote for (1) new modules must use it, (2) new classes in existing modules are strongly recommended to use it, and (3) existing classes can switch to it. That would take us closer to using slf4j everywhere faster.

On Thu, Apr 10, 2014 at 8:17 AM, Alejandro Abdelnur t...@cloudera.com wrote:

> +1 on slf4j. one thing Jay, the issues with log4j will still be there as log4j will still be under the hood. [...]
Re: Plans of moving towards JDK7 in trunk
On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:

> do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?

I mean 2.x -> 2.(x+1). Ie I'm running the 2.4 stable and upgrading to 2.5.

> I'm (personally) +1 to this, I also think we should plan to do the switch some time this year to not only get the benefits, but discover the costs

Agree
Re: Plans of moving towards JDK7 in trunk
On Thu, Apr 10, 2014 at 6:49 AM, Raymie Stata rst...@altiscale.com wrote:
> I think the problem to be solved here is to define a point in time when the average Hadoop contributor can start using Java7 dependencies in their code. The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does not solve this problem. The average Hadoop contributor wants to see their contributions make it into a stable release in a predictable amount of time. Putting code with a Java7 dependency into trunk means the exact opposite: there is no timeline to a stable release. So most contributors will stay away from Java7 dependencies, despite the nominal policy that they're allowed in trunk. (And the few that do use Java7 dependencies are people who do not value releasing code into stable releases, which arguably could lead to a situation where the Java7-dependent code in trunk is, on average, on the buggy side.) I'm not saying the branch2-in-the-future plan is the only way to solve the problem of putting Java7 dependencies on a known timetable, but at least it solves it. Is there another solution?

All good reasons for why we should start thinking about a plan for v3. The points above pertain to any features for trunk that break compatibility, not just ones that use new Java APIs. We shouldn't permit incompatible changes to merge to v2 just because we don't yet have a timeline for v3; we should figure out the latter. This also motivates finishing the work to isolate dependencies between Hadoop code, other framework code, and user code.

Let's speak less abstractly: are there particular features or new dependencies that you would like to contribute (or see contributed) that require using the Java 1.7 APIs? Breaking compat in v2 and rolling a v3 release are both non-trivial, not something I suspect we'd want to do just because it would be, for example, nicer to have a newer version of Jetty.
Thanks,
Eli
Re: Plans of moving towards JDK7 in trunk
A bit of a different angle. As the bottom of the stack, Hadoop has to be conservative in adopting things, but it should not preclude consumers of Hadoop (downstream projects and Hadoop application developers) from having additional requirements, such as a higher JDK API than JDK6:

* Hadoop 2.x should stick to using the JDK6 API.
* Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7 and eventually JDK8.
* Downstream projects and Hadoop application developers are free to require any JDK6+ version for development and runtime.
* Hadoop 3.x should allow using the JDK7 API, bumping the minimum runtime requirement to JDK7, and be tested with JDK7 and JDK8 runtimes.

Thanks.
--
Alejandro
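To make Alejandro's matrix concrete, here is an illustrative sketch of JDK7-only constructs (java.nio.file, try-with-resources, strings in switch) of the kind a JDK6-pinned 2.x codebase could not use but a 3.x codebase could. The example is hypothetical and not taken from Hadoop:

```java
// Illustrative JDK7-only constructs; compiling this requires -source 1.7.
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Jdk7Only {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("jdk7", ".txt");      // java.nio.file: JDK7+
        Files.write(p, "hadoop".getBytes(StandardCharsets.UTF_8));
        // try-with-resources and strings-in-switch are JDK7 language features.
        try (BufferedReader r = Files.newBufferedReader(p, StandardCharsets.UTF_8)) {
            switch (r.readLine()) {
                case "hadoop":
                    System.out.println("bytes=" + Files.readAllBytes(p).length);
                    break;
                default:
                    System.out.println("unexpected");
            }
        }
        Files.delete(p);
    }
}
```

Any one of these in a source file is enough to make the resulting artifact unusable on a java6 runtime, which is why the 2.x/3.x split matters.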
Re: [VOTE] Release Apache Hadoop 2.4.0
+1 (non-binding)

* download source code
* compile successfully
* run wordcount and loadgen without problem

On Tue, Apr 8, 2014 at 11:11 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote:
> Hi Arun,
> I apologize for the late response. If the problems are recognized correctly, +1 for the release (non-binding).
> * Ran examples on pseudo distributed cluster.
> * Ran tests.
> * Built from source.
> Let's fix the problems at the target version (2.4.1).
> Thanks,
> - Tsuyoshi
>
> On Wed, Apr 9, 2014 at 4:45 AM, sanjay Radia san...@hortonworks.com wrote:
>> +1 binding
>> Verified binaries, ran from binary on single node cluster. Tested some HDFS CLIs and wordcount.
>> sanjay
>>
>> On Apr 7, 2014, at 9:52 AM, Suresh Srinivas sur...@hortonworks.com wrote:
>>> +1 (binding)
>>> Verified the signatures and hashes for both src and binary tars. Built from the source, the binary distribution and the documentation. Started a single node cluster and tested the following:
>>> # Started HDFS cluster, verified the hdfs CLI commands such as ls, copying data back and forth, verified namenode webUI etc.
>>> # Ran some tests such as sleep job, TestDFSIO, NNBench etc.
>>> I agree with Arun's analysis. At this time, the bar for blockers should be quite high. We can do a dot release if people want some more bug fixes.
>>>
>>> On Mon, Mar 31, 2014 at 2:22 AM, Arun C Murthy a...@hortonworks.com wrote:
>>>> Folks,
>>>> I've created a release candidate (rc0) for hadoop-2.4.0 that I would like to get released.
>>>> The RC is available at: http://people.apache.org/~acmurthy/hadoop-2.4.0-rc0
>>>> The RC tag in svn is here: https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0-rc0
>>>> The maven artifacts are available via repository.apache.org.
>>>> Please try the release and vote; the vote will run for the usual 7 days.
>>>> thanks,
>>>> Arun
>>>> --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/

--
- Tsuyoshi
[jira] [Created] (HADOOP-10489) UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to ConcurrentModificationException
Jing Zhao created HADOOP-10489:
----------------------------------

Summary: UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to ConcurrentModificationException
Key: HADOOP-10489
URL: https://issues.apache.org/jira/browse/HADOOP-10489
Project: Hadoop Common
Issue Type: Bug
Reporter: Jing Zhao

Currently UserGroupInformation#getTokens and UserGroupInformation#addToken use UGI's monitor to protect the iteration and modification of Credentials#tokenMap. Per [discussion|https://issues.apache.org/jira/browse/HADOOP-10475?focusedCommentId=13965851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13965851] in HADOOP-10475, this can still lead to ConcurrentModificationException.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
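The failure mode HADOOP-10489 describes comes from Java's fail-fast collection iterators: structurally modifying a map while any iterator over it is live throws ConcurrentModificationException. A minimal single-threaded sketch (not UGI's actual code; names are illustrative) shows the hazard and the defensive-copy pattern that avoids it:

```java
// Illustrative sketch of the CME hazard behind HADOOP-10489 (not UGI code).
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class TokenMapDemo {
    public static void main(String[] args) {
        Map<String, String> tokenMap = new HashMap<String, String>();
        tokenMap.put("t1", "token1");
        tokenMap.put("t2", "token2");

        // Modifying the map while iterating its live view is fail-fast.
        boolean threw = false;
        try {
            for (String t : tokenMap.values()) {
                tokenMap.put("t3", "token3");  // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println("CME thrown: " + threw);

        // Iterating a snapshot copy decouples readers from writers.
        for (String t : new ArrayList<String>(tokenMap.values())) {
            tokenMap.put("t4", "token4");  // safe: iterator is over the copy
        }
        System.out.println("copy iteration ok, size=" + tokenMap.size());
    }
}
```

In the multithreaded case the same exception can fire even with a monitor held, if any code path iterates the map outside that lock, which is the subtlety the JIRA calls out.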
[jira] [Created] (HADOOP-10490) TestMapFile and TestBloomMapFile leak file descriptors.
Chris Nauroth created HADOOP-10490:
----------------------------------

Summary: TestMapFile and TestBloomMapFile leak file descriptors.
Key: HADOOP-10490
URL: https://issues.apache.org/jira/browse/HADOOP-10490
Project: Hadoop Common
Issue Type: Bug
Components: test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor

Multiple tests in {{TestMapFile}} and {{TestBloomMapFile}} open files but don't close them. On Windows, the leaked file descriptors cause subsequent tests to fail, because file locks are still held while trying to delete the test data directory.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
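The fix pattern for leaks like HADOOP-10490 is simply to close streams deterministically; on Windows an open handle holds a mandatory lock, so the test directory cannot be deleted afterwards. A JDK6-compatible try/finally sketch (illustrative names, not the actual test code):

```java
// Illustrative fd-leak fix pattern: close the stream in finally so the
// file can be deleted afterwards (the failure mode HADOOP-10490 describes).
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class CloseDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("mapfile", ".data");
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(42);
        } finally {
            out.close();  // without this, Windows keeps the file locked
        }
        // Once the descriptor is released, cleanup succeeds.
        System.out.println("deleted=" + f.delete());
    }
}
```

On a 3.x/JDK7 codebase the same thing could be written as try-with-resources, but the 2.x tests affected here must stay JDK6-compatible per the discussion above.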