Build failed in Jenkins: Hadoop-Common-trunk #950
See https://builds.apache.org/job/Hadoop-Common-trunk/950/changes

Changes:

[cmccabe] HDFS-5485. add command-line support for modifyDirective (cmccabe)
[jeagles] YARN-1386. NodeManager mistakenly loses resources and relocalizes them (Jason Lowe via jeagles)
[cnauroth] HADOOP-10093. hadoop-env.cmd sets HADOOP_CLIENT_OPTS with a max heap size that is too small. Contributed by Shanyu Zhao.
[wang] HDFS-5450. better API for getting the cached blocks locations. Contributed by Andrew Wang.
[cmccabe] HDFS-5471. CacheAdmin -listPools fails when user lacks permissions to view all pools (Andrew Wang via Colin Patrick McCabe)
[jing9] HDFS-5425. Renaming underconstruction file with snapshots can make NN failure on restart. Contributed by Vinay and Jing Zhao.
[cnauroth] YARN-1400. yarn.cmd uses HADOOP_RESOURCEMANAGER_OPTS. Should be YARN_RESOURCEMANAGER_OPTS. Contributed by Raja Aluri.
[sandy] MAPREDUCE-1176. FixedLengthInputFormat and FixedLengthRecordReader (Mariappan Asokan and BitsOfInfo via Sandy Ryza)
[wang] Move HDFS-5495 to 2.3.0 in CHANGES.txt
[wang] HDFS-5495. Remove further JUnit3 usages from HDFS. Contributed by Jarek Jarcec Cecho.
[jing9] HDFS-5488. Clean up TestHftpURLTimeout. Contributed by Haohui Mai.
[sandy] YARN-1387. RMWebServices should use ClientRMService for filtering applications (Karthik Kambatla via Sandy Ryza)
[cnauroth] YARN-1395. Distributed shell application master launched with debug flag can hang waiting for external ls process. Contributed by Chris Nauroth.
[wang] HDFS-5467. Remove tab characters in hdfs-default.xml. Contributed by Shinichi Yamashita.
[jlowe] MAPREDUCE-5186. mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail. Contributed by Robert Parker and Jason Lowe
[wang] HDFS-5320. Add datanode caching metrics. Contributed by Andrew Wang.

[...truncated 57439 lines...]
[jira] [Created] (HADOOP-10095) Performance improvement in CodecPool
Nicolas Liochon created HADOOP-10095: Summary: Performance improvement in CodecPool Key: HADOOP-10095 URL: https://issues.apache.org/jira/browse/HADOOP-10095 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.2.0, 3.0.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 3.0.0, 2.2.1 CodecPool shows up when profiling HBase with a mixed workload (it says we spend 1% of the time there). It could be a profiler side effect, but on the other hand we can save some 'Map#contains'. -- This message was sent by Atlassian JIRA (v6.1#6144)
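The saving the JIRA alludes to - dropping a 'Map#contains' call - is the classic double-lookup pattern: containsKey() followed by get() costs two map lookups where one get() with a null check costs one. A minimal, hypothetical Java sketch (class and method names are illustrative, not the actual CodecPool patch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the optimization described above, not the
// real CodecPool code: replace a containsKey()/get() pair (two hash
// lookups) with a single get() and a null check (one lookup).
public class PoolSketch {

    private final Map<String, List<Object>> pool = new HashMap<>();

    // Before: two lookups per borrow.
    public Object borrowTwoLookups(String key) {
        if (pool.containsKey(key)) {            // lookup #1
            List<Object> list = pool.get(key);  // lookup #2
            if (!list.isEmpty()) {
                return list.remove(list.size() - 1);
            }
        }
        return null;
    }

    // After: one lookup per borrow, same behavior.
    public Object borrowOneLookup(String key) {
        List<Object> list = pool.get(key);      // single lookup
        if (list != null && !list.isEmpty()) {
            return list.remove(list.size() - 1);
        }
        return null;
    }

    public void put(String key, Object value) {
        pool.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}
```

Whether this shows up at 1% in a profile or is a profiler artifact, the one-lookup form is never worse.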
Re: Next releases
On 13 November 2013 18:11, Arun C Murthy a...@hortonworks.com wrote:

HADOOP-9623 Update jets3t dependency to 0.9.0

I saw that change - I don't think it's a bad one, but I do think we need more testing of blobstores, especially big operations like 6 GB uploads (which should now work with the 0.9.0 jets3t).

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: scope of jersey-test-framework-grizzly2
looks accidental - file a patch and link it to HADOOP-9991

On 13 November 2013 05:09, Ted Yu yuzhih...@gmail.com wrote:

Hi,

To answer a question on dev@hbase, I noticed the following dependencies:

[INFO] +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
[INFO] |  |  +- com.google.inject:guice:jar:3.0:compile
[INFO] |  |  |  +- javax.inject:javax.inject:jar:1:compile
[INFO] |  |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  |  +- com.sun.jersey.jersey-test-framework:*jersey-test-framework-grizzly2*:jar:1.9:compile
[INFO] |  |  |  +- com.sun.jersey.jersey-test-framework:jersey-test-framework-core:jar:1.9:compile

Should jersey-test-framework-grizzly2 have test scope? I tried the following change:

Index: hadoop-yarn-project/hadoop-yarn/pom.xml
===
--- hadoop-yarn-project/hadoop-yarn/pom.xml (revision 1541341)
+++ hadoop-yarn-project/hadoop-yarn/pom.xml (working copy)
@@ -119,6 +119,7 @@
     <dependency>
       <groupId>com.sun.jersey.jersey-test-framework</groupId>
       <artifactId>jersey-test-framework-grizzly2</artifactId>
+      <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>com.sun.jersey</groupId>

The above change led to the addition of jersey-test-framework-grizzly2 in hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml

I want to get some opinion on whether jersey-test-framework-grizzly2 should have test scope.

Thanks
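To check whether the test framework still leaks into compile scope for downstream consumers after such a change, one way (a sketch using the standard Maven dependency plugin; the filter value is just this thread's artifact) is:

```
mvn dependency:tree -Dincludes=com.sun.jersey.jersey-test-framework
```

Run from a project that depends on hadoop-mapreduce-client-core; an empty tree means the test-only dependency no longer propagates.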
Hadoop source on windows7
Hi,

I have taken the latest code from trunk and am getting the following error while trying to build Hadoop HDFS on Windows 7. Can anybody help to resolve this issue?

[ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve dependencies for project org.apache.hadoop:hadoop-hdfs:jar:3.0.0-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop-common:jar:tests:3.0.0-SNAPSHOT in apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots) -
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.296s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.574s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [2.378s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.627s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.419s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [2.628s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [0.870s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [1.321s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [1.162s]
[INFO] Apache Hadoop Common .............................. SUCCESS [2:06.412s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [1:09.401s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.104s]
[INFO] Apache Hadoop HDFS ................................ FAILURE [15.636s]
[INFO] Apache Hadoop HttpFS .............................. SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SKIPPED
[INFO] Apache Hadoop HDFS-NFS ............................ SKIPPED
[INFO] Apache Hadoop HDFS Project ........................ SKIPPED
[INFO] hadoop-yarn ....................................... SKIPPED
[INFO] hadoop-yarn-api ................................... SKIPPED
[INFO] hadoop-yarn-common ................................ SKIPPED
[INFO] hadoop-yarn-server ................................ SKIPPED
[INFO] hadoop-yarn-server-common ......................... SKIPPED
[INFO] hadoop-yarn-server-nodemanager .................... SKIPPED
[INFO] hadoop-yarn-server-web-proxy ...................... SKIPPED
[INFO] hadoop-yarn-server-resourcemanager ................ SKIPPED
[INFO] hadoop-yarn-server-tests .......................... SKIPPED
[INFO] hadoop-yarn-client ................................ SKIPPED
[INFO] hadoop-yarn-applications .......................... SKIPPED
[INFO] hadoop-yarn-applications-distributedshell ......... SKIPPED
[INFO] hadoop-mapreduce-client ........................... SKIPPED
[INFO] hadoop-mapreduce-client-core ...................... SKIPPED
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SKIPPED
[INFO] hadoop-yarn-site .................................. SKIPPED
[INFO] hadoop-yarn-project ............................... SKIPPED
[INFO] hadoop-mapreduce-client-common .................... SKIPPED
[INFO] hadoop-mapreduce-client-shuffle ................... SKIPPED
[INFO] hadoop-mapreduce-client-app ....................... SKIPPED
[INFO] hadoop-mapreduce-client-hs ........................ SKIPPED
[INFO] hadoop-mapreduce-client-jobclient ................. SKIPPED
[INFO] hadoop-mapreduce-client-hs-plugins ................ SKIPPED
[INFO] Apache Hadoop MapReduce Examples .................. SKIPPED
[INFO] hadoop-mapreduce .................................. SKIPPED
[INFO] Apache Hadoop MapReduce Streaming ................. SKIPPED
[INFO] Apache Hadoop Distributed Copy .................... SKIPPED
[INFO] Apache Hadoop Archives ............................ SKIPPED
[INFO] Apache Hadoop Rumen ............................... SKIPPED
[INFO] Apache Hadoop Gridmix ............................. SKIPPED
[INFO] Apache Hadoop Data Join ........................... SKIPPED
[INFO] Apache Hadoop Extras .............................. SKIPPED
[INFO] Apache Hadoop Pipes ............................... SKIPPED
[INFO] Apache Hadoop OpenStack support ................... SKIPPED
[INFO] Apache Hadoop Client .............................. SKIPPED
[INFO] Apache Hadoop Mini-Cluster ........................ SKIPPED
[INFO] Apache Hadoop Scheduler Load Simulator ............ SKIPPED
[INFO] Apache Hadoop Tools Dist .......................... SKIPPED
[INFO] Apache Hadoop Tools ............................... SKIPPED
[INFO] Apache Hadoop Distribution ........................ SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 3:44.699s
[INFO] Finished at: Wed Nov 13 23:05:01 IST 2013
[INFO] Final Memory: 47M/422M
[INFO]
[ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve dependencies
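The missing artifact is the hadoop-common test jar (jar:tests:3.0.0-SNAPSHOT), which is not published to the Apache snapshots repository. The usual remedy (a sketch, not a Windows-specific fix) is to install the whole tree into the local repository first, so the HDFS module resolves the test jar locally:

```
# From the top-level hadoop source directory: build and install all
# modules, including the hadoop-common test jar, into ~/.m2.
# -DskipTests skips test execution but still packages the test jars.
mvn install -DskipTests
```

After that, building the hadoop-hdfs module on its own should resolve hadoop-common:jar:tests from the local repository rather than the remote snapshots repo.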
Re: Next releases
Here are a few patches that I put into 2.2.1 that are minimally invasive, but I don't think are blockers:

YARN-305. Fair scheduler logs too many "Node offered to app" messages.
YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
YARN-1333. Support blacklisting in the Fair Scheduler
YARN-1109. Demote NodeManager "Sending out status for container" logs to debug (haosdent via Sandy Ryza)
YARN-1388. Fair Scheduler page always displays blank fair share

+1 to doing releases at some fixed time interval.

-Sandy

On Wed, Nov 13, 2013 at 10:10 AM, Arun C Murthy a...@hortonworks.com wrote:

On Nov 12, 2013, at 1:54 PM, Todd Lipcon t...@cloudera.com wrote:

On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of things, so I may not have the full picture. Arun, can you give a specific example of something you'd like to blow away?

There are a bunch of issues in YARN/MapReduce which clearly aren't *critical*; similarly, in HDFS a cursory glance showed up some *enhancements*/*improvements* in CHANGES.txt which aren't necessary for a patch release, plus things like:

HADOOP-9623 Update jets3t dependency to 0.9.0

Having said that, the HDFS devs know their code the best.

I agree with Colin. If we've been backporting things into a patch release (third version component) which don't belong, we should explicitly call out those patches, so we can learn from our mistakes and have a discussion about what belongs.

Good point. Here is a straw-man proposal: a patch (third version) release should only include *blocker* bugs which fix critical operational, security or data-integrity issues. This way, we can ensure that a minor series release (2.2.x or 2.3.x or 2.4.x) is always release-able, and more importantly, deploy-able at any point in time.
Sandy did bring up a related point about the timing of releases and the urge for everyone to cram features/fixes into a dot release. So, we could remedy that situation by doing a release every 4-6 weeks (2.3, 2.4 etc.) and keep the patch releases limited to blocker bugs. Thoughts?

thanks,
Arun
Re: scope of jersey-test-framework-grizzly2
I attached a patch to HADOOP-9991 where the scope of jersey-test-framework is changed to test. Looking at the bottom of the JIRA, there are several more tasks to be done.

Cheers

On Wed, Nov 13, 2013 at 10:30 AM, Steve Loughran ste...@hortonworks.com wrote:

looks accidental - file a patch and link it to HADOOP-9991

On 13 November 2013 05:09, Ted Yu yuzhih...@gmail.com wrote:

Hi,

To answer a question on dev@hbase, I noticed the following dependencies:

[INFO] +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
[INFO] |  |  +- com.google.inject:guice:jar:3.0:compile
[INFO] |  |  |  +- javax.inject:javax.inject:jar:1:compile
[INFO] |  |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  |  +- com.sun.jersey.jersey-test-framework:*jersey-test-framework-grizzly2*:jar:1.9:compile
[INFO] |  |  |  +- com.sun.jersey.jersey-test-framework:jersey-test-framework-core:jar:1.9:compile

Should jersey-test-framework-grizzly2 have test scope? I tried the following change:

Index: hadoop-yarn-project/hadoop-yarn/pom.xml
===
--- hadoop-yarn-project/hadoop-yarn/pom.xml (revision 1541341)
+++ hadoop-yarn-project/hadoop-yarn/pom.xml (working copy)
@@ -119,6 +119,7 @@
     <dependency>
       <groupId>com.sun.jersey.jersey-test-framework</groupId>
       <artifactId>jersey-test-framework-grizzly2</artifactId>
+      <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>com.sun.jersey</groupId>

The above change led to the addition of jersey-test-framework-grizzly2 in hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml

I want to get some opinion on whether jersey-test-framework-grizzly2 should have test scope.

Thanks
Re: How to make call to an external program in Hadoop?
Hi,

We can use third-party classes from NLP, text mining, and other Java libraries in Java MapReduce, or we can use Python plus Hadoop Streaming to write more complex parallel code. This link has code for computing the Pearson correlation: https://github.com/malli3131/HadoopTutorial/tree/master/Mapreduce/Programs/Pearson

Thanks

On Sat, Nov 9, 2013 at 12:40 AM, Tony Wang ivyt...@gmail.com wrote:

So far, I only know that Hadoop can do counting. I am wondering if there's any way to make calls to an external program for more complex processing than counting in Hadoop. Is there any example?

thanks
tony

--
Thanks and Regards
Nagamallikarjuna
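Besides Hadoop Streaming, Java code such as a Mapper.map() body can also invoke an external program directly with ProcessBuilder. A self-contained sketch (the external command "tr" and the class name are illustrative; error handling is minimal and the real Mapper plumbing is omitted):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: pipe one record through an external program,
// the way a Mapper.map() implementation might shell out for complex
// processing. Not Hadoop-specific; plain JDK ProcessBuilder.
public class ExternalCall {

    static String pipeThrough(String line, String... command) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);          // merge stderr into stdout
        Process proc = pb.start();
        try (Writer w = new OutputStreamWriter(
                proc.getOutputStream(), StandardCharsets.UTF_8)) {
            w.write(line);
            w.write('\n');
        }                                      // closing stdin lets the child finish
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                proc.getInputStream(), StandardCharsets.UTF_8))) {
            String s;
            while ((s = r.readLine()) != null) {
                out.append(s);
            }
        }
        proc.waitFor();
        return out.toString();
    }
}
```

In a real job, spawning a process per record is expensive; Hadoop Streaming (one long-lived process per task fed over stdin/stdout) is usually the better fit, which is what the Python suggestion above amounts to.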
[jira] [Created] (HADOOP-10096) Missing dependency on commons-collections
Robert Rati created HADOOP-10096: Summary: Missing dependency on commons-collections Key: HADOOP-10096 URL: https://issues.apache.org/jira/browse/HADOOP-10096 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Robert Rati Priority: Minor Attachments: HADOOP-10096.patch There's a missing dependency on commons-collections -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HADOOP-10097) Extend JenkinsHash package interface to allow increased code sharing
Eric Hanson created HADOOP-10097: Summary: Extend JenkinsHash package interface to allow increased code sharing Key: HADOOP-10097 URL: https://issues.apache.org/jira/browse/HADOOP-10097 Project: Hadoop Common Issue Type: Improvement Reporter: Eric Hanson Assignee: Eric Hanson Priority: Minor

I copied some code from org.apache.hadoop.util.hash.JenkinsHash and added it to org.apache.hadoop.hive.ql.exec.vector.expressions.CuckooSetBytes, and modified it slightly because the interface was not quite right for use in CuckooSetBytes. I propose modifying org.apache.hadoop.util.hash.JenkinsHash to provide an additional interface function:

public int hash(byte[] key, int start, int nbytes, int initval)

This would return a hash value for the sequence of bytes beginning at start and ending at start + nbytes (exclusive). The existing interface function in org.apache.hadoop.util.hash.JenkinsHash,

public int hash(byte[] key, int nbytes, int initval)

would then be modified to call this new function. The original hash() function does not take a start parameter, and always assumes the key in byte[] key starts at position 0. This will expand the use cases for the JenkinsHash package. At that point, the Hive CuckooSetBytes class can be modified so that it can reference the JenkinsHash package of Hadoop and use it directly, rather than using a copied and modified version of the code locally.

Existing users of hash(byte[] key, int nbytes, int initval) will then have to pay for an extra function call. If the performance ramifications of this worry anyone, please comment. Alternatives would be to copy the new version of hash() in its entirety into JenkinsHash, or simply not do this JIRA.

-- This message was sent by Atlassian JIRA (v6.1#6144)
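The shape of the proposed change can be sketched as follows. Note the hash body here is a trivial placeholder, not the real Jenkins lookup3 algorithm, and the class name is hypothetical; the point is the overload signature and the delegation from the old entry point, which keeps existing callers' results identical:

```java
// Hypothetical sketch of the HADOOP-10097 interface extension.
// The mixing step is a placeholder, NOT the Jenkins hash.
public class RangeHashSketch {

    // Proposed new method: hash bytes in [start, start + nbytes).
    public int hash(byte[] key, int start, int nbytes, int initval) {
        int h = initval;
        for (int i = start; i < start + nbytes; i++) {
            h = h * 31 + (key[i] & 0xff);  // placeholder mixing step
        }
        return h;
    }

    // Existing signature, now delegating with start == 0, so current
    // callers see identical results at the cost of one extra call.
    public int hash(byte[] key, int nbytes, int initval) {
        return hash(key, 0, nbytes, initval);
    }
}
```

The extra call is a prime candidate for JIT inlining, which is presumably why the JIRA frames the overhead question as worth a comment rather than a blocker.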
[jira] [Resolved] (HADOOP-10096) Missing dependency on commons-collections
[ https://issues.apache.org/jira/browse/HADOOP-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Rati resolved HADOOP-10096. -- Resolution: Duplicate Missing dependency on commons-collections - Key: HADOOP-10096 URL: https://issues.apache.org/jira/browse/HADOOP-10096 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.2.0 Reporter: Robert Rati Priority: Minor Attachments: HADOOP-10096.patch There's a missing dependency on commons-collections -- This message was sent by Atlassian JIRA (v6.1#6144)
svn playing up?
I've been doing some trivial pom patches, along with the one-line HDFS-5075 patch.

branch-2 $ svn status
M hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
M hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/libexec/httpfs-config.sh
branch-2 $ svn commit -m "HDFS-5075 httpfs-config.sh calls out incorrect env script name"
Sending hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
Sending hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/libexec/httpfs-config.sh
Transmitting file data ..
Committed revision 1541687.

then I go to pull it up into trunk - an action which worked a few minutes earlier for the POM patch MAPREDUCE-5431:

hadoop-trunk $ svn merge -r 1541686:1541687 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2 .
svn: E160006: No such revision 1541687

branch-2 $ svn update

The commit I just put up isn't there. So I go back to the branch-2 directory:

branch-2 $ svn update
Updating '.':
svn: E160006: No such reported revision '1541687' found in the repository. Perhaps the repository is out of date with respect to the master repository?
branch-2 $ svn info
Path: .
Working Copy Root Path: /Users/stevel/Hadoop/svn/branch-2
URL: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1541675
Node Kind: directory
Schedule: normal
Last Changed Author: jing9
Last Changed Rev: 1541671
Last Changed Date: 2013-11-13 19:34:08 + (Wed, 13 Nov 2013)
branch-2 $

Anyway: after about 5 minutes everything resynchronized again, but it looks like svn is exposing its failover mechanisms.

-steve
Re: Hadoop in Fedora updated to 2.2.0
I've just been through some of these as part of my background project, fixing up the POMs: https://issues.apache.org/jira/browse/HADOOP-9991.

1. I've applied the simple low-risk ones.
2. I've not done the bookkeeper one, as people working with that code need to play with it first.
3. I've not touched anything related to {jersey, tomcat, jetty}. This is more than just a java6/7 issue; it is that Jetty has been very brittle in the past, and there's code in hadoop to detect when it's not actually servicing requests properly. Moving up Jetty/web server versions is something that needs to be done carefully and with consensus - and once you leave Jetty alone, I don't know where the jersey and tomcat changes go. There is always the option of s/jetty/grizzly/.

-steve

On 1 November 2013 14:57, Robert Rati rr...@redhat.com wrote:

Putting the java 6 vs java 7 issue aside, what about the other patches to update dependencies? Can those be looked at and planned for inclusion into a release?

Rob

On 10/31/2013 05:51 PM, Andrew Wang wrote:

I'm in agreement with Steve on this one. We're aware that Java 6 is EOL, but we can't drop support for the lifetime of the 2.x line since it's a (very) incompatible change. AFAIK a 3.x release fixing this isn't on any of our horizons yet.
Best,
Andrew

On Thu, Oct 31, 2013 at 6:15 AM, Robert Rati rr...@redhat.com wrote:

https://issues.apache.org/jira/browse/HADOOP-9594
https://issues.apache.org/jira/browse/MAPREDUCE-5431
https://issues.apache.org/jira/browse/HADOOP-9611
https://issues.apache.org/jira/browse/HADOOP-9613
https://issues.apache.org/jira/browse/HADOOP-9623
https://issues.apache.org/jira/browse/HDFS-5411
https://issues.apache.org/jira/browse/HADOOP-10067
https://issues.apache.org/jira/browse/HDFS-5075
https://issues.apache.org/jira/browse/HADOOP-10068
https://issues.apache.org/jira/browse/HADOOP-10075
https://issues.apache.org/jira/browse/HADOOP-10076
https://issues.apache.org/jira/browse/HADOOP-9849

most (all?) of these are pom changes

A good number are basically pom changes to update to newer versions of dependencies. A few, such as commons-math3, required code changes as well because of a namespace change. Some are minor code changes to enhance compatibility with newer dependencies. Even tomcat is mostly changes in pom files. Most of the changes are minor. There are 2 big updates though: Jetty 9 (which requires java 7) and tomcat 7. These are also the most difficult patches to rebase when hadoop produces a new release.

that's not going to go in the 2.x branch. Java 6 is still a common platform that people are using, because historically java7 (or any leading-edge java version) is buggy. that said, our QA team did test hadoop 2 HDP-2 at scale on java7 and openjdk 7, so it all works - it's just that committing to java7-only is a big decision

I realize moving to java 7 is a big decision and wasn't trying to imply this
[jira] [Created] (HADOOP-10098) move grizzly-test and junit dependencies to test scope
Steve Loughran created HADOOP-10098: --- Summary: move grizzly-test and junit dependencies to test scope Key: HADOOP-10098 URL: https://issues.apache.org/jira/browse/HADOOP-10098 Project: Hadoop Common Issue Type: Sub-task Reporter: Steve Loughran Stop the grizzly dependencies and JUnit getting into everything downstream by moving them to test scope -- This message was sent by Atlassian JIRA (v6.1#6144)
Re: Question on hadoop dependencies.
Petar,

We've just been through some of 2.3, on the branch https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2

Most of that was updates; apart from moving grizzly and junit to test scope, we've not done much. Why don't you have a look and help clean things out? I'd particularly like to see lean clients for HDFS, YARN and mapreduce.

On 30 October 2013 22:25, Petar Tahchiev paranoia...@gmail.com wrote:

Hi Roman,

looks like they have already upgraded to 2.2 https://issues.apache.org/jira/browse/SOLR-5382 and will be shipping it in SOLR 4.6. I just hope you guys release a cleaned-up 2.3 first :)

2013/10/30 Roman Shaposhnik r...@apache.org

On Wed, Oct 30, 2013 at 1:07 PM, Steve Loughran ste...@hortonworks.com wrote:

On 30 October 2013 13:07, Petar Tahchiev paranoia...@gmail.com wrote:

So spring-data-solr (1.1.SNAPSHOT) uses solr 4.5.1 (just came out a few days ago), which uses Hadoop 2.0.5-alpha. I would be glad if we can clean up the poms a bit and leave only the dependencies that hadoop really depends on.

To pile on top of what Steve has said -- do you happen to know if there's a JIRA to re-target Solr to depend on Hadoop 2.2.0?

Thanks,
Roman.

--
Regards, Petar!
Karlovo, Bulgaria.
---
Public PGP Key at: https://keyserver1.pgp.com/vkd/DownloadKey.event?keyid=0x19658550C3110611
Key Fingerprint: A369 A7EE 61BC 93A3 CDFF 55A5 1965 8550 C311 0611
Re: Next releases
On Nov 13, 2013, at 12:38 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Here are a few patches that I put into 2.2.1 that are minimally invasive, but I don't think are blockers:

YARN-305. Fair scheduler logs too many "Node offered to app" messages.
YARN-1335. Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication
YARN-1333. Support blacklisting in the Fair Scheduler
YARN-1109. Demote NodeManager "Sending out status for container" logs to debug (haosdent via Sandy Ryza)
YARN-1388. Fair Scheduler page always displays blank fair share

+1 to doing releases at some fixed time interval.

To be clear, I still think we should be *very* clear about what features we target for each release (2.3, 2.4, etc.). Except, we don't wait infinitely for any specific feature - if we miss a 4-6 week window, a feature goes to the next train. Makes sense?

thanks,
Arun

-Sandy

On Wed, Nov 13, 2013 at 10:10 AM, Arun C Murthy a...@hortonworks.com wrote:

On Nov 12, 2013, at 1:54 PM, Todd Lipcon t...@cloudera.com wrote:

On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of things, so I may not have the full picture. Arun, can you give a specific example of something you'd like to blow away?

There are a bunch of issues in YARN/MapReduce which clearly aren't *critical*; similarly, in HDFS a cursory glance showed up some *enhancements*/*improvements* in CHANGES.txt which aren't necessary for a patch release, plus things like:

HADOOP-9623 Update jets3t dependency to 0.9.0

Having said that, the HDFS devs know their code the best.

I agree with Colin. If we've been backporting things into a patch release (third version component) which don't belong, we should explicitly call out those patches, so we can learn from our mistakes and have a discussion about what belongs.

Good point.
Here is a straw-man proposal: a patch (third version) release should only include *blocker* bugs which fix critical operational, security or data-integrity issues. This way, we can ensure that a minor series release (2.2.x or 2.3.x or 2.4.x) is always release-able, and more importantly, deploy-able at any point in time.

Sandy did bring up a related point about the timing of releases and the urge for everyone to cram features/fixes into a dot release. So, we could remedy that situation by doing a release every 4-6 weeks (2.3, 2.4 etc.) and keep the patch releases limited to blocker bugs. Thoughts?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Next releases
Sounds good; there is just a little impedance between what seem to be two conflicting goals:

* what features we target for each release
* train releases

If we want to do train releases at fixed times, then if a feature is not ready, it catches the next train - no delaying the train because of a feature. If a bug is delaying the train and a feature becomes ready in the meantime, and it does not destabilize the release, it can jump on board; if it breaks something, it goes out the window until the next train.

Also, we have to decide what we do with 2.2.1. I would say start wrapping up the current 2.2 branch and make it the first train.

thx

--
Alejandro
Re: Question on hadoop dependencies.
Hi Steve, I'll definitely take a look, although I'm not sure when exactly :( Currently busy with the Maven AppAssembler.

2013/11/13 Steve Loughran ste...@hortonworks.com

Petar, we've just been through some of 2.3, on the branch https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2. Most of that was updates; apart from moving grizzly and junit to test scope, we've not done much. Why don't you have a look and help clean things out? I'd particularly like to see lean clients for HDFS, YARN, and MapReduce.

On 30 October 2013 22:25, Petar Tahchiev paranoia...@gmail.com wrote:

Hi Roman, looks like they have already upgraded to 2.2 (https://issues.apache.org/jira/browse/SOLR-5382) and it will be shipping in Solr 4.6. I just hope you guys release a cleaned-up 2.3 first :)

2013/10/30 Roman Shaposhnik r...@apache.org

On Wed, Oct 30, 2013 at 1:07 PM, Steve Loughran ste...@hortonworks.com wrote:

On 30 October 2013 13:07, Petar Tahchiev paranoia...@gmail.com wrote:

So spring-data-solr (1.1.SNAPSHOT) uses Solr 4.5.1 (just came out a few days ago), which uses Hadoop 2.0.5-alpha. I would be glad if we can clean up the poms a bit and leave only the dependencies that Hadoop really depends on.

To pile on top of what Steve has said -- do you happen to know if there's a JIRA to re-target Solr to depend on Hadoop 2.2.0?

Thanks, Roman.

--
Regards, Petar!
Karlovo, Bulgaria.
---
Public PGP Key at: https://keyserver1.pgp.com/vkd/DownloadKey.event?keyid=0x19658550C3110611
Key Fingerprint: A369 A7EE 61BC 93A3 CDFF 55A5 1965 8550 C311 0611
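For anyone following the pom-cleanup discussion: the test-scope change Steve mentions is a one-line edit per dependency in the relevant pom.xml. The snippet below is a generic illustration of the pattern (the junit version shown is arbitrary, not the actual Hadoop pom diff):

{code:xml}
<!-- Moving a build/test-only dependency to test scope keeps it off the
     compile classpath and out of downstream consumers' dependency trees. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.11</version>
  <scope>test</scope>
</dependency>
{code}

With test scope, the artifact is available only on the test classpath and is not propagated transitively, which is what keeps the HDFS/YARN/MapReduce client poms lean.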
[jira] [Created] (HADOOP-10099) Reduce chance for RPC denial of service
Daryn Sharp created HADOOP-10099:
Summary: Reduce chance for RPC denial of service
Key: HADOOP-10099
URL: https://issues.apache.org/jira/browse/HADOOP-10099
Project: Hadoop Common
Issue Type: Improvement
Components: ipc
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Priority: Minor

An RPC server may accept an unlimited number of connections unless indirectly bounded by a blocking operation in the RPC handler threads. The NN's namespace locking happens to cause this blocking, but other RPC servers, such as YARN's, generate async events which allow unbridled connection acceptance.

--
This message was sent by Atlassian JIRA (v6.1#6144)
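To make the issue above concrete, here is a minimal, hypothetical sketch of one common mitigation - capping concurrently accepted connections with a semaphore. It assumes nothing about Hadoop's actual ipc.Server internals; the class and method names are invented for illustration:

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch (not Hadoop's ipc.Server): bound connection acceptance
// with a semaphore sized to the maximum number of in-flight connections, so
// a flood of clients cannot drive resource usage without limit.
public class BoundedAcceptor {
    private final Semaphore permits;

    public BoundedAcceptor(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Called before ServerSocket.accept(); returns false when at capacity.
    public boolean tryAccept() {
        return permits.tryAcquire();
    }

    // Called when a connection closes, freeing a slot for the next client.
    public void release() {
        permits.release();
    }
}
```

An acceptor thread would call tryAccept() before accepting a socket (or use a blocking acquire() instead of dropping), and release() on connection close, so excess clients queue at the listen backlog rather than consuming unbounded server memory.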
Re: Next releases
I think a lot of confusion comes from the fact that the 2.x line is starting to mature. Before this there wasn't such big contention over what went into patch vs. minor releases, and often the lines were blurred between the two. However, now we have significant customers and products starting to use 2.x as a base, which means we need to start treating it like we treat 1.x. That means getting serious about what we should put into a patch release vs. what we postpone to a minor release. Here's my $0.02 on recent proposals:

+1 to releasing more often in general. A lot of the rush to put changes into a patch release is because it can be a very long time between any kind of release. If minor releases are more frequent, then I hope there would be less of a need to rush something or hold up a release.

+1 to limiting checkins on patch releases to Blockers/Criticals. If necessary, committers check into trunk/branch-2 only and defer to the patch release manager for the patch release merge. Then there should be fewer surprises for everyone about what ended up in a patch release, and it is less likely the patch release becomes destabilized by the sheer amount of code churn. Maybe this won't be necessary if everyone understands that the patch release isn't the only way to get a change out in a timely manner.

As for 2.2.1, again I think it's about expectations for what that release means. If it's really just a patch release, then there shouldn't be features in it and tons of code churn, but I think many were treating it as the next vehicle to deliver changes in general. If we think 2.2.1 is just as good as or better than 2.2.0, then let's wrap it up and move to a more disciplined approach for subsequent patch releases and more frequent minor releases.
Jason
[jira] [Created] (HADOOP-10100) MiniKDC shouldn't use apacheds-all artifact
Robert Kanter created HADOOP-10100:
--
Summary: MiniKDC shouldn't use apacheds-all artifact
Key: HADOOP-10100
URL: https://issues.apache.org/jira/browse/HADOOP-10100
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter

The MiniKDC currently depends on the {{apacheds-all}} artifact:
{code:xml}
<dependency>
  <groupId>org.apache.directory.server</groupId>
  <artifactId>apacheds-all</artifactId>
  <version>2.0.0-M15</version>
  <scope>compile</scope>
</dependency>
{code}
However, this artifact includes, inside of itself, a lot of other packages, including antlr, ehcache, apache commons, and mina (you can see a full list of the packages in the jar [here|http://mvnrepository.com/artifact/org.apache.directory.server/apacheds-all/2.0.0-M15]). This can be problematic if other projects (e.g. Oozie) try to use MiniKDC and have a different version of one of those dependencies (in my case, ehcache). Because the packages are included inside the {{apacheds-all}} jar, we can't override their version. Instead, we should remove {{apacheds-all}} and use dependencies that only include org.apache.directory.* packages; the other necessary dependencies should be included normally.

--
This message was sent by Atlassian JIRA (v6.1#6144)
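A sketch of the proposed direction: depend on narrower ApacheDS modules directly instead of the shaded {{apacheds-all}} jar, so transitive dependencies such as ehcache remain ordinary Maven dependencies that consumers can exclude or override. The artifact name below is purely illustrative - the actual set of org.apache.directory.server modules MiniKDC needs would have to be verified against the ApacheDS poms:

{code:xml}
<!-- Illustrative only: a narrower ApacheDS module in place of apacheds-all.
     Third-party packages then arrive as normal transitive dependencies
     instead of being baked into one jar, so their versions can be managed. -->
<dependency>
  <groupId>org.apache.directory.server</groupId>
  <artifactId>apacheds-core</artifactId>
  <version>2.0.0-M15</version>
</dependency>
{code}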