Re: Gridmix on Yarn
Hi Guo,

Can you please let me know if any specific configuration was needed to get Gridmix working with YARN+MRv2? We are getting the following exception:

INFO gridmix.JobSubmitter: Job org.apache.hadoop.mapreduce.Job@18a8ce2 submission failed
java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.mapred.gridmix.GenerateData$GenDataFormat.getSplits(GenerateData.java:161)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)

It seems very similar to this issue, which relates to the tasktracker not running:
https://issues.apache.org/jira/browse/MAPREDUCE-2016

Yes, I'm using Gridmix running on YARN+MRv2.
Re: Gridmix on Yarn
Hi Brian,

I'm using hadoop-2.3.0-cdh5.1.0; this package includes a Gridmix jar.

Step 1: Use Rumen to generate the job trace file.

    sudo -u yarn java -cp `hadoop classpath` org.apache.hadoop.tools.rumen.TraceBuilder file:///tmp/jobhistory_log/job-trace.json file:///tmp/jobhistory_log/topology.output file:///tmp/jobhistory_log/15/000375/

Step 2: Run Gridmix on YARN+MRv2.

    sudo -u yarn hadoop dfs -put /tmp/jobhistory_log/job-trace.json /tmp
    sudo -u yarn hadoop jar /usr/lib/hadoop-mapreduce/hadoop-gridmix.jar -Dgridmix.min.file.size=10485760 -Dgridmix.job-submission.use-queue-in-trace=true -Dgridmix.distributed-cache-emulation.enable=false -generate 133120m hdfs:///user/yarn/foo/ hdfs:///tmp/job-trace.json

Hope this is helpful.

-Leitao
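For anyone sanity-checking the two size arguments in the Gridmix command above, here is a minimal sketch of the arithmetic. It assumes the `m` suffix on `-generate 133120m` is a binary (1024-based) multiplier and that `gridmix.min.file.size` is given in plain bytes; that is my reading, not something stated in the thread. `parse_size` is a hypothetical helper, not part of Hadoop.

```python
# Hypothetical helper (not a Hadoop API): converts a size such as "133120m"
# into bytes, ASSUMING binary (1024-based) k/m/g/t suffixes.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text: str) -> int:
    """Return the number of bytes described by text, e.g. '10m' or '4096'."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


if __name__ == "__main__":
    generate_bytes = parse_size("133120m")   # value passed to -generate
    min_file_bytes = parse_size("10485760")  # value of gridmix.min.file.size
    print("data to generate:", generate_bytes // 1024**3, "GiB")
    print("minimum file size:", min_file_bytes // 1024**2, "MiB")
```

Under those assumptions the command asks Gridmix to generate 130 GiB of input data with a minimum file size of 10 MiB.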
[jira] [Created] (MAPREDUCE-6021) MR AM should add working directory to LD_LIBRARY_PATH
Jason Lowe created MAPREDUCE-6021:

Summary: MR AM should add working directory to LD_LIBRARY_PATH
Key: MAPREDUCE-6021
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6021
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mr-am
Affects Versions: 2.4.1
Reporter: Jason Lowe

Tasks implicitly pick up shared libraries added to the job because the task launch context explicitly adds the container working directory to LD_LIBRARY_PATH. However, the same is not done for the AM container, which is inconsistent. User code can run in the AM via the output committer, speculator, uber job, etc., so the AM's LD_LIBRARY_PATH should include the container working directory for consistency with tasks.

-- This message was sent by Atlassian JIRA (v6.2#6252)
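To make the requested behaviour concrete, here is a minimal sketch of putting a working directory at the front of LD_LIBRARY_PATH in a launch environment. It is only an illustration of the idea; `with_working_dir` and the example paths are hypothetical, and this is not how the MR AM launch code is actually written.

```python
def with_working_dir(env: dict, working_dir: str) -> dict:
    """Return a copy of env whose LD_LIBRARY_PATH starts with working_dir.

    Illustrative only: the function name and the dict-based environment are
    assumptions, not Hadoop code. LD_LIBRARY_PATH entries are ':'-separated.
    """
    updated = dict(env)
    existing = updated.get("LD_LIBRARY_PATH", "")
    parts = [p for p in existing.split(":") if p]
    if working_dir not in parts:
        # Put the working directory first so job-supplied libraries win.
        parts.insert(0, working_dir)
    updated["LD_LIBRARY_PATH"] = ":".join(parts)
    return updated


if __name__ == "__main__":
    am_env = {"LD_LIBRARY_PATH": "/usr/lib/hadoop/lib/native"}
    print(with_working_dir(am_env, "/container/work")["LD_LIBRARY_PATH"])
```

Applying the same step to both the task and the AM environments is what would give the consistency the issue asks for.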
Re: Branching 2.5
Folks,

I think we are very close to voting on RC0. Just wanted to check one (hopefully) last thing. I am unable to verify the signed maven artifacts are actually deployed. To deploy the artifacts, I did the following and it looked like it ran fine.

1. .m2/settings.xml - server-id is apache.staging.https
2. mvn deploy -Psign,src,dist -Dmaven.test.skip.exec=true -Dcontainer-executor.conf.dir=/etc/hadoop/conf -Dgpg.passphrase=my-passphrase

However, I don't see it here - https://repository.apache.org. How do I verify this?

Thanks
Karthik

On Wed, Jul 30, 2014 at 4:30 PM, Karthik Kambatla ka...@cloudera.com wrote:

Thanks to Andrew's patch on HADOOP-10910, I am able to build an RC.

On Wed, Jul 30, 2014 at 1:59 AM, Ted Yu yuzhih...@gmail.com wrote:

Adding bui...@apache.org

Cheers

On Jul 30, 2014, at 12:52 AM, Andrew Wang andrew.w...@cloudera.com wrote:

Alright, dug around some more and I think it's that FINDBUGS_HOME is not being set correctly. I downloaded and extracted Findbugs 1.3.9, pointed FINDBUGS_HOME at it, and the build worked after that. I don't know what's up with the default maven build, it'd be great if someone could check. Can someone with access to the build machines check this?

As a side note, I think 1.3.9 was released in 2009. It'd be nice to catch up with the last 5 years of static analysis :)

On Tue, Jul 29, 2014 at 11:36 PM, Andrew Wang andrew.w...@cloudera.com wrote:

I looked in the log, it also looks like findbugs is OOMing:

[java] Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
[java] at edu.umd.cs.findbugs.ba.Path.grow(Path.java:263)
[java] at edu.umd.cs.findbugs.ba.Path.copyFrom(Path.java:113)
[java] at edu.umd.cs.findbugs.ba.Path.duplicate(Path.java:103)
[java] at edu.umd.cs.findbugs.ba.obl.State.duplicate(State.java:65)

This is quite possibly related, since there's an error at the end like this:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-hdfs: An Ant BuildException has occured: input file /home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml does not exist
[ERROR] around Ant part ...xslt style=/home/jenkins/tools/findbugs/latest/src/xsl/default.xsl in=/home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml out=/home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/site/findbugs.html/... @ 44:368 in /home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml

I'll try to figure out how to increase this, but if anyone else knows, feel free to chime in.

On Tue, Jul 29, 2014 at 5:41 PM, Karthik Kambatla ka...@cloudera.com wrote:

Devs,

I created branch-2.5.0 and was trying to cut an RC, but ran into issues with creating one. If anyone knows what is going on, please help me out. I'll continue looking into it otherwise.

https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/24/console is the build that failed. It appears the issue is because it can't find Null.java. I run into the same issue locally as well, even with branch-2.4.1. So, I wonder if I should be doing anything else to create the RC instead?

Thanks
Karthik

On Sun, Jul 27, 2014 at 11:09 AM, Zhijie Shen zs...@hortonworks.com wrote:

I've just committed YARN-2247, which is the last 2.5 blocker from YARN.

On Sat, Jul 26, 2014 at 5:02 AM, Karthik Kambatla ka...@cloudera.com wrote:

A quick update: All remaining blockers are on the verge of getting committed. Once that is done, I plan to cut a branch for 2.5.0 and get an RC out hopefully this coming Monday.

On Fri, Jul 25, 2014 at 12:32 PM, Andrew Wang andrew.w...@cloudera.com wrote:

One thing I forgot, the release note activities are happening at HADOOP-10821. If you have other things you'd like to see mentioned, feel free to leave a comment on the JIRA and I'll try to include it.

Thanks,
Andrew

On Fri, Jul 25, 2014 at 12:28 PM, Andrew Wang andrew.w...@cloudera.com wrote:

I just went through and fixed up the HDFS and Common CHANGES.txt for 2.5.0. As a friendly reminder, please try to put things under the correct section :) We have subsections for the xattr changes in HDFS-2006 and HADOOP-10514, and there were some unrelated JIRAs appended to the end. I'd also encourage committers to be more liberal with their use of the NEW FEATURES section. I'm helping Karthik write up the 2.5 release notes, and I'm using NEW FEATURES to fill it out. When looking through the
[jira] [Created] (MAPREDUCE-6022) map_input_file is missing from streaming job environment
Jason Lowe created MAPREDUCE-6022:

Summary: map_input_file is missing from streaming job environment
Key: MAPREDUCE-6022
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6022
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jason Lowe

When running a streaming job, the 'map_input_file' environment variable is not being set. This property is deprecated, but in the past deprecated properties still appeared in a streaming job's environment.
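Until this is resolved, a streaming mapper can guard against the missing variable by checking more than one name. The sketch below assumes that streaming exposes job properties as environment variables with non-alphanumeric characters replaced by underscores, and that `mapreduce.map.input.file` is the current name for the deprecated `map.input.file`; both are my understanding rather than something stated in the issue. The helper names are hypothetical.

```python
import os


def env_name(property_name: str) -> str:
    """Map a job property name to the environment variable name a streaming
    task is ASSUMED to see (non-alphanumeric characters become '_')."""
    return "".join(
        c if (c.isascii() and c.isalnum()) else "_" for c in property_name
    )


def current_input_file(environ=os.environ):
    """Return the mapper's input file, trying the newer property name first
    and falling back to the deprecated one; None if neither is set."""
    for prop in ("mapreduce.map.input.file", "map.input.file"):
        value = environ.get(env_name(prop))
        if value:
            return value
    return None


if __name__ == "__main__":
    print(env_name("map.input.file"))
    print(current_input_file())
```

A mapper written this way keeps working whether or not the deprecated variable is present in its environment.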
[VOTE] Release Apache Hadoop 2.5.0
Hi folks,

I have put together a release candidate (rc0) for Hadoop 2.5.0.

The RC is available at: http://people.apache.org/~kasha/hadoop-2.5.0-RC0/
The RC tag in svn is here: https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.5.0-rc0/
The maven artifacts are staged at https://repository.apache.org/content/repositories/orgapachehadoop-1007/

You can find my public key at: http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

Please try the release and vote. The vote will run for 5 days.

Thanks
Karthik
Re: [VOTE] Release Apache Hadoop 2.5.0
I am obviously a +1 (non-binding). I brought up a pseudo-distributed cluster and ran a few HDFS commands and MR jobs.
[DISCUSS] Migrate from svn to git for source control?
Hi folks,

From what I hear, a lot of devs use the git mirror for development/reviews and use subversion primarily for checking code in. I was wondering if it would make more sense just to move to git. In addition to a subjective liking of git, I see the following advantages in our workflow:

1. Feature branches - it becomes easier to work on them and keep rebasing against the latest trunk.
2. Cherry-picks between branches automatically ensure the exact same commit message and track the lineage as well.
3. When cutting new branches and/or updating maven versions etc., it allows doing all the work locally before pushing it to the main branch.
4. Opens us up to potentially using other code-review tools. (Gerrit?)
5. It is just more convenient.

I am sure this was brought up before in different capacities. I believe the support for git in ASF is healthy now and several downstream projects have moved. Again, from what I hear, ASF INFRA folks make the migration process fairly easy.

What do you all think?

Thanks
Karthik
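For readers less familiar with git, the first two points above can be sketched as command sequences. This is only an illustration: the branch name `my-feature` and the commit id `abc123` are hypothetical, `trunk` and `branch-2.5.0` are borrowed from this list's naming, and the use of `git cherry-pick -x` (which appends a "cherry picked from commit" line to the message) is my assumption about how the lineage would be recorded.

```python
import shlex


def rebase_feature_branch(feature: str, trunk: str = "trunk") -> list:
    """Commands to replay a feature branch on top of the latest trunk."""
    return [["git", "checkout", feature], ["git", "rebase", trunk]]


def cherry_pick(commit: str, target_branch: str) -> list:
    """Commands to copy one commit onto another branch.

    -x keeps the original message and appends a line naming the source
    commit, which is one way to track lineage.
    """
    return [
        ["git", "checkout", target_branch],
        ["git", "cherry-pick", "-x", commit],
    ]


def render(commands: list) -> str:
    """Render command lists as shell-quoted lines for review before running."""
    return "\n".join(shlex.join(c) for c in commands)


if __name__ == "__main__":
    print(render(rebase_feature_branch("my-feature")))
    print(render(cherry_pick("abc123", "branch-2.5.0")))
```

The functions only build and print the commands; nothing is executed against a repository.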
Re: Branching 2.5
Tom White helped me figure it out, and closed the Nexus repository for me. Thanks Tom for helping and Stack for offering to help.
Re: [VOTE] Release Apache Hadoop 2.5.0
Missed Andrew's email in the other thread. Looks like we might need HDFS-6793. I'll wait to see if others find any other issues, so I can address them all together.
Re: [DISCUSS] Migrate from svn to git for source control?
Thanks for starting this thread, Karthik! Big +1 from me. I only use svn when I have to commit things or work on the site; otherwise it's always the git mirror or local git repos. Considering that the git mirror works as well as it does, I'd expect this to be a pretty smooth transition.

Best,
Andrew
Re: [DISCUSS] Migrate from svn to git for source control?
+1. We did it for Oozie a while back, and it was painless apart from minor issues in Jenkins jobs.

Rebasing feature branches on the latest trunk may be tricky, as that may require a force push and, if I'm not mistaken, force pushes are disabled in Apache Git.

thx