Re: Gridmix on Yarn

2014-08-01 Thread Brian Husted
Hi Guo,

Could you please let me know whether any specific configuration was needed
to get Gridmix working with YARN+MRv2? We are getting the following
exception:

INFO gridmix.JobSubmitter: Job org.apache.hadoop.mapreduce.Job@18a8ce2 submission failed java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.mapred.gridmix.GenerateData$GenDataFormat.getSplits(GenerateData.java:161)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)

It seems very similar to the following issue, which relates to the
TaskTracker not running:

https://issues.apache.org/jira/browse/MAPREDUCE-2016

Yes, I'm using Gridmix running on YARN+MRv2.
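The MAPREDUCE-2016 report suggests that the data-generation step divides the requested number of bytes by the number of trackers the cluster reports, so a count of zero produces "/ by zero". The sketch below is a hypothetical, simplified model of that arithmetic with a guard added, assuming that under YARN the relevant count reflects live NodeManagers. The class, method, and message are illustrative only and are not the Gridmix source.

```java
// Hypothetical illustration of the failure mode; not the Gridmix source.
final class GenDataSplitSketch {

    // Share of the generated data each node should write. Without the
    // guard, totalBytes / activeNodes throws ArithmeticException
    // ("/ by zero") when the cluster reports no active nodes.
    static long bytesPerNode(long totalBytes, int activeNodes) {
        if (activeNodes <= 0) {
            throw new IllegalStateException(
                "Cluster reports no active nodes; are the NodeManagers running?");
        }
        return totalBytes / activeNodes;
    }

    public static void main(String[] args) {
        long total = 1L << 30; // 1 GiB to generate (illustrative value)
        System.out.println(bytesPerNode(total, 4)); // 268435456
        try {
            bytesPerNode(total, 0);
        } catch (IllegalStateException e) {
            System.out.println("Refused to split: " + e.getMessage());
        }
    }
}
```

Failing fast with a descriptive message is a choice made by this sketch: it points at cluster state instead of surfacing a bare arithmetic error from deep inside split computation.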


Re: Gridmix on Yarn

2014-08-01 Thread Guo Leitao
Hi Brian, I'm using hadoop-2.3.0-cdh5.1.0; the Gridmix jar is included in this
package.

Step 1: use Rumen to generate the job trace file.

sudo -u yarn java -cp `hadoop classpath` \
  org.apache.hadoop.tools.rumen.TraceBuilder \
  file:///tmp/jobhistory_log/job-trace.json \
  file:///tmp/jobhistory_log/topology.output \
  file:///tmp/jobhistory_log/15/000375/

Step 2: run Gridmix on YARN+MRv2.

sudo -u yarn hadoop dfs -put /tmp/jobhistory_log/job-trace.json /tmp
sudo -u yarn hadoop jar /usr/lib/hadoop-mapreduce/hadoop-gridmix.jar \
  -Dgridmix.min.file.size=10485760 \
  -Dgridmix.job-submission.use-queue-in-trace=true \
  -Dgridmix.distributed-cache-emulation.enable=false \
  -generate 133120m \
  hdfs:///user/yarn/foo/ hdfs:///tmp/job-trace.json
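For reference, the size arguments above work out as follows, assuming binary multipliers (k = 2^10, m = 2^20, g = 2^30, t = 2^40) in the style of Hadoop's StringUtils.TraditionalBinaryPrefix: "-generate 133120m" is 133120 x 2^20 = 139,586,437,120 bytes (130 GiB), and gridmix.min.file.size=10485760 is 10 MiB. The parser below is a hypothetical plain-Java stand-in that only reproduces this arithmetic; it is not the Gridmix argument parser.

```java
import java.util.Locale;

// Hypothetical size parser illustrating the arithmetic; not Gridmix code.
final class SizeArgSketch {

    // Plain digits are bytes; a trailing k/m/g/t multiplies by
    // 2^10, 2^20, 2^30 or 2^40 (assumed binary prefixes).
    static long parseSize(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        int shift;
        switch (t.charAt(t.length() - 1)) {
            case 'k': shift = 10; break;
            case 'm': shift = 20; break;
            case 'g': shift = 30; break;
            case 't': shift = 40; break;
            default: return Long.parseLong(t);
        }
        long base = Long.parseLong(t.substring(0, t.length() - 1));
        return Math.multiplyExact(base, 1L << shift); // fail on overflow
    }

    public static void main(String[] args) {
        long gen = parseSize("133120m");
        System.out.println(gen);                          // 139586437120
        System.out.println(gen >> 30);                    // 130 (GiB)
        System.out.println(parseSize("10485760") >> 20);  // 10 (MiB)
    }
}
```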

Hope this helps.

-Leitao

2014-08-01 20:08 GMT+08:00 Brian Husted brian.hus...@gmail.com:


[jira] [Created] (MAPREDUCE-6021) MR AM should add working directory to LD_LIBRARY_PATH

2014-08-01 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-6021:

 Summary: MR AM should add working directory to LD_LIBRARY_PATH
 Key: MAPREDUCE-6021
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6021
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.4.1
Reporter: Jason Lowe


Tasks implicitly pick up shared libraries added to the job because the task
launch context explicitly adds the container working directory to
LD_LIBRARY_PATH. However, the same is not done for the AM container, which is
inconsistent. User code can run in the AM via the output committer, the
speculator, uber jobs, etc., so the AM's LD_LIBRARY_PATH should include the
container working directory for consistency with tasks.
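A minimal sketch of the proposed behaviour, written against a plain Map<String, String> rather than the real MR/YARN container-launch APIs: append the container working directory to LD_LIBRARY_PATH while preserving any existing entries. Referring to the working directory as "$PWD" assumes the container launch script expands it at start-up; the helper name and the pre-existing native-library path are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed AM environment change; it does not
// use the real MR/YARN container-launch APIs.
final class AmEnvSketch {

    // Append `entry` to a separator-delimited variable such as
    // LD_LIBRARY_PATH, creating the variable if it is absent and
    // skipping the append if the entry is already present.
    static void appendToPathVar(Map<String, String> env, String var,
                                String entry, String sep) {
        String current = env.get(var);
        if (current == null || current.isEmpty()) {
            env.put(var, entry);
            return;
        }
        for (String part : current.split(Pattern.quote(sep))) {
            if (part.equals(entry)) {
                return; // already on the path
            }
        }
        env.put(var, current + sep + entry);
    }

    public static void main(String[] args) {
        Map<String, String> amEnv = new HashMap<>();
        // Hypothetical pre-existing entry.
        amEnv.put("LD_LIBRARY_PATH", "/opt/hadoop/lib/native");
        // "$PWD" stands for the container working directory (assumption).
        appendToPathVar(amEnv, "LD_LIBRARY_PATH", "$PWD", ":");
        System.out.println(amEnv.get("LD_LIBRARY_PATH")); // /opt/hadoop/lib/native:$PWD
    }
}
```

Skipping duplicates keeps the variable stable if the same launch-context code runs more than once.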



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Branching 2.5

2014-08-01 Thread Karthik Kambatla
Folks,

I think we are very close to voting on RC0. Just wanted to check one
(hopefully) last thing.

I am unable to verify that the signed Maven artifacts were actually deployed.
To deploy the artifacts, I did the following, and it seemed to run fine.

   1. .m2/settings.xml - server-id is apache.staging.https
   2. mvn deploy -Psign,src,dist -Dmaven.test.skip.exec=true
   -Dcontainer-executor.conf.dir=/etc/hadoop/conf
   -Dgpg.passphrase=my-passphrase

However, I don't see them at https://repository.apache.org. How do I
verify this?

Thanks
Karthik

On Wed, Jul 30, 2014 at 4:30 PM, Karthik Kambatla ka...@cloudera.com
wrote:

 Thanks to Andrew's patch on HADOOP-10910, I am able to build an RC.


 On Wed, Jul 30, 2014 at 1:59 AM, Ted Yu yuzhih...@gmail.com wrote:

 Adding bui...@apache.org

 Cheers

 On Jul 30, 2014, at 12:52 AM, Andrew Wang andrew.w...@cloudera.com
 wrote:

  Alright, dug around some more and I think it's that FINDBUGS_HOME is not
  being set correctly. I downloaded and extracted Findbugs 1.3.9, pointed
  FINDBUGS_HOME at it, and the build worked after that. I don't know what's
  up with the default maven build, it'd be great if someone could check.

  Can someone with access to the build machines check this?

  As a side note, I think 1.3.9 was released in 2009. It'd be nice to catch
  up with the last 5 years of static analysis :)


  On Tue, Jul 29, 2014 at 11:36 PM, Andrew Wang andrew.w...@cloudera.com
  wrote:
 
  I looked in the log, it also looks like findbugs is OOMing:

  [java] Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
  [java]     at edu.umd.cs.findbugs.ba.Path.grow(Path.java:263)
  [java]     at edu.umd.cs.findbugs.ba.Path.copyFrom(Path.java:113)
  [java]     at edu.umd.cs.findbugs.ba.Path.duplicate(Path.java:103)
  [java]     at edu.umd.cs.findbugs.ba.obl.State.duplicate(State.java:65)


  This is quite possibly related, since there's an error at the end like
  this:

  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (site) on project hadoop-hdfs: An Ant BuildException has occured: input file /home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml does not exist

  [ERROR] around Ant part ...xslt style=/home/jenkins/tools/findbugs/latest/src/xsl/default.xsl in=/home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/findbugsXml.xml out=/home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/site/findbugs.html/... @ 44:368 in /home/jenkins/jenkins-slave/workspace/HADOOP2_Release_Artifacts_Builder/branch-2.5.0/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
 
  I'll try to figure out how to increase this, but if anyone else knows,
  feel free to chime in.
 
 
  On Tue, Jul 29, 2014 at 5:41 PM, Karthik Kambatla ka...@cloudera.com
  wrote:
 
  Devs,

  I created branch-2.5.0 and was trying to cut an RC, but ran into issues
  with creating one. If anyone knows what is going on, please help me out.
  I'll continue looking into it otherwise.


  https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/24/console
  is the build that failed. It appears the issue is because it can't find
  Null.java. I run into the same issue locally as well, even with
  branch-2.4.1. So, I wonder if I should be doing anything else to create
  the RC instead?
 
  Thanks
  Karthik
 
 
  On Sun, Jul 27, 2014 at 11:09 AM, Zhijie Shen zs...@hortonworks.com
  wrote:

  I've just committed YARN-2247, which is the last 2.5 blocker from YARN.


  On Sat, Jul 26, 2014 at 5:02 AM, Karthik Kambatla ka...@cloudera.com
  wrote:

  A quick update:

  All remaining blockers are on the verge of getting committed. Once that
  is done, I plan to cut a branch for 2.5.0 and get an RC out hopefully
  this coming Monday.


  On Fri, Jul 25, 2014 at 12:32 PM, Andrew Wang andrew.w...@cloudera.com
  wrote:

  One thing I forgot, the release note activities are happening at
  HADOOP-10821. If you have other things you'd like to see mentioned, feel
  free to leave a comment on the JIRA and I'll try to include it.

  Thanks,
  Andrew


  On Fri, Jul 25, 2014 at 12:28 PM, Andrew Wang andrew.w...@cloudera.com
  wrote:

  I just went through and fixed up the HDFS and Common CHANGES.txt for
  2.5.0.

  As a friendly reminder, please try to put things under the correct
  section :) We have subsections for the xattr changes in HDFS-2006 and
  HADOOP-10514, and there were some unrelated JIRAs appended to the end.

  I'd also encourage committers to be more liberal with their use of the
  NEW FEATURES section. I'm helping Karthik write up the 2.5 release
  notes, and I'm using NEW FEATURES to fill it out. When looking through the
 

[jira] [Created] (MAPREDUCE-6022) map_input_file is missing from streaming job environment

2014-08-01 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-6022:
-

 Summary: map_input_file is missing from streaming job environment
 Key: MAPREDUCE-6022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6022
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Jason Lowe


When running a streaming job, the 'map_input_file' environment variable is not
being set. This property is deprecated, but in the past deprecated properties
still appeared in a streaming job's environment.
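A sketch of the naming rule involved, assuming that streaming exports job configuration properties as environment variables by replacing every character other than an ASCII letter or digit with an underscore, and that map.input.file is the deprecated alias of mapreduce.map.input.file. Under those assumptions, exporting only the current key yields mapreduce_map_input_file but never map_input_file; exporting the deprecated alias as well restores it. The class and its one-entry deprecation table are illustrative, not the Hadoop Streaming code.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of exporting job properties into a streaming
// process's environment; not the Hadoop Streaming source.
final class StreamEnvSketch {

    // Illustrative one-entry deprecation table: current key -> old alias.
    static final Map<String, String> DEPRECATED_ALIASES =
        Map.of("mapreduce.map.input.file", "map.input.file");

    // Keep ASCII letters and digits; replace every other character with '_'.
    static String safeEnvVarName(String key) {
        StringBuilder sb = new StringBuilder(key.length());
        for (char c : key.toCharArray()) {
            boolean keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
            sb.append(keep ? c : '_');
        }
        return sb.toString();
    }

    // Export each property under its own name and, if it has one, under
    // its deprecated alias as well, so older scripts still find it.
    static Map<String, String> toEnv(Map<String, String> conf) {
        Map<String, String> env = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            env.put(safeEnvVarName(e.getKey()), e.getValue());
            String alias = DEPRECATED_ALIASES.get(e.getKey());
            if (alias != null) {
                env.put(safeEnvVarName(alias), e.getValue());
            }
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> conf =
            Map.of("mapreduce.map.input.file", "hdfs:///in/part-0");
        // {map_input_file=hdfs:///in/part-0, mapreduce_map_input_file=hdfs:///in/part-0}
        System.out.println(toEnv(conf));
    }
}
```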





[VOTE] Release Apache Hadoop 2.5.0

2014-08-01 Thread Karthik Kambatla
Hi folks,

I have put together a release candidate (rc0) for Hadoop 2.5.0.

The RC is available at: http://people.apache.org/~kasha/hadoop-2.5.0-RC0/
The RC tag in svn is here:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.5.0-rc0/
The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1007/

You can find my public key at:
http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

Please try the release and vote. The vote will run for 5 days.

Thanks
Karthik


Re: [VOTE] Release Apache Hadoop 2.5.0

2014-08-01 Thread Karthik Kambatla
I am obviously a +1 (non-binding).

I brought up a pseudo-distributed cluster and ran a few HDFS commands and MR
jobs.


On Fri, Aug 1, 2014 at 4:16 PM, Karthik Kambatla ka...@cloudera.com wrote:



[DISCUSS] Migrate from svn to git for source control?

2014-08-01 Thread Karthik Kambatla
Hi folks,

From what I hear, a lot of devs use the git mirror for development/reviews
and use subversion primarily for checking code in. I was wondering if it
would make more sense just to move to git. In addition to subjective liking
of git, I see the following advantages in our workflow:

   1. Feature branches - it becomes easier to work on them and keep
   rebasing against the latest trunk.
   2. Cherry-picks between branches automatically ensure the exact same
   commit message and track the lineage as well.
   3. When cutting new branches and/or updating maven versions etc., it
   allows doing all the work locally before pushing it to the main branch.
   4. Opens us up to potentially using other code-review tools. (Gerrit?)
   5. It is just more convenient.

I am sure this was brought up before in different capacities. I believe the
support for git in ASF is healthy now and several downstream projects have
moved. Again, from what I hear, ASF INFRA folks make the migration process
fairly easy.

What do you all think?

Thanks
Karthik


Re: Branching 2.5

2014-08-01 Thread Karthik Kambatla
Tom White helped me figure it out, and closed the Nexus repository for me.

Thanks Tom for helping and Stack for offering to help.


On Fri, Aug 1, 2014 at 11:28 AM, Karthik Kambatla ka...@cloudera.com
wrote:


Re: [VOTE] Release Apache Hadoop 2.5.0

2014-08-01 Thread Karthik Kambatla
Missed Andrew's email in the other thread. It looks like we might need
HDFS-6793.

I'll wait to see if others find any other issues, so I can address them all
together.


On Fri, Aug 1, 2014 at 4:25 PM, Karthik Kambatla ka...@cloudera.com wrote:


Re: [DISCUSS] Migrate from svn to git for source control?

2014-08-01 Thread Andrew Wang
Thanks for starting this thread, Karthik! Big +1 from me. I only use svn
when I have to commit things or work on the site, otherwise it's always the
git mirror or local git repos.

Considering that the git mirror works as well as it does, I'd expect this
to be a pretty smooth transition.

Best,
Andrew


On Fri, Aug 1, 2014 at 4:43 PM, Karthik Kambatla ka...@cloudera.com wrote:


Re: [DISCUSS] Migrate from svn to git for source control?

2014-08-01 Thread Alejandro Abdelnur
+1, we did it for Oozie a while back and it was painless, with minor issues
in Jenkins jobs.

Rebasing feature branches on the latest trunk may be tricky, as that may
require a force push, and if I'm not mistaken force pushes are disabled in
Apache Git.

thx


On Fri, Aug 1, 2014 at 4:43 PM, Karthik Kambatla ka...@cloudera.com wrote:
