date:20150318

[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Tsuyoshi Ozawa (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated MAPREDUCE-5528:
--
Status: Patch Available  (was: Open)

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 2.6.0, 2.4.1, 2.5.0, 0.20.2, 3.0.0
Reporter: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch


 I was trying to run TeraSort against a parallel networked file system, 
 setting things up via the 'file://' scheme.  I always got the following error 
 when running terasort:
 {noformat}
 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : attempt_1379960046506_0001_m_80_1, Status : FAILED
 Error: java.lang.IllegalArgumentException: can't read paritions file
     at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
 Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
     at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
     at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
     at org.apache.hadoop.util.Shell.run(Shell.java:417)
     at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
     at org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
     at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
     at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
     ... 10 more
 {noformat}
 After digging into TeraSort, I noticed that the partitions file was created 
 in the output directory, then added to the distributed cache:
 {noformat}
 Path outputDir = new Path(args[1]);
 ...
 Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
 ...
 job.addCacheFile(partitionUri);
 {noformat}
 but the partitions file doesn't seem to be read back from the output 
 directory or distributed cache:
 {noformat}
 FileSystem fs = FileSystem.getLocal(conf);
 ...
 Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
 splitPoints = readPartitions(fs, partFile, conf);
 {noformat}
 It seems the file is read from whatever the working directory is for 
 the filesystem returned by FileSystem.getLocal(conf).
 Under HDFS this code works; the working directory seems to be the distributed 
 cache (I guess by default?).
 But when I set things up with the networked file system and the 'file://' 
 scheme, the working directory was the directory I was running my Hadoop 
 binaries out of.
 The attached patch fixed things for me.  It always grabs the partition file 
 from the distributed cache, instead of trusting things underneath to 
 work out.  It seems to be the right thing to do?
 Apologies, I was unable to reproduce this under the TeraSort example 
 tests, such as TestTeraSort.java, so no test is added.  I am not sure what the 
 subtle difference in the setup is.  I tested under both HDFS and the 'file' 
 scheme, and the patch worked under both.
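
The difference described above, between resolving a bare file name against the working directory and looking the file up among the URIs registered with the cache, can be sketched with the Java standard library alone. This is only an illustration under assumptions: the class `PartitionFileLocator`, its methods, and the `/lustre/out` and `/opt/hadoop/bin` paths are hypothetical, and it does not use or reproduce the Hadoop API or the attached patch.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Stdlib-only illustration; none of these names are Hadoop APIs.
final class PartitionFileLocator {
    static final String PARTITION_FILENAME = "_partition.lst";

    // Fragile lookup: a bare relative name resolves against whatever
    // working directory the caller happens to have.
    static Path resolveRelative(Path workingDir) {
        return workingDir.resolve(PARTITION_FILENAME);
    }

    // More robust lookup: pick the registered cache entry whose last
    // path segment is the partition file name.
    static Optional<URI> findInCache(List<URI> cacheFiles) {
        return cacheFiles.stream()
                .filter(u -> u.getPath() != null
                        && u.getPath().endsWith("/" + PARTITION_FILENAME))
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical cache contents and working directory.
        List<URI> cache = Arrays.asList(
                URI.create("file:///lustre/out/other.dat"),
                URI.create("file:///lustre/out/_partition.lst"));
        System.out.println("relative:   " + resolveRelative(Paths.get("/opt/hadoop/bin")));
        System.out.println("from cache: " + findInCache(cache)
                .orElseThrow(() -> new IllegalArgumentException("can't read partitions file")));
    }
}
```

The relative lookup only succeeds when the working directory happens to contain the file, which matches the behaviour reported for the 'file://' setup; the cache lookup does not depend on the working directory at all.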



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366979#comment-14366979
 ] 

Hadoop QA commented on MAPREDUCE-5528:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604700/MAPREDUCE-5528.patch
  against trunk revision 3411732.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1158 javac 
compiler warnings (more than the trunk's current 1155 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-examples.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5306//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5306//artifact/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5306//console

This message is automatically generated.

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.2, 3.0.0, 2.5.0, 2.4.1, 2.6.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch


[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Ewan Higgs (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated MAPREDUCE-5528:
--
Affects Version/s: 0.20.2


[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Ewan Higgs (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewan Higgs updated MAPREDUCE-5528:
--
Affects Version/s: 2.5.0
   2.4.1
   2.6.0


[jira] [Commented] (MAPREDUCE-5811) Job history subdirectories under done directory is not set with proper permissions accessible to users

2015-03-18 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366922#comment-14366922
 ] 

Harsh J commented on MAPREDUCE-5811:


[~mayank_bansal] / [~cgupta]
What problems do the current permissions pose? I've not seen any practical 
reports. The protection of the directories appears intentional, and we also 
provide a REST API for users to query completed job history. Does that not 
suffice for your users?

 Job history subdirectories under done directory is not set with proper 
 permissions accessible to users
 --

 Key: MAPREDUCE-5811
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5811
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: chaitali gupta
Assignee: chaitali gupta
Priority: Trivial
 Attachments: MAPREDUCE-5811-2.patch, MAPREDUCE-5811.patch


 This is regarding Hadoop log file access. The new directories under 
 /mapred/history/done/2014/mm/dd/ are created with 770 permissions, so 
 users cannot read the logs on HDFS. The permissions need to be set 
 properly. 
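
Why a 770 directory is closed to users outside the owner and group can be checked with a small standard-library sketch. The class and method names here are hypothetical, and the 755 mode is shown only for comparison; the thread does not settle on it as the fix.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Illustration only: inspects a symbolic mode string, touches no file system.
final class HistoryDirPermissions {

    // True if users outside the owner and group may both list (read)
    // and enter (execute) a directory with the given mode.
    static boolean othersCanBrowse(String symbolicMode) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(symbolicMode);
        return perms.contains(PosixFilePermission.OTHERS_READ)
                && perms.contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        System.out.println("770 -> " + othersCanBrowse("rwxrwx---")); // prints "770 -> false"
        System.out.println("755 -> " + othersCanBrowse("rwxr-xr-x")); // prints "755 -> true"
    }
}
```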




[jira] [Updated] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-5807:
---
Attachment: 0002-MAPREDUCE-5807.patch

Thanks Rohith! The changes appear right. I've refined your patch a bit to 
further constant-ise defaults used in the config-getters, and have corrected 
some wording. Here's how it now appears:

{code}
Usage: terasort [-Dproperty=value] in out
TeraSort configurations are:
mapreduce.terasort.num-rows Number of rows to generate during teragen.
mapreduce.terasort.num.partitions Number of partitions used for sampling.
mapreduce.terasort.partitions.sample Sample size for each partition.
mapreduce.terasort.final.sync Perform a disk-persisting hsync at end of 
every file-write.
mapreduce.terasort.use.terascheduler Use TeraScheduler for computing 
input split distribution.
mapreduce.terasort.simplepartitioner Use SimplePartitioner instead of 
TotalOrderPartitioner.
mapreduce.terasort.output.replication Replication factor to use for 
output data files.
{code}

Let me know if the changes look good to you.

+1 generally, will commit after jenkins run.

 Print usage for TeraSort job.
 -

 Key: MAPREDUCE-5807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-MAPREDUCE-5807.patch, 0002-MAPREDUCE-5807.patch, 
 MAPREDUCE-5807.patch


 Users new to Hadoop often try to get a help message for the example jobs 
 provided in mapreduce. A usage message helps them supply the arguments.
 The terasort job does not print a usage message; instead it throws an exception.
 ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort
 14/03/24 15:34:55 INFO terasort.TeraSort: starting
 java.lang.ArrayIndexOutOfBoundsException: 0
     at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:283)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
     at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
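
The exception above comes from indexing the argument array without checking its length first. A minimal sketch of the guard, assuming a hypothetical class `TeraSortUsageSketch` and an exit code of 2 for bad usage (neither is taken from the attached patches), could look like this:

```java
import java.io.PrintStream;

// Illustration only: not the TeraSort source and not the attached patch.
final class TeraSortUsageSketch {

    // Returns a non-zero code and prints usage when the two positional
    // arguments (input and output directory) are missing.
    static int run(String[] args, PrintStream err) {
        if (args.length != 2) {
            err.println("Usage: terasort [-Dproperty=value] in out");
            return 2; // assumed exit code for a usage error
        }
        // A real job would be configured and submitted here.
        return 0;
    }

    public static void main(String[] args) {
        int rc = run(new String[0], System.err);
        System.out.println("exit code: " + rc);
    }
}
```

Checking the length before reading args[0] turns the ArrayIndexOutOfBoundsException into a readable message for someone running the example for the first time.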




[jira] [Commented] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366932#comment-14366932
 ] 

Rohith commented on MAPREDUCE-5807:
---

Thanks for the updated patch. The changes look good to me.
+1 (non-binding)


[jira] [Commented] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366960#comment-14366960
 ] 

Hadoop QA commented on MAPREDUCE-5807:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705321/0002-MAPREDUCE-5807.patch
  against trunk revision 3411732.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-examples.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5305//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5305//console

This message is automatically generated.


[jira] [Updated] (MAPREDUCE-2847) A tiny improvement for the LOG format

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-2847:
---
Resolution: Not a Problem
Status: Resolved  (was: Patch Available)

Not an issue in MR2. Resolving.

 A tiny improvement for the LOG format
 -

 Key: MAPREDUCE-2847
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2847
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: XieXianshan
Assignee: XieXianshan
Priority: Trivial
 Attachments: MAPREDUCE-2847-v0.2.patch, MAPREDUCE-2847.patch


 A space character is missing in the file 
 src/java/org/apache/hadoop/mapred/TaskInProgress.java(840):
 LOG.debug("TaskInProgress adding" + status.getNextRecordRange()).
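
The effect of the missing space is easy to see in isolation. In this sketch the class name and the sample range value "[0,10)" are hypothetical; only the message prefix is taken from the report.

```java
// Illustration only: shows how the concatenated debug message reads
// with and without the trailing space in the literal.
final class LogMessageSpacing {

    static String withoutSpace(Object range) {
        return "TaskInProgress adding" + range;
    }

    static String withSpace(Object range) {
        return "TaskInProgress adding " + range;
    }

    public static void main(String[] args) {
        String range = "[0,10)"; // hypothetical record range
        System.out.println(withoutSpace(range)); // prints "TaskInProgress adding[0,10)"
        System.out.println(withSpace(range));    // prints "TaskInProgress adding [0,10)"
    }
}
```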




[jira] [Commented] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366943#comment-14366943
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-5807:
---

+1, LGTM too.


[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Tsuyoshi Ozawa (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated MAPREDUCE-5528:
--
Assignee: Albert Chu

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.2, 3.0.0, 2.5.0, 2.4.1, 2.6.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch


 I was trying to run TeraSort against a parallel networked file system, 
 setting things up via the 'file://' scheme.  I always got the following error 
 when running terasort:
 {noformat}
 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : attempt_1379960046506_0001_m_80_1, Status : FAILED
 Error: java.lang.IllegalArgumentException: can't read paritions file
 at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
 Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
 at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
 at org.apache.hadoop.util.Shell.run(Shell.java:417)
 at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
 at org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
 at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
 at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
 at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
 at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
 at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
 at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
 at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
 ... 10 more
 {noformat}
 After digging into TeraSort, I noticed that the partitions file was created 
 in the output directory, then added into the distributed cache
 {noformat}
 Path outputDir = new Path(args[1]);
 ...
 Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
 ...
 job.addCacheFile(partitionUri);
 {noformat}
 but the partitions file doesn't seem to be read back from the output 
 directory or distributed cache:
 {noformat}
 FileSystem fs = FileSystem.getLocal(conf);
 ...
 Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
 splitPoints = readPartitions(fs, partFile, conf);
 {noformat}
 It seems the file is being read from whatever the working directory is for 
 the filesystem returned from FileSystem.getLocal(conf).
 Under HDFS this code works; the working directory seems to be the distributed 
 cache (I guess by default?).
 But when I set things up with the networked file system and the 'file://' scheme, 
 the working directory was the directory I was running my Hadoop binaries out 
 of.
 The attached patch fixed things for me.  It always grabs the partition file from the 
 distributed cache, instead of trusting things underneath to 
 work out.  It seems to be the right thing to do?
 Apologies, I was unable to reproduce this under the TeraSort example 
 tests, such as TestTeraSort.java, so no test is added.  I am not sure what the subtle 
 difference is in the setup.  I tested under both HDFS and the 'file' scheme, and the 
 patch worked under both.
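
The failure mode described above can be illustrated without Hadoop. The following is a minimal sketch using only `java.nio.file`; the class `RelativeLookupSketch` and the two temporary directories (standing in for the task's cache directory and for some other working directory) are hypothetical. It shows that a bare file name is found only when it is resolved against the directory that actually holds the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; it does not use the Hadoop FileSystem API.
final class RelativeLookupSketch {

  static final String FILE_NAME = "_partition.lst";

  // Resolves a bare file name against a chosen base directory and reports
  // whether the file exists there.
  static boolean existsUnder(Path baseDir, String fileName) {
    return Files.exists(baseDir.resolve(fileName));
  }

  // Places the file in only one of two temp directories and returns
  // {foundInCacheDir, foundInOtherDir}.
  static boolean[] demo() {
    try {
      Path cacheDir = Files.createTempDirectory("cache-dir");
      Path otherDir = Files.createTempDirectory("working-dir");
      Path file = Files.createFile(cacheDir.resolve(FILE_NAME));
      try {
        return new boolean[] {
          existsUnder(cacheDir, FILE_NAME),
          existsUnder(otherDir, FILE_NAME)
        };
      } finally {
        // Clean up the temporary file and directories.
        Files.delete(file);
        Files.delete(cacheDir);
        Files.delete(otherDir);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] found = demo();
    System.out.println("found in cache dir: " + found[0]);
    System.out.println("found in other dir: " + found[1]);
  }
}
```

Resolving against an explicit base directory, rather than whatever the process working directory happens to be, is the design choice the patch description argues for.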




[jira] [Updated] (MAPREDUCE-5869) Wrong date and and time on job tracker page

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-5869:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

 Wrong date and and time on job tracker page
 ---

 Key: MAPREDUCE-5869
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5869
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: chaitali gupta
Assignee: chaitali gupta
Priority: Trivial
 Attachments: WRONGDATE_TIME.patch


 When an application master restarts during execution of a task, job tracker 
 page displays wrong start date-time for the job.




[jira] [Updated] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-5807:
---
Affects Version/s: 2.6.0

 Print usage for TeraSort job.
 -

 Key: MAPREDUCE-5807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Attachments: 0001-MAPREDUCE-5807.patch, 0002-MAPREDUCE-5807.patch, 
 MAPREDUCE-5807.patch


 Users new to Hadoop often look for a help message from the example jobs provided with 
 MapReduce. These usage messages help them supply the right arguments.
 The terasort job does not print a usage message when run without arguments; instead it throws an exception.
 ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort
 14/03/24 15:34:55 INFO terasort.TeraSort: starting
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:283)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)




[jira] [Updated] (MAPREDUCE-6242) Progress report log is incredibly excessive in application master

2015-03-18 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-6242:
-
Target Version/s: 2.8.0  (was: 2.7.0)
  Status: Open  (was: Patch Available)

Thanks [~varun_saxena] for the patch.

Patch looks good to me except these minor comments.

1. There is some trailing whitespace in the newly added code. Can you 
remove it?
{code:xml}
stdin:9: trailing whitespace.

warning: squelched 9 whitespace errors
warning: 14 lines add whitespace error
{code}

2. In testTaskProgress(), can you avoid interrupting the thread, which throws 
InterruptedException, and use reporter.resetDoneFlag() to notify the 
thread instead?

{code:xml}
t.interrupt();
{code}

3. Also, can you change the access specifier of the inner classes 
DummyTask, FakeUmbilical and DummyTaskReporter to private?
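
As a generic illustration of the second comment, the sketch below stops a background loop through a shared flag and `notifyAll()` rather than `Thread.interrupt()`. The class `FlagStopSketch` and its methods are hypothetical; this does not reproduce the Hadoop reporter class or what `resetDoneFlag()` does internally.

```java
// Hypothetical sketch of flag-based shutdown; not the Hadoop TaskReporter.
final class FlagStopSketch implements Runnable {
  private final Object lock = new Object();
  private boolean done = false; // guarded by lock
  private int ticks = 0;        // guarded by lock

  @Override
  public void run() {
    synchronized (lock) {
      while (!done) {
        ticks++;
        try {
          // Wakes early when stop() calls notifyAll().
          lock.wait(50);
        } catch (InterruptedException e) {
          // Preserve the interrupt status and exit the loop.
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  // Asks the loop to finish without interrupting the thread.
  void stop() {
    synchronized (lock) {
      done = true;
      lock.notifyAll();
    }
  }

  int tickCount() {
    synchronized (lock) {
      return ticks;
    }
  }

  // Starts the worker, stops it via the flag, and reports whether it exited.
  static boolean demo() {
    FlagStopSketch reporter = new FlagStopSketch();
    Thread t = new Thread(reporter);
    t.start();
    reporter.stop();
    try {
      t.join(2000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("stopped cleanly: " + demo());
  }
}
```

With this shape a test can end the worker deterministically and never has to handle an InterruptedException raised on purpose.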


 Progress report log is incredibly excessive in application master
 -

 Key: MAPREDUCE-6242
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6242
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.4.0
Reporter: Jian Fang
Assignee: Varun Saxena
 Attachments: MAPREDUCE-6242.001.patch, MAPREDUCE-6242.002.patch


 We saw excessive logging in the application master for a long-running job 
 with many task attempts. The log write rate is around 1MB/sec in some cases. 
 Most of the log entries were from progress reports such as the following:
 2015-02-03 17:46:14,321 INFO [IPC Server handler 56 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.15605757
 2015-02-03 17:46:17,581 INFO [IPC Server handler 2 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.4108217
 2015-02-03 17:46:20,426 INFO [IPC Server handler 0 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_02_0 is : 0.06634143
 2015-02-03 17:46:20,807 INFO [IPC Server handler 4 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_00_0 is : 0.6506
 2015-02-03 17:46:21,013 INFO [IPC Server handler 6 on 37661] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1422985365246_0001_m_01_0 is : 0.21723115
 It looks like the report interval is controlled by a hard-coded variable, 
 PROGRESS_INTERVAL, set to 3 seconds in class org.apache.hadoop.mapred.Task. We 
 should allow users to set the appropriate progress interval for their 
 applications.
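
One way to picture the request is a lookup that falls back to the current 3-second value. This is a sketch under stated assumptions: the class `ProgressIntervalSketch` and the property name are invented for illustration and are not claimed to be a real Hadoop configuration key, and plain `java.util.Properties` stands in for the job configuration.

```java
import java.util.Properties;

// Hypothetical sketch: the key below is invented for illustration only.
final class ProgressIntervalSketch {

  static final String INTERVAL_KEY = "example.task.progress-report.interval-ms";
  // The 3-second value cited in the report, in milliseconds.
  static final long DEFAULT_INTERVAL_MS = 3000L;

  // Returns the configured interval, falling back to the default when the
  // key is absent, unparsable, or not positive.
  static long intervalMs(Properties conf) {
    String raw = conf.getProperty(INTERVAL_KEY);
    if (raw == null) {
      return DEFAULT_INTERVAL_MS;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : DEFAULT_INTERVAL_MS;
    } catch (NumberFormatException e) {
      return DEFAULT_INTERVAL_MS;
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(intervalMs(conf));
    conf.setProperty(INTERVAL_KEY, "10000");
    System.out.println(intervalMs(conf));
  }
}
```

Keeping the old value as the default means jobs that never set the key would behave as before.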




[jira] [Updated] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-5807:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the patch [~rohithsharma] (and for the additional review [~ozawa])!

Committed to branch-2 and trunk.

 Print usage for TeraSort job.
 -

 Key: MAPREDUCE-5807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Fix For: 2.8.0

 Attachments: 0001-MAPREDUCE-5807.patch, 0002-MAPREDUCE-5807.patch, 
 MAPREDUCE-5807.patch


 Users new to Hadoop often look for a help message from the example jobs provided with 
 MapReduce. These usage messages help them supply the right arguments.
 The terasort job does not print a usage message when run without arguments; instead it throws an exception.
 ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort
 14/03/24 15:34:55 INFO terasort.TeraSort: starting
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:283)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)




[jira] [Resolved] (MAPREDUCE-5556) mapred docs have incorrect classpath

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved MAPREDUCE-5556.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Harsh J
 Hadoop Flags: Reviewed

I've gone ahead and fixed this trivial issue by committing the attached change, 
to branch-1.

This is not an issue on trunk/branch-2. Thanks for reporting Allen!

 mapred docs have incorrect classpath
 

 Key: MAPREDUCE-5556
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5556
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Harsh J
Priority: Trivial
  Labels: newbie
 Fix For: 1.3.0

 Attachments: MAPREDUCE-5556.patch


 http://hadoop.apache.org/docs/stable/mapred_tutorial.html
 The classpath for javac under the Usage section is incorrect.




[jira] [Commented] (MAPREDUCE-6275) Race condition in FileOutputCommitter v2 for user-specified task output subdirs

2015-03-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367128#comment-14367128
 ] 

Jason Lowe commented on MAPREDUCE-6275:
---

bq.I did not change the semantics of the file replacement even if the existing 
path a directory. I am also not aware what is the background here, and tend to 
agree that we should have an extra check. Probably in a separate JIRA.

Ah, yes, I missed this is basically existing behavior.  Agree we can tackle 
this in a separate JIRA.

 bq. I don't think /tmp is hard coded here

It is hardcoded as the fallback for a missing test.build.data setting.  More 
specifically, I think it should be something like this:

{code}
private static final Path outDir = new Path(
  System.getProperty("test.build.data",
    System.getProperty("java.io.tmpdir")),
  TestFileOutputCommitter.class.getName());
{code}


 Race condition in FileOutputCommitter v2 for user-specified task output 
 subdirs
 ---

 Key: MAPREDUCE-6275
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6275
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Gera Shegalov
Priority: Critical
 Attachments: MAPREDUCE-6275.002.patch, MAPREDUCE-6275.v1.patch







[jira] [Commented] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367043#comment-14367043
 ] 

Hudson commented on MAPREDUCE-5807:
---

FAILURE: Integrated in Hadoop-trunk-Commit #7357 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7357/])
MAPREDUCE-5807. Print usage for TeraSort job. Contributed by Rohith. (harsh: rev 9d72f939759f407796ecb4715c2dc2f0d36d5578)
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSortConfigKeys.java
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraScheduler.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraInputFormat.java
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/test/java/org/apache/hadoop/examples/terasort/TestTeraSort.java
* hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraGen.java


 Print usage for TeraSort job.
 -

 Key: MAPREDUCE-5807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Fix For: 2.8.0

 Attachments: 0001-MAPREDUCE-5807.patch, 0002-MAPREDUCE-5807.patch, 
 MAPREDUCE-5807.patch


 Users new to Hadoop often look for a help message from the example jobs provided with 
 MapReduce. These usage messages help them supply the right arguments.
 The terasort job does not print a usage message when run without arguments; instead it throws an exception.
 ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort
 14/03/24 15:34:55 INFO terasort.TeraSort: starting
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:283)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)




[jira] [Updated] (MAPREDUCE-5556) mapred docs have incorrect classpath

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-5556:
---
Attachment: MAPREDUCE-5556.patch

 mapred docs have incorrect classpath
 

 Key: MAPREDUCE-5556
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5556
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Allen Wittenauer
Priority: Trivial
  Labels: newbie
 Attachments: MAPREDUCE-5556.patch


 http://hadoop.apache.org/docs/stable/mapred_tutorial.html
 The classpath for javac under the Usage section is incorrect.




[jira] [Moved] (MAPREDUCE-6277) Job In Error State Will Lost Jobhistory Of Second and Later Attempts

2015-03-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved YARN-3335 to MAPREDUCE-6277:
-

Affects Version/s: (was: 2.7.0)
   2.7.0
  Key: MAPREDUCE-6277  (was: YARN-3335)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

 Job In Error State Will Lost Jobhistory Of Second and Later Attempts
 

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which caused a job to get into the error 
 state. In that situation the job's second or some later attempt could succeed, but 
 those later attempts' history files will all be lost, because the first 
 attempt in the error state copies its history file to the intermediate dir while 
 mistakenly thinking of itself as the last attempt. The JobHistory server will later move 
 the history file of that error attempt from the intermediate dir to the done dir 
 while ignoring the history files of that job's later attempts in the 
 intermediate dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6277:
--
Target Version/s: 2.7.0
 Summary: Job can post multiple history files if attempt loses 
connection to the RM  (was: Job In Error State Will Lost Jobhistory Of Second 
and Later Attempts)

Thanks for the patch, Chang.  Looks good overall, just some comments on the 
test:

The test sets the wait interval to 1ms but I notice it doesn't loop to try.  
Theoretically we could race through this code in the same millisecond and the 
test will fail for the wrong reasons.  We should either set the retry interval 
to 0 so it always fails even on the first try or introduce a small sleep (e.g.: 
10 msec) after initializing the object but before calling schedule.
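
The timing concern above can be made concrete with a small sketch. Everything here is an assumption for illustration: the class `RetryDeadlineSketch`, its fields, and the `>=` comparison are hypothetical and are not taken from the allocator code under review. It only shows why a 1 ms interval may not have elapsed when the check runs in the same millisecond, while an interval of 0 always counts as elapsed.

```java
// Hypothetical sketch of a time-based retry check; the names and the
// comparison are assumptions, not the code under review.
final class RetryDeadlineSketch {
  private final long startMs;
  private final long retryIntervalMs;

  RetryDeadlineSketch(long startMs, long retryIntervalMs) {
    this.startMs = startMs;
    this.retryIntervalMs = retryIntervalMs;
  }

  // True once at least retryIntervalMs has elapsed since startMs.
  boolean expired(long nowMs) {
    return nowMs - startMs >= retryIntervalMs;
  }

  public static void main(String[] args) {
    long t0 = 1_000L;
    // Same millisecond with a 1 ms interval: the deadline has not passed.
    System.out.println(new RetryDeadlineSketch(t0, 1L).expired(t0));
    // Interval of 0: the deadline has passed even on the first check.
    System.out.println(new RetryDeadlineSketch(t0, 0L).expired(t0));
    // A short sleep (modelled as +10 ms) also makes the 1 ms case pass.
    System.out.println(new RetryDeadlineSketch(t0, 1L).expired(t0 + 10L));
  }
}
```

Passing the clock reading in as a parameter, instead of calling `System.currentTimeMillis()` inside `expired`, lets a test pin down the same-millisecond case exactly.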

Rather than returning in the middle of the test it would be cleaner to handle 
it this way:

{code}
try {
  allocator.schedule();
  Assert.fail("Should Have Exception");
} catch (YarnRuntimeException e) {
  Assert.assertTrue(e.getMessage().contains("Could not contact RM after"));
}
dispatcher.await();
Assert.assertEquals("Should Have 1 Job Event", 1,
[...]
{code}

Nit: The lack of indentation on this continued line makes the code harder to 
read:
{code}
  Assert.assertEquals("Should Have 1 Job Event", 1,
  allocator.jobEvents.size());
{code}

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which caused a job to get into the error 
 state. In that situation the job's second or some later attempt could succeed, but 
 those later attempts' history files will all be lost, because the first 
 attempt in the error state copies its history file to the intermediate dir while 
 mistakenly thinking of itself as the last attempt. The JobHistory server will later move 
 the history file of that error attempt from the intermediate dir to the done dir 
 while ignoring the history files of that job's later attempts in the 
 intermediate dir.




[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6277:
--
Component/s: mr-am

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which caused a job to get into the error 
 state. In that situation the job's second or some later attempt could succeed, but 
 those later attempts' history files will all be lost, because the first 
 attempt in the error state copies its history file to the intermediate dir while 
 mistakenly thinking of itself as the last attempt. The JobHistory server will later move 
 the history file of that error attempt from the intermediate dir to the done dir 
 while ignoring the history files of that job's later attempts in the 
 intermediate dir.




[jira] [Created] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core

2015-03-18 Thread Ewan Higgs (JIRA)

Ewan Higgs created MAPREDUCE-6278:
-

 Summary: Multithreaded maven build breaks in 
hadoop-mapreduce-client-core
 Key: MAPREDUCE-6278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.5.0, 3.0.0, 2.4.0
 Environment: Linux (Fedora 21)
Reporter: Ewan Higgs


[As reported on the mailing 
list|http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/52231].

The following breaks:
{{mvn -e package -DskipTests -Dmaven.javadoc.skip -Dtar -Pdist,native -T5}}

...
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-mapreduce) on project hadoop-mapreduce: Failed to create assembly: Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included by module) does not have an artifact with a file. Please ensure the package phase is run before the assembly is generated. -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-mapreduce) on project hadoop-mapreduce: Failed to create assembly: Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included by module) does not have an artifact with a file. Please ensure the package phase is run before the assembly is generated.
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:188)
at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:184)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create assembly: Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included by module) does not have an artifact with a file. Please ensure the package phase is run before the assembly is generated.
at org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:495)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
... 11 more
Caused by: org.apache.maven.plugin.assembly.archive.ArchiveCreationException: Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included by module) does not have an artifact with a file. Please ensure the package phase is run before the assembly is generated.
at org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.addModuleArtifact(ModuleSetAssemblyPhase.java:318)
at org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.addModuleBinaries(ModuleSetAssemblyPhase.java:228)
at org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.execute(ModuleSetAssemblyPhase.java:111)
at org.apache.maven.plugin.assembly.archive.DefaultAssemblyArchiver.createArchive(DefaultAssemblyArchiver.java:183)
at org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:436)
... 13 more
{code}

Dmitry Siminov appears to be building on Windows. I'm using Linux.




[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367203#comment-14367203
 ] 

Hadoop QA commented on MAPREDUCE-6277:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12704859/YARN-3335.2.patch
  against trunk revision 9d72f93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

  org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5307//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5307//console

This message is automatically generated.

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: YARN-3335.1.patch, YARN-3335.2.patch


 This is related to the fixed issue MAPREDUCE-6230, which caused a job to get 
 into the error state. In that situation the job's second or a later attempt 
 could succeed, but the history files of those later attempts would all be 
 lost. The first attempt, in the error state, copies its history file to the 
 intermediate dir while mistakenly treating itself as the last attempt. The 
 JobHistory server later moves the history file of that errored attempt from 
 the intermediate dir to the done dir and ignores the later attempts' history 
 files that are still in the intermediate dir.




[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Chang Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6277:

Status: Open  (was: Patch Available)

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.patch, YARN-3335.1.patch, 
 YARN-3335.2.patch



[jira] [Commented] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367308#comment-14367308
 ] 

Hudson commented on MAPREDUCE-5807:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2086 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2086/])
MAPREDUCE-5807. Print usage for TeraSort job. Contributed by Rohith. (harsh: 
rev 9d72f939759f407796ecb4715c2dc2f0d36d5578)
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraInputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraScheduler.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/test/java/org/apache/hadoop/examples/terasort/TestTeraSort.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraGen.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSortConfigKeys.java
* hadoop-mapreduce-project/CHANGES.txt


 Print usage for TeraSort job.
 -

 Key: MAPREDUCE-5807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Trivial
 Fix For: 2.8.0

 Attachments: 0001-MAPREDUCE-5807.patch, 0002-MAPREDUCE-5807.patch, 
 MAPREDUCE-5807.patch


 Users who are new to Hadoop often try to get the help message for the example 
 jobs provided in MapReduce. These usage messages help them supply the right 
 arguments.
 The terasort job does not print a usage message when run without arguments; 
 instead it throws an exception.
 ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort
 14/03/24 15:34:55 INFO terasort.TeraSort: starting
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:283)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at 
 org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
 at 
 org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
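
The trace above suggests that run() indexes into args before checking how many arguments were passed. A minimal sketch of an up-front argument check follows, assuming a tool that takes exactly two arguments (input and output directory). The class name, exit codes, and usage text are illustrative assumptions and not the committed patch.

```java
import java.io.PrintStream;

/** Hypothetical stand-in for a tool entry point; this is not the TeraSort source. */
final class UsageCheckSketch {
    static final int EXIT_OK = 0;
    static final int EXIT_USAGE = 2; // assumed exit code for bad arguments

    /** Validates the argument count before touching args[0] or args[1]. */
    static int run(String[] args, PrintStream err) {
        if (args.length != 2) {
            // Print usage instead of failing with ArrayIndexOutOfBoundsException.
            err.println("Usage: terasort <input dir> <output dir>");
            return EXIT_USAGE;
        }
        String inputDir = args[0];
        String outputDir = args[1];
        // A real tool would configure and submit the job here.
        err.println("would sort " + inputDir + " into " + outputDir);
        return EXIT_OK;
    }

    public static void main(String[] args) {
        int code = run(args, System.err);
        System.out.println("exit code: " + code);
    }
}
```

Returning a non-zero code rather than throwing lets a driver report the failure without a stack trace.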




[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Chang Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6277:

Attachment: MAPREDUCE-6277.patch

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.patch, YARN-3335.1.patch, 
 YARN-3335.2.patch



[jira] [Commented] (MAPREDUCE-5807) Print usage for TeraSort job.

2015-03-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367324#comment-14367324
 ] 

Hudson commented on MAPREDUCE-5807:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #136 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/136/])
MAPREDUCE-5807. Print usage for TeraSort job. Contributed by Rohith. (harsh: 
rev 9d72f939759f407796ecb4715c2dc2f0d36d5578)
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSortConfigKeys.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraInputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraOutputFormat.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/test/java/org/apache/hadoop/examples/terasort/TestTeraSort.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraScheduler.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraGen.java
* hadoop-mapreduce-project/CHANGES.txt



[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367458#comment-14367458
 ] 

Hadoop QA commented on MAPREDUCE-6277:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12705360/MAPREDUCE-6277.patch
  against trunk revision 9d72f93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5308//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5308//console

This message is automatically generated.

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.patch, YARN-3335.1.patch, 
 YARN-3335.2.patch



[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Chang Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6277:

Status: Patch Available  (was: Open)

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.patch, YARN-3335.1.patch, 
 YARN-3335.2.patch



[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Tsuyoshi Ozawa (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated MAPREDUCE-5528:
--
Status: Open  (was: Patch Available)

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 2.6.0, 2.4.1, 2.5.0, 0.20.2, 3.0.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch


 I was trying to run TeraSort against a parallel networked file system, 
 setting things up via the 'file://' scheme.  I always got the following error 
 when running terasort:
 {noformat}
 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : 
 attempt_1379960046506_0001_m_80_1, Status : FAILED
 Error: java.lang.IllegalArgumentException: can't read paritions file
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:678)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
 Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
 at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
 at org.apache.hadoop.util.Shell.run(Shell.java:417)
 at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
 at 
 org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:137)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
 ... 10 more
 {noformat}
 After digging into TeraSort, I noticed that the partitions file was created 
 in the output directory, then added into the distributed cache
 {noformat}
 Path outputDir = new Path(args[1]);
 ...
 Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
 ...
 job.addCacheFile(partitionUri);
 {noformat}
 but the partitions file doesn't seem to be read back from the output 
 directory or distributed cache:
 {noformat}
 FileSystem fs = FileSystem.getLocal(conf);
 ...
 Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
 splitPoints = readPartitions(fs, partFile, conf);
 {noformat}
 It seems the file is being read from whatever the working directory is for 
 the filesystem returned from FileSystem.getLocal(conf).
 Under HDFS this code works; the working directory seems to be the distributed 
 cache (I guess by default?).
 But when I set things up with the networked file system and the 'file://' 
 scheme, the working directory was the directory I was running my Hadoop 
 binaries out of.
 The attached patch fixed things for me.  It always grabs the partition file 
 from the distributed cache, instead of trusting things underneath to work 
 out.  It seems to be the right thing to do?
 Apologies, I was unable to reproduce this under the TeraSort example tests, 
 such as TestTeraSort.java, so no test was added.  Not sure what the subtle 
 difference is in the setup.  I tested under both HDFS and the 'file' scheme, 
 and the patch worked under both.
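
The difference between the two lookups described above can be sketched without any Hadoop classes. The example below uses only java.nio.file; the class, method names, and directory paths are hypothetical, and the list of localized cache paths stands in for whatever the distributed cache reports on the task node. It is an illustration of the idea, not the attached patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of the two lookup strategies; no Hadoop APIs involved. */
final class PartitionFileLookupSketch {
    // File name taken from the FileNotFoundException in the report.
    static final String PARTITION_FILENAME = "_partition.lst";

    /** Mirrors the failing behaviour: a bare file name resolved against the working directory. */
    static Path resolveAgainstWorkingDir(Path workingDir) {
        return workingDir.resolve(PARTITION_FILENAME);
    }

    /** Alternative: pick the entry with the matching file name from the localized cache paths. */
    static Optional<Path> findInCache(List<Path> localizedCachePaths) {
        return localizedCachePaths.stream()
                .filter(p -> p.getFileName() != null
                        && p.getFileName().toString().equals(PARTITION_FILENAME))
                .findFirst();
    }

    public static void main(String[] args) {
        // Both paths below are made up for the demonstration.
        Path binDir = Paths.get("opt", "hadoop", "bin");
        List<Path> cache = Arrays.asList(
                Paths.get("tmp", "filecache", "10", "_partition.lst"));

        System.out.println("working-dir lookup: " + resolveAgainstWorkingDir(binDir));
        System.out.println("cache lookup:       " + findInCache(cache).orElseThrow(
                () -> new IllegalArgumentException("partition file not in cache")));
    }
}
```

The working-directory lookup only succeeds when the task happens to run in the directory that holds the localized file, which matches the behaviour reported for HDFS versus the 'file://' scheme.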




[jira] [Resolved] (MAPREDUCE-3536) consider whether the same instance of a ServiceStateChangeListener should be allowed to listen to events

2015-03-18 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved MAPREDUCE-3536.

Resolution: Duplicate

 consider whether the same instance of a ServiceStateChangeListener should be 
 allowed to listen to events
 

 Key: MAPREDUCE-3536
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3536
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0, 2.0.0-alpha
Reporter: Steve Loughran
Priority: Trivial

 Currently there is no limit on the number of times a listener can register 
 for events; it's a simple list. A service must unregister the same number of 
 times that it registers.
 Is this the desired behaviour? If so it should be documented in the 
 {{Service}} interface rather than just implicitly in the {{AbstractService}} 
 implementation. 
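
The behaviour in question can be reproduced with a plain list. The sketch below assumes listeners are stored in an ArrayList with no duplicate check, as the description states; the class and interface names are hypothetical and do not correspond to the Hadoop {{Service}} or {{AbstractService}} types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener registry backed by a simple list, with no duplicate check. */
final class ListenerListSketch {
    interface Listener {
        void stateChanged(String newState);
    }

    private final List<Listener> listeners = new ArrayList<>();

    /** Adds the listener even if the same instance is already registered. */
    void register(Listener l) {
        listeners.add(l);
    }

    /** Removes only the first occurrence, so N registrations need N unregistrations. */
    void unregister(Listener l) {
        listeners.remove(l);
    }

    /** Notifies every registered entry and returns how many entries are registered. */
    int notifyListeners(String newState) {
        // Iterate over a copy so listeners may unregister themselves while being notified.
        for (Listener l : new ArrayList<>(listeners)) {
            l.stateChanged(newState);
        }
        return listeners.size();
    }

    public static void main(String[] args) {
        ListenerListSketch service = new ListenerListSketch();
        Listener l = state -> System.out.println("state changed to " + state);
        service.register(l);
        service.register(l);               // same instance, registered twice
        service.notifyListeners("STARTED"); // the listener is called twice
        service.unregister(l);
        service.notifyListeners("STOPPED"); // still called once
    }
}
```

Switching the backing collection to a set would make registration idempotent, which is one way the question in this issue could be settled.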




[jira] [Commented] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367497#comment-14367497
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-5528:
---

[~ehiggs] [~chu11] thank you for taking this issue. Could you update the 
following line to use job.getLocalCacheFiles() instead of 
DistributedCache.getLocalCacheFiles(), since that method is deprecated?
{code}
+Path[] localPaths = DistributedCache.getLocalCacheFiles(conf);
{code}

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.2, 3.0.0, 2.5.0, 2.4.1, 2.6.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch



[jira] [Assigned] (MAPREDUCE-6279) AM should explicity exit JVM after all services have stopped

2015-03-18 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned MAPREDUCE-6279:
-

Assignee: Eric Payne

 AM should explicity exit JVM after all services have stopped
 

 Key: MAPREDUCE-6279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Eric Payne

 Occasionally the MapReduce AM can get stuck trying to shut down.  
 MAPREDUCE-6049 and MAPREDUCE-5888 were specific instances that have been 
 fixed, but this can also occur with uber jobs if the task code inadvertently 
 leaves non-daemon threads lingering.
 We should explicitly shutdown the JVM after the MapReduce AM has unregistered 
 and all services have been stopped.
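
A JVM does not exit on its own while any non-daemon thread is still alive, which is why leftover task threads can keep the process around. The sketch below shows one way to list such threads with the standard Thread API; the class name and the "leftover-task-thread" example are hypothetical, and this is not code from the AM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical helper that reports non-daemon threads that would block a normal JVM exit. */
final class LingeringThreadsSketch {
    /** Returns live non-daemon threads other than the calling thread. */
    static List<Thread> lingeringNonDaemonThreads() {
        List<Thread> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        // Simulates task code that leaves a non-daemon thread behind.
        Thread leftover = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "leftover-task-thread");
        leftover.setDaemon(false);
        leftover.start();

        for (Thread t : lingeringNonDaemonThreads()) {
            System.out.println("still alive: " + t.getName());
        }

        // A real shutdown path would call System.exit(...) at this point;
        // the sketch releases the thread instead so that it ends cleanly.
        release.countDown();
        leftover.join();
    }
}
```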




[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Chang Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated MAPREDUCE-6277:

Attachment: MAPREDUCE-6277.2.patch

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.2.patch, MAPREDUCE-6277.patch, 
 YARN-3335.1.patch, YARN-3335.2.patch



[jira] [Commented] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Albert Chu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367545#comment-14367545
 ] 

Albert Chu commented on MAPREDUCE-5528:
---

Sure, I'll update the patch and retest just to make sure everything is fine and 
dandy.

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.2, 3.0.0, 2.5.0, 2.4.1, 2.6.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch


 I was trying to run TeraSort against a parallel networked file system, 
 setting things up via the 'file:// scheme.  I always got the following error 
 when running terasort:
 {noformat}
 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : 
 attempt_1379960046506_0001_m_80_1, Status : FAILED
 Error: java.lang.IllegalArgumentException: can't read paritions file
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:678)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
 Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
 at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
 at org.apache.hadoop.util.Shell.run(Shell.java:417)
 at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
 at 
 org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:137)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
 ... 10 more
 {noformat}
 After digging into TeraSort, I noticed that the partitions file was created 
 in the output directory, then added into the distributed cache
 {noformat}
 Path outputDir = new Path(args[1]);
 ...
 Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
 ...
 job.addCacheFile(partitionUri);
 {noformat}
 but the partitions file doesn't seem to be read back from the output 
 directory or distributed cache:
 {noformat}
 FileSystem fs = FileSystem.getLocal(conf);
 ...
 Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
 splitPoints = readPartitions(fs, partFile, conf);
 {noformat}
 It seems the file is being read from whatever the working directory is for 
 the filesystem returned from FileSystem.getLocal(conf).
 Under HDFS this code works because the working directory seems to be the 
 distributed cache (by default, I guess).
 But when I set things up with the networked file system and the 'file://' 
 scheme, the working directory was the directory I was running my Hadoop 
 binaries out of.
 The attached patch fixed things for me.  It always grabs the partition file 
 from the distributed cache instead of trusting things underneath to work 
 out.  That seems to be the right thing to do.
 Apologies, I was unable to reproduce this under the TeraSort example tests, 
 such as TestTeraSort.java, so no test was added.  I am not sure what the 
 subtle difference in the setup is.  I tested under both HDFS and the 'file' 
 scheme, and the patch worked under both.
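
 The working-directory dependence described above can be illustrated without 
 Hadoop.  The following is only a minimal sketch using java.nio.file rather 
 than Hadoop's Path and FileSystem classes; the class name, method name, and 
 directories are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; uses java.nio.file, not Hadoop's Path.
final class RelativePathDemo {
  static final String PARTITION_FILENAME = "_partition.lst";

  // A bare file name carries no directory, so where it points depends
  // entirely on the directory it is resolved against.
  static Path resolveAgainst(Path workingDir) {
    return workingDir.resolve(PARTITION_FILENAME).normalize();
  }

  public static void main(String[] args) {
    Path cacheDir = Paths.get("/tmp/task/cache"); // assumed directory
    Path binDir = Paths.get("/opt/hadoop/bin");   // assumed directory
    System.out.println(resolveAgainst(cacheDir));
    System.out.println(resolveAgainst(binDir));
  }
}
```

 The same file name ends up at two different locations, which is why reading 
 it by bare name only works when the working directory happens to be the 
 right one.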



--

[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367549#comment-14367549
 ] 

Jason Lowe commented on MAPREDUCE-6277:
---

Thanks for updating the patch, Chang.  One last nit with the test: there's a 
bunch of code in the catch clause that doesn't need to be there.  The only 
thing that I would expect to be there is the exception message assert.  
Everything else can be moved outside of the catch clause.
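
The test shape suggested here can be sketched roughly as follows.  This is a 
hypothetical, JUnit-free illustration (the class, method, and message are 
made up and are not taken from the patch); it only shows the catch clause 
holding nothing but the exception-message check.

```java
// Hypothetical sketch; names and the message text are illustrative only.
final class CatchClauseDemo {
  static int checksOutsideCatch = 0;

  // Stand-in for the call under test, which is expected to throw.
  static void operationThatFails() {
    throw new IllegalStateException("lost connection");
  }

  static void testFailureMessage() {
    try {
      operationThatFails();
      throw new AssertionError("expected an IllegalStateException");
    } catch (IllegalStateException e) {
      // Only the exception-message check lives in the catch clause.
      if (!"lost connection".equals(e.getMessage())) {
        throw new AssertionError("unexpected message: " + e.getMessage());
      }
    }
    // Every other verification runs here, outside the catch clause.
    checksOutsideCatch++;
  }
}
```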

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.patch, YARN-3335.1.patch, 
 YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (MAPREDUCE-6165) [JDK8] TestCombineFileInputFormat failed on JDK8

2015-03-18 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367619#comment-14367619
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-6165:
---

[~ajisakaa] Thank you for taking this issue.

Why not use assertEquals(2, splits.size())?
{code}
-  assertEquals(2, splits.size());
+  assertTrue(splits.size() >= 1);
+  assertTrue(splits.size() <= 2);
{code}

I think some applications depend on the ordering, so maybe it would be good to 
use TreeMap<K,V> or LinkedHashMap<K,V> instead of HashMap.
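
As a small illustration of the ordering point: HashMap makes no guarantee 
about iteration order, whereas LinkedHashMap iterates in insertion order and 
TreeMap in sorted key order.  The sketch below uses made-up keys and a 
hypothetical class name; it is not code from CombineFileInputFormat.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; the keys are invented for the example.
final class MapOrderDemo {
  // Returns the keys in the order the map iterates over them.
  static List<String> keysInIterationOrder(Map<String, Integer> map) {
    return new ArrayList<String>(map.keySet());
  }

  // Inserts three keys in a fixed, deliberately unsorted order.
  static Map<String, Integer> fill(Map<String, Integer> map) {
    map.put("rack2", 2);
    map.put("rack1", 1);
    map.put("rack3", 3);
    return map;
  }

  public static void main(String[] args) {
    // Insertion order is preserved by LinkedHashMap.
    System.out.println(keysInIterationOrder(fill(new LinkedHashMap<String, Integer>())));
    // Sorted key order is used by TreeMap.
    System.out.println(keysInIterationOrder(fill(new TreeMap<String, Integer>())));
  }
}
```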

 [JDK8] TestCombineFileInputFormat failed on JDK8
 

 Key: MAPREDUCE-6165
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6165
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Akira AJISAKA
Priority: Minor
 Attachments: MAPREDUCE-6165-001.patch, MAPREDUCE-6165-reproduce.patch


 The error msg:
 {noformat}
 testSplitPlacementForCompressedFiles(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 2.487 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacementForCompressedFiles(TestCombineFileInputFormat.java:911)
 testSplitPlacement(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat)
   Time elapsed: 0.985 sec  <<< FAILURE!
 junit.framework.AssertionFailedError: expected:<2> but was:<1>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:78)
   at junit.framework.Assert.assertEquals(Assert.java:234)
   at junit.framework.Assert.assertEquals(Assert.java:241)
   at junit.framework.TestCase.assertEquals(TestCase.java:409)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat.testSplitPlacement(TestCombineFileInputFormat.java:368)
 {noformat}



--

[jira] [Updated] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6277:
--
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Chang!  I committed this to trunk, branch-2, and branch-2.7.

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6277.2.patch, MAPREDUCE-6277.patch, 
 YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--

[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Chang Li (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367741#comment-14367741
 ] 

Chang Li commented on MAPREDUCE-6277:
-

Thanks [~jlowe] for the quick review!

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6277.2.patch, MAPREDUCE-6277.patch, 
 YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--

[jira] [Updated] (MAPREDUCE-6275) Race condition in FileOutputCommitter v2 for user-specified task output subdirs

2015-03-18 Thread Gera Shegalov (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-6275:
-
Affects Version/s: 2.7.0

 Race condition in FileOutputCommitter v2 for user-specified task output 
 subdirs
 ---

 Key: MAPREDUCE-6275
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6275
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Siqi Li
Assignee: Gera Shegalov
Priority: Critical
 Attachments: MAPREDUCE-6275.002.patch, MAPREDUCE-6275.v1.patch






--

[jira] [Updated] (MAPREDUCE-5050) Cannot find partition.lst in Terasort on Hadoop/Local File System

2015-03-18 Thread Albert Chu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albert Chu updated MAPREDUCE-5050:
--
Description: 
I'm trying to simulate running Hadoop on Lustre by configuring it to use the 
local file system on a single Cloudera VM (cdh3u4).

I can generate the data just fine, but when running the sorting portion of the 
program, I get an error about not being able to find the _partition.lst file. 
It exists in the generated data directory.

Perusing the TeraSort code, I see that the run method has a Path reference 
to _partition.lst, which is created with the parent directory.

{noformat}
  public int run(String[] args) throws Exception {
    LOG.info("starting");
    JobConf job = (JobConf) getConf();
    Path inputDir = new Path(args[0]);
    inputDir = inputDir.makeQualified(inputDir.getFileSystem(job));
    // The partition file is created under the qualified input directory.
    Path partitionFile = new Path(inputDir, TeraInputFormat.PARTITION_FILENAME);
    // The URI fragment after '#' names the symlink for the cached file.
    URI partitionUri = new URI(partitionFile.toString() +
                               "#" + TeraInputFormat.PARTITION_FILENAME);
    TeraInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJobName("TeraSort");
    job.setJarByClass(TeraSort.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormat(TeraInputFormat.class);
    job.setOutputFormat(TeraOutputFormat.class);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TeraInputFormat.writePartitionFile(job, partitionFile);
    DistributedCache.addCacheFile(partitionUri, job);
    DistributedCache.createSymlink(job);
    job.setInt("dfs.replication", 1);
    TeraOutputFormat.setFinalSync(job, true);
    JobClient.runJob(job);
    LOG.info("done");
    return 0;
  }
{noformat}

But in the configure method, the Path isn't created with the parent directory 
reference.

{noformat}
public void configure(JobConf job) {
  try {
    FileSystem fs = FileSystem.getLocal(job);
    // Bare file name: resolved against the local working directory.
    Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
    splitPoints = readPartitions(fs, partFile, job);
    trie = buildTrie(splitPoints, 0, splitPoints.length, new Text(), 2);
  } catch (IOException ie) {
    throw new IllegalArgumentException("can't read paritions file", ie);
  }
}
{noformat}

I modified the code as follows, and now the sorting portion of the TeraSort 
test works using the general file system. I think the above code is a bug.

{noformat}
public void configure(JobConf job) {
  try {
    FileSystem fs = FileSystem.getLocal(job);

    // Build the path from the first input directory instead of a bare name.
    Path[] inputPaths = TeraInputFormat.getInputPaths(job);
    Path partFile = new Path(inputPaths[0],
                             TeraInputFormat.PARTITION_FILENAME);

    splitPoints = readPartitions(fs, partFile, job);
    trie = buildTrie(splitPoints, 0, splitPoints.length, new Text(), 2);
  } catch (IOException ie) {
    throw new IllegalArgumentException("can't read paritions file", ie);
  }
}
{noformat}
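
The effect of qualifying the file name with its parent directory can be 
checked in isolation.  The following is a rough sketch that uses 
java.nio.file and temporary directories in place of Hadoop's Path and 
FileSystem; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; java.nio.file stands in for Hadoop's Path.
final class PartitionFileLookup {
  static final String PARTITION_FILENAME = "_partition.lst";

  // Builds the partition file path from an explicit parent directory,
  // analogous to new Path(inputPaths[0], PARTITION_FILENAME) above.
  static Path partitionFileIn(Path parentDir) {
    return parentDir.resolve(PARTITION_FILENAME);
  }

  // Creates the file in one directory and reports whether it is found
  // there and only there.
  static boolean foundOnlyInParent() {
    try {
      Path inputDir = Files.createTempDirectory("terasort-input");
      Path otherDir = Files.createTempDirectory("some-other-cwd");
      Files.createFile(partitionFileIn(inputDir));
      return Files.exists(partitionFileIn(inputDir))
          && !Files.exists(partitionFileIn(otherDir));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(foundOnlyInParent());
  }
}
```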

  was:
I'm trying to simulate running Hadoop on Lustre by configuring it to use the 
local file system using a single cloudera VM (cdh3u4).

I can generate the data just fine, but when running the sorting portion of the 
program, I get an error about not being able to find the _partition.lst file. 
It exists in the generated data directory.

Perusing the Terasort code, I see in the main method that has a Path reference 
to partition.lst, which is created with the parent directory.

  public int run(String[] args) throws Exception {
    LOG.info("starting");
    JobConf job = (JobConf) getConf();
    Path inputDir = new Path(args[0]);
    inputDir = inputDir.makeQualified(inputDir.getFileSystem(job));
    Path partitionFile = new Path(inputDir, TeraInputFormat.PARTITION_FILENAME);
    URI partitionUri = new URI(partitionFile.toString() +
                               "#" + TeraInputFormat.PARTITION_FILENAME);
    TeraInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJobName("TeraSort");
    job.setJarByClass(TeraSort.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormat(TeraInputFormat.class);
    job.setOutputFormat(TeraOutputFormat.class);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TeraInputFormat.writePartitionFile(job, partitionFile);
    DistributedCache.addCacheFile(partitionUri, job);
    DistributedCache.createSymlink(job);
    job.setInt("dfs.replication", 1);
    TeraOutputFormat.setFinalSync(job, true);
    JobClient.runJob(job);
    LOG.info("done");
    return 0;
  }

But in the configure method, the Path isn't created with the parent directory 
reference.

public void configure(JobConf job) {

  try {
FileSystem fs =

[jira] [Commented] (MAPREDUCE-6275) Race condition in FileOutputCommitter v2 for user-specified task output subdirs

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368188#comment-14368188
 ] 

Hadoop QA commented on MAPREDUCE-6275:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705461/MAPREDUCE-6275.003.patch
  against trunk revision 8c40e88.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5310//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5310//console

This message is automatically generated.

 Race condition in FileOutputCommitter v2 for user-specified task output 
 subdirs
 ---

 Key: MAPREDUCE-6275
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6275
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Siqi Li
Assignee: Gera Shegalov
Priority: Critical
 Attachments: MAPREDUCE-6275.002.patch, MAPREDUCE-6275.003.patch, 
 MAPREDUCE-6275.v1.patch






--

[jira] [Updated] (MAPREDUCE-6275) Race condition in FileOutputCommitter v2 for user-specified task output subdirs

2015-03-18 Thread Gera Shegalov (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-6275:
-
Attachment: MAPREDUCE-6275.003.patch

Addressing [~jlowe]'s review.

Minor modifications:
- Shut down the thread pool in a finally block in TestFileOutputCommitter
- Surround LOG.debug with if (LOG.isDebugEnabled()) in FileOutputCommitter
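
The first modification follows a common pattern, sketched below under the 
assumption of a plain java.util.concurrent pool.  The class and method names 
are hypothetical and this is not the code from TestFileOutputCommitter.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of shutting a pool down in a finally block.
final class PoolShutdownDemo {
  // Runs the task on a small pool and always shuts the pool down,
  // even if the task or Future.get() throws.
  static ExecutorService runAndShutdown(Callable<Integer> task) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<Integer> result = pool.submit(task);
      result.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    } catch (ExecutionException e) {
      // The task failed; a real test would assert or rethrow here.
    } finally {
      pool.shutdown(); // reached on success and on failure alike
    }
    return pool;
  }
}
```

Without the finally block, a failing task would leave the pool's worker 
threads running after the test method returns.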

 Race condition in FileOutputCommitter v2 for user-specified task output 
 subdirs
 ---

 Key: MAPREDUCE-6275
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6275
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Siqi Li
Assignee: Gera Shegalov
Priority: Critical
 Attachments: MAPREDUCE-6275.002.patch, MAPREDUCE-6275.003.patch, 
 MAPREDUCE-6275.v1.patch






--

[jira] [Commented] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2015-03-18 Thread Albert Chu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368080#comment-14368080
 ] 

Albert Chu commented on MAPREDUCE-5528:
---

Since I wrote my original patch, DistributedCache has been deprecated in favor 
of using the JobContext.  Unfortunately, in TeraSort, there is no mapper or 
reducer; all of the sorting is handled via the partitioner.  As far as I can 
tell, the JobContext can't be accessed in the partitioner.  Because of that, 
this really can't be handled through the patch I had before, assuming we don't 
want to use deprecated code.  Using the basic idea from [~ehiggs] in 
MAPREDUCE-5050 wouldn't work either, because the JobContext is again needed in 
newer versions of FileOutputFormat.

I tried to think of a clean way to do this, but nothing came to mind with any 
of the approaches I looked at.  I might just be missing something that others 
would see.

Open to suggestions.





 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.2, 3.0.0, 2.5.0, 2.4.1, 2.6.0
Reporter: Albert Chu
Assignee: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch


 I was trying to run TeraSort against a parallel networked file system, 
 setting things up via the 'file://' scheme.  I always got the following error 
 when running terasort:
 {noformat}
 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : 
 attempt_1379960046506_0001_m_80_1, Status : FAILED
 Error: java.lang.IllegalArgumentException: can't read paritions file
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
 Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
 at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
 at org.apache.hadoop.util.Shell.run(Shell.java:417)
 at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
 at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
 at 
 org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
 at 
 org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
 ... 10 more
 {noformat}
 After digging into TeraSort, I noticed that the partitions file was created 
 in the output directory, then added to the distributed cache:
 {noformat}
 Path outputDir = new Path(args[1]);
 ...
 Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
 ...
 job.addCacheFile(partitionUri);
 {noformat}
 but the partitions file doesn't seem to be read back from the output 
 directory or distributed cache:
 {noformat}
 FileSystem fs = FileSystem.getLocal(conf);
 ...
 Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
 splitPoints = readPartitions(fs, partFile, conf);
 {noformat}
 It seems the file is being read from whatever the working directory is for 
 the filesystem returned from FileSystem.getLocal(conf).
 Under HDFS this code works, the working directory seems to be the distributed 
 cache (I

[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367684#comment-14367684
 ] 

Hadoop QA commented on MAPREDUCE-6277:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705401/MAPREDUCE-6277.2.patch
  against trunk revision 402817c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5309//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5309//console

This message is automatically generated.

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.2.patch, MAPREDUCE-6277.patch, 
 YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--

[jira] [Commented] (MAPREDUCE-4424) The 'mapred job -list' command should show the job name as well

2015-03-18 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367690#comment-14367690
 ] 

Harsh J commented on MAPREDUCE-4424:


Sorry for the delay here. +1, changes look good to me. [~ajisakaa], can you 
take another look and commit when you get a chance, since you've already 
reviewed this?

Patch has a few offsets but applies just fine with default fuzz.

 The 'mapred job -list' command should show the job name as well
 ---

 Key: MAPREDUCE-4424
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4424
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Harsh J
Assignee: Avinash Kujur
Priority: Trivial
  Labels: newbie
 Attachments: MAPREDUCE-4424-2.patch, MAPREDUCE-4424-3.patch, 
 MAPREDUCE-4424.patch


 Currently the {{mapred job -list}} command does not show the Job Name, just 
 the Job ID. It would be good to display the Job name too. Idea originally 
 from HADOOP-.



--

[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367754#comment-14367754
 ] 

Hudson commented on MAPREDUCE-6277:
---

FAILURE: Integrated in Hadoop-trunk-Commit #7360 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7360/])
MAPREDUCE-6277. Job can post multiple history files if attempt loses connection 
to the RM. Contributed by Chang Li (jlowe: rev 
30da99cbaf36aeef38a858251ce8ffa5eb657b38)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* hadoop-mapreduce-project/CHANGES.txt


 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6277.2.patch, MAPREDUCE-6277.patch, 
 YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--

[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Chang Li (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367494#comment-14367494
 ] 

Chang Li commented on MAPREDUCE-6277:
-

Thanks [~jlowe] for the review. I have updated the patch according to your suggestions.

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.patch, YARN-3335.1.patch, 
 YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--

[jira] [Created] (MAPREDUCE-6279) AM should explicitly exit JVM after all services have stopped

2015-03-18 Thread Jason Lowe (JIRA)

Jason Lowe created MAPREDUCE-6279:
-

 Summary: AM should explicitly exit JVM after all services have 
stopped
 Key: MAPREDUCE-6279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.5.0
Reporter: Jason Lowe


Occasionally the MapReduce AM can get stuck trying to shut down.  
MAPREDUCE-6049 and MAPREDUCE-5888 were specific instances that have been fixed, 
but this can also occur with uber jobs if the task code inadvertently leaves 
non-daemon threads lingering.

We should explicitly shut down the JVM after the MapReduce AM has unregistered 
and all services have been stopped.
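
The underlying JVM behavior can be shown in a few lines: a live non-daemon 
thread keeps the JVM running after main() returns, while a daemon thread does 
not.  The sketch below is a hypothetical illustration (class and method names 
are made up) and is not the AM shutdown code.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of lingering non-daemon threads.
final class LingeringThreadDemo {
  // Starts a worker that blocks until the latch is released.
  static Thread startWorker(CountDownLatch release, boolean daemon) {
    Thread t = new Thread(() -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.setDaemon(daemon);
    t.start();
    return t;
  }

  // A live non-daemon thread prevents the JVM from exiting on its own.
  static boolean keepsJvmAlive(Thread t) {
    return t.isAlive() && !t.isDaemon();
  }

  public static void main(String[] args) {
    CountDownLatch release = new CountDownLatch(1);
    Thread lingering = startWorker(release, false);
    System.out.println(keepsJvmAlive(lingering));
    // If the worker were never released, the JVM would not exit unless
    // something forced it, for example an explicit System.exit() call.
    release.countDown();
  }
}
```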



--

[jira] [Commented] (MAPREDUCE-6277) Job can post multiple history files if attempt loses connection to the RM

2015-03-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367720#comment-14367720
 ] 

Jason Lowe commented on MAPREDUCE-6277:
---

+1 lgtm.  Committing this.

 Job can post multiple history files if attempt loses connection to the RM
 -

 Key: MAPREDUCE-6277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.7.0
Reporter: Chang Li
Assignee: Chang Li
 Attachments: MAPREDUCE-6277.2.patch, MAPREDUCE-6277.patch, 
 YARN-3335.1.patch, YARN-3335.2.patch


 Related to the fixed issue MAPREDUCE-6230, which causes a job to get into an 
 error state. In that situation the job's second or a later attempt could 
 succeed, but those later attempts' history files will all be lost, because 
 the first attempt in the error state will copy its history file to the 
 intermediate dir while mistakenly thinking of itself as the last attempt. 
 The JobHistory server will later move the history file of that errored 
 attempt from the intermediate dir to the done dir while ignoring the job's 
 later attempts' history files in the intermediate dir.



--

[jira] [Created] (MAPREDUCE-6280) Reject directory vs file path conflict resolution in FileOutputCommitter

2015-03-18 Thread Gera Shegalov (JIRA)

Gera Shegalov created MAPREDUCE-6280:


 Summary: Reject directory vs file path conflict resolution in 
FileOutputCommitter 
 Key: MAPREDUCE-6280
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6280
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Gera Shegalov


If one task commits a directory {{foo}}, and then another task commits a file 
{{foo}}, the directory {{foo}} with potentially many files will be wiped out. 
Although this scenario is very unlikely because tasks are homogeneous in 
nature, that makes it all the more important to alert the user by failing the 
commit. This came up in [~jlowe]'s review for MAPREDUCE-6275 and seems to be 
[the behavior in 
branch-1|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L198]
 as well. 
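
The kind of check being proposed might look roughly like the sketch below. 
It uses java.nio.file on the local file system rather than Hadoop's 
FileSystem API, and the class and method names are hypothetical; it is not 
the FileOutputCommitter implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not the FileOutputCommitter code.
final class CommitConflictDemo {
  // Fails the commit when the destination exists and one side is a
  // directory while the other is a regular file.
  static void rejectTypeConflict(Path taskOutput, Path destination) {
    if (Files.exists(destination)
        && Files.isDirectory(destination) != Files.isDirectory(taskOutput)) {
      throw new IllegalStateException(
          "Directory vs. file conflict at " + destination);
    }
  }

  // Commits a directory "foo" first, then tries to commit a file "foo";
  // returns true if the second commit is rejected and the data survives.
  static boolean conflictIsRejected() {
    try {
      Path out = Files.createTempDirectory("job-output");
      Path committedDir = Files.createDirectory(out.resolve("foo"));
      Path existingPart = Files.createFile(committedDir.resolve("part-00000"));
      Path taskFile = Files.createTempFile("task-attempt", "foo");
      try {
        rejectTypeConflict(taskFile, out.resolve("foo"));
        return false; // the conflict went unnoticed
      } catch (IllegalStateException expected) {
        return Files.exists(existingPart);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(conflictIsRejected());
  }
}
```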



--

[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2015-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366721#comment-14366721
 ] 

Hadoop QA commented on MAPREDUCE-5951:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12705266/MAPREDUCE-5951-trunk-v8.patch
  against trunk revision 3bc72cc.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5304//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5304//console

This message is automatically generated.

 Add support for the YARN Shared Cache
 -

 Key: MAPREDUCE-5951
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: MAPREDUCE-5951-trunk-v1.patch, 
 MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, 
 MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, 
 MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, 
 MAPREDUCE-5951-trunk-v8.patch


 Implement the necessary changes so that the MapReduce application can 
 leverage the new YARN shared cache (i.e. YARN-1492).
 Specifically, allow per-job configuration so that MapReduce jobs can specify 
 which set of resources they would like to cache (i.e. jobjar, libjars, 
 archives, files).



