[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720426#comment-13720426 ] Tsuyoshi OZAWA commented on MAPREDUCE-5153: --- This discussion is essentially "in-mapper combining vs. disk-based combining". If a user program (including scalding and cascading) does in-mapper combining and emits its values based on memory usage, a similar effect can be achieved, although only partially. In most cases, this partial approach is enough to improve performance. What do you think? > Support for running combiners without reducers > -- > > Key: MAPREDUCE-5153 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Radim Kolar > > scenario: Workflow mapper -> sort -> combiner -> hdfs > No API change is needed: if the user sets a combiner class and reducers = 0, then run the > combiner and send the output to HDFS. > Popular libraries such as scalding and cascading offer this > functionality, but they do so by caching the entire mapper output in memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
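The in-mapper combining mentioned in the comment above can be sketched roughly as follows. This is a minimal illustration in plain Java with no Hadoop dependency, not code from the issue: the class name `InMapperCombiner`, the `maxKeys` limit (a stand-in for a real memory budget), and the `emitter` callback (a stand-in for the mapper's output call) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical in-mapper combiner: sums values per key, flushes when the buffer is full. */
public class InMapperCombiner {
    private final Map<String, Long> buffer = new HashMap<>();
    private final int maxKeys;                       // assumed stand-in for a memory budget
    private final BiConsumer<String, Long> emitter;  // assumed stand-in for the mapper's output call

    public InMapperCombiner(int maxKeys, BiConsumer<String, Long> emitter) {
        this.maxKeys = maxKeys;
        this.emitter = emitter;
    }

    /** Called once per map input record instead of emitting (key, value) directly. */
    public void add(String key, long value) {
        buffer.merge(key, value, Long::sum);
        if (buffer.size() > maxKeys) {
            // Combining is only partial: the same key may be emitted again after a flush.
            flush();
        }
    }

    /** Emits every buffered partial sum; must also be called when the map task ends. */
    public void flush() {
        buffer.forEach(emitter);
        buffer.clear();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        InMapperCombiner combiner = new InMapperCombiner(2, (k, v) -> out.add(k + "=" + v));
        for (String word : new String[] {"a", "b", "a", "a"}) {
            combiner.add(word, 1);
        }
        combiner.flush();
        System.out.println(out);
    }
}
```

Because the buffer is flushed whenever it grows past the limit, memory stays bounded, at the cost of possibly emitting several partial sums for one key. That is the "partial" effect the comment refers to.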
[jira] [Created] (MAPREDUCE-5420) Remove mapreduce.task.tmp.dir from mapred-default.xml
Sandy Ryza created MAPREDUCE-5420: - Summary: Remove mapreduce.task.tmp.dir from mapred-default.xml Key: MAPREDUCE-5420 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5420 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza mapreduce.task.tmp.dir no longer has any effect, so it should no longer be documented in mapred-default. (There is no YARN equivalent for the property. It now is just always ./tmp).
[jira] [Updated] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karuth sanker updated MAPREDUCE-1176: - Priority: Major (was: Minor) > Contribution: FixedLengthInputFormat and FixedLengthRecordReader > > > Key: MAPREDUCE-1176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1, 0.20.2 > Environment: Any >Reporter: BitsOfInfo >Assignee: Chris Douglas > Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, > MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch > > > Hello, > I would like to contribute the following two classes for incorporation into > the mapreduce.lib.input package. These two classes can be used when you need > to read data from files containing fixed length (fixed width) records. Such > files have no CR/LF (or any combination thereof), no delimiters, etc., but each > record is a fixed length, and any unused space is padded with spaces. The data is > one gigantic line within a file. > Provided are two classes: FixedLengthInputFormat and its > corresponding FixedLengthRecordReader. When creating a job that specifies > this input format, the job must have the > "mapreduce.input.fixedlengthinputformat.record.length" property set as follows: > myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]); > OR > myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, > [myFixedRecordLength]); > This input format overrides computeSplitSize() to ensure that > InputSplits do not contain any partial records, since with fixed-length records there > would be no way to determine where a record begins if a split started mid-record. Each > InputSplit passed to the FixedLengthRecordReader will start at the beginning > of a record, and the last byte in the InputSplit will be the last byte of a > record. 
The override of computeSplitSize() delegates to FileInputFormat's > compute method, and then adjusts the returned split size by doing the > following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) > * fixedRecordLength) > This suite of fixed length input format classes does not support compressed > files.
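The split-size adjustment described in the message above comes down to rounding the computed split size down to a whole number of records. The following is a minimal sketch of that arithmetic only, not the code from the attached patches: the class and method names are made up for illustration, and the guard that keeps the result from dropping to zero is an added assumption.

```java
/** Illustrative sketch of aligning a split size to a fixed record length. */
public class FixedLengthSplitSize {

    /**
     * Rounds a candidate split size down to a multiple of the record length.
     * Integer division on positive longs already truncates toward zero,
     * so it plays the role of Math.floor in the formula from the issue.
     */
    public static long alignToRecordLength(long computedSplitSize, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive");
        }
        long aligned = (computedSplitSize / recordLength) * recordLength;
        // Assumption for this sketch: a split should hold at least one full record.
        return Math.max(aligned, recordLength);
    }

    public static void main(String[] args) {
        // A 64 MB candidate split with 100-byte records.
        System.out.println(alignToRecordLength(64L * 1024 * 1024, 100)); // prints 67108800
    }
}
```

With every split sized this way, each split boundary falls on a record boundary as long as the file itself starts at a record boundary.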
[jira] [Assigned] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karuth sanker reassigned MAPREDUCE-1176: Assignee: Chris Douglas Can you allow this patch to be included? > Contribution: FixedLengthInputFormat and FixedLengthRecordReader > > > Key: MAPREDUCE-1176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1, 0.20.2 > Environment: Any >Reporter: BitsOfInfo >Assignee: Chris Douglas >Priority: Minor > Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, > MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch
[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720189#comment-13720189 ] karuth sanker commented on MAPREDUCE-1176: -- I agree with Jonathan; this feature is critical for a lot of the files I deal with. Chris and others, can you please allow this feature to be included? > Contribution: FixedLengthInputFormat and FixedLengthRecordReader > > > Key: MAPREDUCE-1176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1, 0.20.2 > Environment: Any >Reporter: BitsOfInfo >Priority: Minor > Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, > MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch
[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720137#comment-13720137 ] Kihwal Lee commented on MAPREDUCE-1981: --- +1 The patch for branch-0.23 looks good too. > Improve getSplits performance by using listFiles, the new FileSystem API > > > Key: MAPREDUCE-1981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Attachments: mapredListFiles1.patch, mapredListFiles2.patch, > mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, > mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch > > > This jira will make FileInputFormat and CombineFileInputFormat use the new > API, thus reducing the number of RPCs to the HDFS NameNode.
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720131#comment-13720131 ] Jason Lowe commented on MAPREDUCE-4421: --- bq. Also, what would it take to make this work easily for non-MR frameworks? Other frameworks can do a similar trick, and note that I didn't have to make any YARN changes for it to work. Well, there is the aux service issue as I mentioned, but otherwise it can be done in a similar fashion. All it's basically doing from a YARN standpoint is having the client automatically bundle an archive as a LocalResource and doctoring the container environment accordingly. I thought I heard Tez was being deployed this way, but I haven't verified that. At the last Hadoop Summit, [~tucu00] had what I thought was a brilliant idea. Not only the idea of grabbing the framework support code for containers via HDFS, but having the *client* code come from an HDFS blob as well. There would be some yarn command to launch an application for a particular version of a framework, and that command would look in a configured place where frameworks are stored, pick out the appropriate version of the named framework, download the client code, and invoke the client to complete the rest of the app submission. The client could then bundle the rest of the framework in a similar fashion to how it's being done for MapReduce here. In essence, it would be a one-step deploy for app frameworks on YARN. Drop a blob in HDFS, and suddenly users can start using that framework even though they don't have any of the framework code installed at the time. There's still some big issues to work out, e.g.: how to download the client code efficiently (it becomes much like a localization issue with managing a cache of clients already downloaded, etc.), and I'm sure there's plenty of other devils in the details. 
But if accomplished, this would allow one-step deploys for application frameworks in YARN which I think would be a great feature. > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache.
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720108#comment-13720108 ] Jason Lowe commented on MAPREDUCE-4421: --- bq. Would it make sense to allow directories as well in mapreduce.application.framework.path? That would make it easier to swap out a jar without rebuilding the tarball. The problem with directories is that officially they are unsupported in the distributed cache. Besides that, from a practical standpoint, it's much more difficult for a nodemanager to verify it doesn't need to localize anything when the item being localized is an arbitrary directory tree. That's a lot of HDFS stats to do vs. just one for the archive case. bq. Does the distributed cache actually cache things in between jobs? Yes, it does if it can. It depends upon the visibility of the item being localized. If it's PUBLIC the resource will be cached and reused among all users and all jobs. If PRIVATE the resource will be cached only per-user but reused between jobs for that user. If APPLICATION then it will only be localized for a single job. See LocalResourceVisibility and ClientDistributedCacheManager.determineCacheVisibilities for some details. The javadoc is correct in that even for the APPLICATION case a resource will only be localized once even though multiple containers may run on the same node, so it's more efficient than just letting the tasks hit HDFS directly for the resource when multiple tasks run on the same node and the resource is needed by all tasks. 
> Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache.
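The caching behaviour Jason Lowe describes for the three visibilities (PUBLIC, PRIVATE, APPLICATION) can be pictured with a toy model. This is not the NodeManager's localization code; the class `LocalCacheModel`, its key format, and the `request` method are assumptions made up for this sketch, and the nested enum only mirrors the three names from the comment.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of when a localized resource can be reused on one node. */
public class LocalCacheModel {
    /** Mirrors the three visibility names from the comment; not the YARN class itself. */
    public enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    // Keys of resources that this (simulated) node has already localized.
    private final Set<String> localized = new HashSet<>();

    /** Builds the scope under which a localized copy may be shared. */
    public static String cacheKey(Visibility v, String resource, String user, String appId) {
        switch (v) {
            case PUBLIC:
                return "PUBLIC|" + resource;                  // all users, all jobs
            case PRIVATE:
                return "PRIVATE|" + user + "|" + resource;    // one user, all of that user's jobs
            default:
                return "APPLICATION|" + appId + "|" + resource; // a single job only
        }
    }

    /** Returns true if a download is needed, false if an existing copy is reused. */
    public boolean request(Visibility v, String resource, String user, String appId) {
        return localized.add(cacheKey(v, resource, user, appId));
    }

    public static void main(String[] args) {
        LocalCacheModel node = new LocalCacheModel();
        System.out.println(node.request(Visibility.PUBLIC, "mr-framework.tar.gz", "alice", "app_1"));
        System.out.println(node.request(Visibility.PUBLIC, "mr-framework.tar.gz", "bob", "app_2"));
    }
}
```

In this model the second request reuses the copy localized for the first one, even though the user and the job differ, which is why a PUBLIC framework archive is cheap to share across jobs. Even in the APPLICATION case, repeated requests from containers of the same job on the same node map to the same key, matching the "localized once per node" point in the comment.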
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720036#comment-13720036 ] Sandy Ryza commented on MAPREDUCE-4421: --- Very cool. Would it make sense to allow directories as well in mapreduce.application.framework.path? That would make it easier to swap out a jar without rebuilding the tarball. Also, what would it take to make this work easily for non-MR frameworks? Does the distributed cache actually cache things in between jobs? The javadoc says "Its efficiency stems from the fact that the files are only copied once per job and the ability to cache archives which are un-archived on the slaves." If so, we should probably modify the javadoc. > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache.
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720009#comment-13720009 ] Hadoop QA commented on MAPREDUCE-5419: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594245/MAPREDUCE-5419.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3903//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3903//console This message is automatically generated. 
> TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > Attachments: MAPREDUCE-5419.patch > > > The write directory "slive" is not getting created on the FS.
[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5403: -- Attachment: MAPREDUCE-5403-2.patch > MR changes to accommodate yarn.application.classpath being moved to the > server-side > --- > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, > MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality.
[jira] [Updated] (MAPREDUCE-5386) Ability to refresh history server job retention and job cleaner settings
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5386: -- Summary: Ability to refresh history server job retention and job cleaner settings (was: Refresh job retention time,job cleaner interval, enable/disable cleaner) +1 > Ability to refresh history server job retention and job cleaner settings > > > Key: MAPREDUCE-5386 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, > JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION--5.txt > > > We want to be able to refresh the following job retention parameters > without having to bounce the history server: > 1. Job retention time - mapreduce.jobhistory.max-age-ms > 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms > 3. Enable/disable cleaner - mapreduce.jobhistory.cleaner.enable
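One way to picture "refreshing without bouncing the server" for the three properties listed above is an immutable settings snapshot that is swapped atomically. The sketch below is only that general pattern in plain Java, not the approach taken in the attached patches: the class `RefreshableCleanerSettings`, its `Snapshot` type, and the use of a plain `Map` in place of a Hadoop configuration object are all assumptions. Only the three property names come from the issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for cleaner settings that can be replaced at runtime. */
public class RefreshableCleanerSettings {

    /** Immutable snapshot of the three settings named in the issue. */
    public static final class Snapshot {
        public final long maxAgeMs;
        public final long cleanerIntervalMs;
        public final boolean cleanerEnabled;

        Snapshot(long maxAgeMs, long cleanerIntervalMs, boolean cleanerEnabled) {
            this.maxAgeMs = maxAgeMs;
            this.cleanerIntervalMs = cleanerIntervalMs;
            this.cleanerEnabled = cleanerEnabled;
        }
    }

    private final AtomicReference<Snapshot> current = new AtomicReference<>();

    public RefreshableCleanerSettings(Map<String, String> conf) {
        refresh(conf);
    }

    /**
     * Re-reads the properties and publishes them in one atomic step.
     * In this sketch all three keys must be present in the map.
     */
    public void refresh(Map<String, String> conf) {
        current.set(new Snapshot(
                Long.parseLong(conf.get("mapreduce.jobhistory.max-age-ms")),
                Long.parseLong(conf.get("mapreduce.jobhistory.cleaner.interval-ms")),
                Boolean.parseBoolean(conf.get("mapreduce.jobhistory.cleaner.enable"))));
    }

    /** Readers (for example a cleaner thread) always see one consistent snapshot. */
    public Snapshot get() {
        return current.get();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.jobhistory.max-age-ms", "604800000");          // example value only
        conf.put("mapreduce.jobhistory.cleaner.interval-ms", "86400000");  // example value only
        conf.put("mapreduce.jobhistory.cleaner.enable", "true");
        RefreshableCleanerSettings settings = new RefreshableCleanerSettings(conf);

        conf.put("mapreduce.jobhistory.cleaner.enable", "false");
        settings.refresh(conf); // takes effect without restarting anything
        System.out.println(settings.get().cleanerEnabled);
    }
}
```

A real implementation would also need to reschedule the cleaner when its interval changes; that part is left out here.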
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719966#comment-13719966 ] Jason Lowe commented on MAPREDUCE-5251: --- Thanks, Ashwin! I committed to trunk and branch-2. Could you provide a patch for branch-0.23? > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem, the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt as bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output.
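The reasoning in the issue description above (a reducer that lacks space should fail itself rather than blame the map attempt) can be written down as a small decision rule. This is a hypothetical sketch, not the committed patch: the class `FetchFailurePolicy`, the `Action` enum, and the two byte-count parameters are invented for illustration.

```java
/** Hypothetical decision rule for a failed map-output fetch. */
public class FetchFailurePolicy {

    public enum Action { REPORT_MAP_ATTEMPT, FAIL_REDUCE_ATTEMPT }

    /**
     * If the reducer's node cannot hold the map output, the problem is local:
     * re-running the map would not help, so the reduce attempt should fail
     * and be re-launched, ideally on a node with enough space.
     */
    public static Action onFetchFailure(long mapOutputBytes, long localFreeBytes) {
        return localFreeBytes < mapOutputBytes
                ? Action.FAIL_REDUCE_ATTEMPT
                : Action.REPORT_MAP_ATTEMPT;
    }

    public static void main(String[] args) {
        // 2 GB of map output but only 1 GB free locally.
        System.out.println(onFetchFailure(2L << 30, 1L << 30));
    }
}
```

The point of the rule is that the map attempt is reported only when the reducer has ruled itself out as the cause.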
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719959#comment-13719959 ] Jason Lowe commented on MAPREDUCE-5251: --- +1 > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem, the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt as bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output.
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719937#comment-13719937 ] Hadoop QA commented on MAPREDUCE-4421: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594236/MAPREDUCE-4421.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3902//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3902//console This message is automatically generated. 
> Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache.
[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker updated MAPREDUCE-5419: - Attachment: MAPREDUCE-5419.patch > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > Attachments: MAPREDUCE-5419.patch > > > The write directory "slive" is not getting created on the FS.
[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker updated MAPREDUCE-5419: - Description: The write directory "slive" is not getting created on the FS. (was: The write directory "slive" is not getting created an the FS.) > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > > The write directory "slive" is not getting created on the FS.
[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Parker updated MAPREDUCE-5419: - Status: Patch Available (was: Open) > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.9, trunk, 2.1.0-beta >Reporter: Robert Parker >Assignee: Robert Parker > > The write directory "slive" is not getting created on the FS.
[jira] [Created] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
Robert Parker created MAPREDUCE-5419: Summary: TestSlive is getting FileNotFound Exception Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.9, trunk, 2.1.0-beta Reporter: Robert Parker Assignee: Robert Parker The write directory "slive" is not getting created an the FS.
[jira] [Updated] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4421: -- Attachment: MAPREDUCE-4421.patch Updated patch to fix extra warnings. > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5386) Refresh job retention time,job cleaner interval, enable/disable cleaner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719896#comment-13719896 ] Hadoop QA commented on MAPREDUCE-5386: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594225/JOB_RETENTION--5.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3901//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3901//console This message is automatically generated. 
> Refresh job retention time,job cleaner interval, enable/disable cleaner > --- > > Key: MAPREDUCE-5386 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, > JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION--5.txt > > > We want to be able to refresh following job retention parameters > without having to bounce the history server : > 1. Job retention time - mapreduce.jobhistory.max-age-ms > 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms > 3. Enable/disable cleaner -mapreduce.jobhistory.cleaner.enable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719895#comment-13719895 ] Hadoop QA commented on MAPREDUCE-5251: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594228/MAPREDUCE-5251-7.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3900//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3900//console This message is automatically generated. 
> Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5251: -- Attachment: MAPREDUCE-5251-7.txt Thanks, patch refreshed. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719881#comment-13719881 ] Hadoop QA commented on MAPREDUCE-4421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594215/MAPREDUCE-4421.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1150 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3899//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3899//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3899//console This message is automatically generated. 
> Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5386) Refresh job retention time,job cleaner interval, enable/disable cleaner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5386: -- Attachment: JOB_RETENTION--5.txt Thanks, patch updated. > Refresh job retention time,job cleaner interval, enable/disable cleaner > --- > > Key: MAPREDUCE-5386 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, > JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION--5.txt > > > We want to be able to refresh following job retention parameters > without having to bounce the history server : > 1. Job retention time - mapreduce.jobhistory.max-age-ms > 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms > 3. Enable/disable cleaner -mapreduce.jobhistory.cleaner.enable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719869#comment-13719869 ] Jason Lowe commented on MAPREDUCE-5403: --- I think YARN_APPLICATION_CLASSPATH is appropriate since it sounds like the classpath to use for YARN applications vs. some other classpath to use for YARN backend stuff like the daemons themselves. However I think YARN_FRAMEWORK_CLASSPATH could work fine as well. > MR changes to accommodate yarn.application.classpath being moved to the > server-side > --- > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4421: -- Attachment: MAPREDUCE-4421.patch Submitting a patch to try to move this forward. We're very interested in the ability to patch issues in the MapReduce framework without having to bring down the cluster and/or push a new version to all nodes. This patch adds a new config, {{mapreduce.application.framework.path}}, which defaults to being unset. If set, it specifies a path to an archive containing the MR framework to use with the job. Normally this would point to a public location within HDFS, and the archive would contain all the MR jars and their dependencies, i.e.: MR jars, YARN client jars, HDFS client, common, and all their dependencies. This allows ops to deposit a single archive into HDFS that contains the MR framework and configure mapred-site.xml to use it. That framework is then lazily deployed to the nodes. A new version can be uploaded to another path, the mapred-site.xml updated, and then all future jobs run with the new version while all currently running jobs proceed with the previous version. Or ops can avoid pushing the mapred-site.xml change out to all gateway/launcher boxes by using a standard path symlink that always points to the current version to use. New versions can be deployed, the symlink moved to them, and jobs implicitly pick up the new version without pushing a corresponding mapred-site.xml change. I've tested this by taking the entire hadoop-3.0.0-SNAPSHOT.tar.gz file and placing it in HDFS under /mapred/. Admittedly, this is not the most efficient deployment, but it does include everything necessary. 
I then set mapreduce.application.framework.path to /mapred/hadoop-3.0.0-SNAPSHOT.tar.gz#mr-framework and mapreduce.application.classpath to: {noformat} $PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop-3.0.0-SNAPSHOT/share/hadoop/hdfs/lib/* {noformat} The job then ran with my specified version of the MR framework instead of the one deployed to the nodes. The application classpath is complicated because I used the standard distribution tarball. I could have easily built a custom tarball with all the jars at the top directory and simply had a classpath of: {noformat} $PWD/mr-framework/*.jar {noformat} The framework is lazily deployed via the distributed cache, so nodes take a localization hit the first time they see a job with a specified framework path. However subsequent jobs with the same framework run quickly, and I saw no performance difference between jobs using a custom framework and jobs using the cluster-installed framework on nodes that had already localized the specified framework. Note that there is still a dependency on deployed MR jars with respect to the shuffle service running on all the nodes. With this patch, new MR versions can only be used when the old shuffle service on all nodes is compatible with the new version. Fixing this requires the ability to specify auxiliary services with YARN application submissions and have those lazily deploy to nodes that are allocated for the application. (And ideally subsequently refcounted and retired once no longer necessary.) 
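The long mapreduce.application.classpath value in the comment above follows a regular pattern: for each of the mapreduce, common, yarn, and hdfs components there is one entry for the component's jars and one for its lib directory, all rooted under the localized alias (the part after '#' in mapreduce.application.framework.path). A minimal sketch that generates such a value is shown here. `FrameworkClasspath` and `build` are hypothetical names used for illustration; they are not part of Hadoop or of the attached patch, and the alias and distribution directory are simply the ones quoted in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop or of the MAPREDUCE-4421 patch.
final class FrameworkClasspath {
    private FrameworkClasspath() {}

    /**
     * Builds a classpath string for a framework archive that is localized
     * under the given alias and unpacks to the given distribution directory.
     */
    static String build(String alias, String distDir, List<String> components) {
        String root = "$PWD/" + alias + "/" + distDir + "/share/hadoop/";
        List<String> entries = new ArrayList<>();
        for (String component : components) {
            entries.add(root + component + "/*");     // the component's own jars
            entries.add(root + component + "/lib/*"); // its bundled dependencies
        }
        return String.join(":", entries);
    }

    public static void main(String[] args) {
        // Alias and directory name taken from the comment; adjust for a real archive.
        System.out.println(build("mr-framework", "hadoop-3.0.0-SNAPSHOT",
                Arrays.asList("mapreduce", "common", "yarn", "hdfs")));
    }
}
```

With a custom tarball that keeps all jars in its top directory, as the comment suggests, none of this is needed and the value collapses to the single entry $PWD/mr-framework/*.jar.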
> Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Vinod Kumar Vavilapalli > Attachments: MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4421: -- Assignee: Jason Lowe (was: Vinod Kumar Vavilapalli) Target Version/s: 2.3.0 Status: Patch Available (was: Open) > Remove dependency on deployed MR jars > - > > Key: MAPREDUCE-4421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Arun C Murthy >Assignee: Jason Lowe > Attachments: MAPREDUCE-4421.patch > > > Currently MR AM depends on MR jars being deployed on all nodes via implicit > dependency on YARN_APPLICATION_CLASSPATH. > We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, > probably, just rely on adding a shaded MR jar along with job.jar to the > dist-cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719821#comment-13719821 ] Sandy Ryza commented on MAPREDUCE-5403: --- Filed YARN-973 for the YARN changes. Uploading a patch with the mentioned changes. Including the MR changes here to make reviewing easier and because they won't stand on their own. I noticed that there's a separate APP_CLASSPATH environment variable, which does something else. It might make sense to rename YARN_APPLICATION_CLASSPATH to something like YARN_FRAMEWORK_CLASSPATH? Any name suggestions welcome. > MR changes to accommodate yarn.application.classpath being moved to the > server-side > --- > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5403: -- Summary: MR changes to accommodate yarn.application.classpath being moved to the server-side (was: yarn.application.classpath requires client to know service internals) > MR changes to accommodate yarn.application.classpath being moved to the > server-side > --- > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719788#comment-13719788 ] Sandy Ryza commented on MAPREDUCE-5367: --- Are you looking at branch-1 or trunk? The patch is for branch-1, but only trunk's LocalJobRunner has SUBDIR and getLocalTaskDir. Will fix the double path separator. > Local jobs all use same local working directory > --- > > Key: MAPREDUCE-5367 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5367-b1.patch > > > This means that local jobs, even in different JVMs, can't run concurrently > because they might delete each other's files during work directory setup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5386) Refresh job retention time,job cleaner interval, enable/disable cleaner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719758#comment-13719758 ] Jason Lowe commented on MAPREDUCE-5386: --- bq. Was there an intent for a more comprehensive test? I see a lot of refactoring for methods that seem unrelated to this patch otherwise (e.g.: scanDirectoryForHistoryFilesWrapper, deleteDir, etc.) My apologies, I totally missed the new test file which makes extensive use of these. I must have been looking at just the git diff output after applying your patch instead of the patch directly. Rather than use the scanDirectoryForHistoryFilesWrapper, I think it'd be cleaner if we just promote scanDirectoryForHistoryFilesWrapper to a protected method so tests can override it. It's currently a private method and no callers require it to be static. > Refresh job retention time,job cleaner interval, enable/disable cleaner > --- > > Key: MAPREDUCE-5386 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, > JOB_RETENTION-3.txt, JOB_RETENTION-4.txt > > > We want to be able to refresh following job retention parameters > without having to bounce the history server : > 1. Job retention time - mapreduce.jobhistory.max-age-ms > 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms > 3. Enable/disable cleaner -mapreduce.jobhistory.cleaner.enable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
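The suggestion in the comment above, making the directory scan an overridable instance method so that tests can substitute their own results, can be sketched roughly as follows. This is an illustration of the general pattern only, not the patch: `HistoryScanner`, `FakeHistoryScanner`, `scanDirectory`, and `countHistoryFiles` are hypothetical names, and the file names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the class under test; names are illustrative only.
class HistoryScanner {
    // Protected and non-static, so a test subclass can override it.
    protected List<String> scanDirectory(String dir) {
        // A real implementation would list files on the file system here.
        return Collections.emptyList();
    }

    // Production logic calls the overridable hook rather than a private static helper.
    int countHistoryFiles(String dir) {
        return scanDirectory(dir).size();
    }
}

// What a test might do: override the hook to return canned results.
class FakeHistoryScanner extends HistoryScanner {
    @Override
    protected List<String> scanDirectory(String dir) {
        return Arrays.asList(dir + "/job_1.jhist", dir + "/job_2.jhist");
    }
}
```

The benefit over a separate wrapper method is that the production class keeps a single code path, and the test only replaces the one step that touches the file system.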
[jira] [Commented] (MAPREDUCE-3193) FileInputFormat doesn't read files recursively in the input path dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719662#comment-13719662 ] Jason Lowe commented on MAPREDUCE-3193: --- bq. I don't think it is a bug. You can use FileInputFormat.addInputPaths(job, inputPath) to replace it. The heart of the issue is consistency between org.apache.hadoop.mapred.FileInputFormat and org.apache.hadoop.mapreduce.lib.input.FileInputFormat. The former supports recursive processing of input paths when {{mapred.input.dir.recursive}} is true while the latter did not. This changes org.apache.hadoop.mapreduce.lib.input.FileInputFormat to match that behavior so users of the mapreduce API can easily process input paths recursively. > FileInputFormat doesn't read files recursively in the input path dir > > > Key: MAPREDUCE-3193 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3193 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 0.23.2, 2.0.0-alpha, 3.0.0 >Reporter: Ramgopal N >Assignee: Devaraj K > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-3193-1.patch, MAPREDUCE-3193-2.patch, > MAPREDUCE-3193-2.patch, MAPREDUCE-3193-3.patch, MAPREDUCE-3193-4.patch, > MAPREDUCE-3193-5.patch, MAPREDUCE-3193.patch, MAPREDUCE-3193.security.patch > > > java.io.FileNotFoundException is thrown if the input file is more than one folder > level deep, and the job fails. > Example: Input file is /r1/r2/input.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
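The behavioral difference discussed in the comment above is between listing only the immediate children of an input directory and descending into nested directories such as /r1/r2/input.txt. The sketch below illustrates that difference with java.nio.file on the local file system. It is an analogy only: it does not use Hadoop's FileSystem or either FileInputFormat class, `InputListing` and `listInputFiles` are hypothetical names, and unlike the reported bug it silently skips subdirectories in the non-recursive case instead of throwing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: local file system via java.nio.file, not Hadoop's FileInputFormat.
final class InputListing {
    private InputListing() {}

    /**
     * Lists regular files under dir. Subdirectories are descended into
     * only when recursive is true; otherwise only direct children are seen.
     */
    static List<Path> listInputFiles(Path dir, boolean recursive) throws IOException {
        try (Stream<Path> paths = recursive ? Files.walk(dir) : Files.list(dir)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

For an input directory holding one file plus a subdirectory that holds a second file, the non-recursive call returns only the first file, while the recursive call returns both.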
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719649#comment-13719649 ] Tom White commented on MAPREDUCE-5367: -- I noticed that {{jobDir}} is being assigned to {{LocalJobRunner.SUBDIR}}, i.e. "localRunner/" with no job ID, so {{getLocalTaskDir()}} will return a path that is not unique for the job. The fix here is to make {{getLocalTaskDir()}} non-static. If {{jobDir}} were renamed to {{JOB_BASE_DIR}} then this problem would be less likely to be re-introduced in the future. Also, {{jobDir}} ends in "/", so there is no need to add another path separator. It's probably better to use the {{new Path(parent, child)}} constructor anyway. > Local jobs all use same local working directory > --- > > Key: MAPREDUCE-5367 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5367-b1.patch > > > This means that local jobs, even in different JVMs, can't run concurrently > because they might delete each other's files during work directory setup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
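The two points in the comment above, a per-job task directory computed by an instance method and a parent/child join instead of string concatenation, can be sketched as follows. This is a rough illustration under stated assumptions: java.nio.file.Paths stands in for Hadoop's Path class, `LocalJobDirs` is a hypothetical class, and the directory layout is invented for the example rather than taken from LocalJobRunner.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: java.nio.file.Paths stands in for Hadoop's Path here.
final class LocalJobDirs {
    // Base directory shared by all local jobs, named as a constant
    // so it is not mistaken for a per-job directory.
    static final String JOB_BASE_DIR = "localRunner/";

    private final String jobId;

    LocalJobDirs(String jobId) {
        this.jobId = jobId;
    }

    // Instance method: the result includes this job's ID,
    // so two jobs do not share a task directory.
    Path getLocalTaskDir(String taskId) {
        // Joining path elements avoids the "localRunner//..." double separator
        // that appending "/" to a string already ending in "/" would produce.
        return Paths.get(JOB_BASE_DIR, jobId, taskId);
    }
}
```

Two instances created with different job IDs (for example the made-up IDs job_local_0001 and job_local_0002) return different directories for the same task ID, which is the property the static version lacked.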
[jira] [Commented] (MAPREDUCE-3193) FileInputFormat doesn't read files recursively in the input path dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719377#comment-13719377 ] rulinma commented on MAPREDUCE-3193: I don't think it is a bug. You can use FileInputFormat.addInputPaths(job, inputPath) to replace it. > FileInputFormat doesn't read files recursively in the input path dir > > > Key: MAPREDUCE-3193 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3193 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1, mrv2 >Affects Versions: 0.23.2, 2.0.0-alpha, 3.0.0 >Reporter: Ramgopal N >Assignee: Devaraj K > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-3193-1.patch, MAPREDUCE-3193-2.patch, > MAPREDUCE-3193-2.patch, MAPREDUCE-3193-3.patch, MAPREDUCE-3193-4.patch, > MAPREDUCE-3193-5.patch, MAPREDUCE-3193.patch, MAPREDUCE-3193.security.patch > > > java.io.FileNotFoundException is thrown if the input file is more than one folder > level deep, and the job fails. > Example: Input file is /r1/r2/input.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5254) Fix exception unwrapping and unit tests using UndeclaredThrowable
[ https://issues.apache.org/jira/browse/MAPREDUCE-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719340#comment-13719340 ] Tsuyoshi OZAWA commented on MAPREDUCE-5254: --- [~sseth], what's the status of this ticket? > Fix exception unwrapping and unit tests using UndeclaredThrowable > - > > Key: MAPREDUCE-5254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5254 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.4-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > Follow up to YARN-628. Exception unwrapping for MRClientProtocol needs some > work. Also, there's a bunch of MR tests still relying on > UndeclaredThrowableException which should no longer be thrown. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4750) Enable NNBenchWithoutMR in MapredTestDriver
[ https://issues.apache.org/jira/browse/MAPREDUCE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719337#comment-13719337 ] Tsuyoshi OZAWA commented on MAPREDUCE-4750: --- Thank you for contributing, Liang. Your patch seems good to me. +1. > Enable NNBenchWithoutMR in MapredTestDriver > --- > > Key: MAPREDUCE-4750 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4750 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, test >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: MAPREDUCE-4750.txt > > > Right now, we could run nnbench from MapredTestDriver only, there's no entry > for NNBenchWithoutMR, it would be better enable it explicitly, such that we > can do namenode benchmark with less influence factors -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4750) Enable NNBenchWithoutMR in MapredTestDriver
[ https://issues.apache.org/jira/browse/MAPREDUCE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4750: -- Hadoop Flags: Reviewed > Enable NNBenchWithoutMR in MapredTestDriver > --- > > Key: MAPREDUCE-4750 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4750 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, test >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: MAPREDUCE-4750.txt > > > Right now, we could run nnbench from MapredTestDriver only, there's no entry > for NNBenchWithoutMR, it would be better enable it explicitly, such that we > can do namenode benchmark with less influence factors -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4750) Enable NNBenchWithoutMR in MapredTestDriver
[ https://issues.apache.org/jira/browse/MAPREDUCE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4750: -- Assignee: Liang Xie > Enable NNBenchWithoutMR in MapredTestDriver > --- > > Key: MAPREDUCE-4750 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4750 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, test >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: MAPREDUCE-4750.txt > > > Right now, only nnbench can be run from MapredTestDriver; there is no entry > for NNBenchWithoutMR. It would be better to enable it explicitly, so that we > can benchmark the namenode with fewer influencing factors.
[jira] [Updated] (MAPREDUCE-5248) Let NNBenchWithoutMR specify the replication factor for its test
[ https://issues.apache.org/jira/browse/MAPREDUCE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5248: -- Assignee: Erik Paulson > Let NNBenchWithoutMR specify the replication factor for its test > > > Key: MAPREDUCE-5248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client, test >Affects Versions: 3.0.0 >Reporter: Erik Paulson >Assignee: Erik Paulson >Priority: Minor > Attachments: MAPREDUCE-5248.txt > > Original Estimate: 1h > Remaining Estimate: 1h > > The NNBenchWithoutMR test creates files with a replicationFactorPerFile > hard-coded to 1. It'd be nice to be able to specify that on the commandline. > Also, it'd be great if MAPREDUCE-4750 was merged along with this fix.
[jira] [Commented] (MAPREDUCE-5248) Let NNBenchWithoutMR specify the replication factor for its test
[ https://issues.apache.org/jira/browse/MAPREDUCE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719333#comment-13719333 ] Tsuyoshi OZAWA commented on MAPREDUCE-5248: --- Thank you for contributing, [~epaulson]. +1 for merging. > Let NNBenchWithoutMR specify the replication factor for its test > > > Key: MAPREDUCE-5248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client, test >Affects Versions: 3.0.0 >Reporter: Erik Paulson >Priority: Minor > Attachments: MAPREDUCE-5248.txt > > Original Estimate: 1h > Remaining Estimate: 1h > > The NNBenchWithoutMR test creates files with a replicationFactorPerFile > hard-coded to 1. It'd be nice to be able to specify that on the commandline. > Also, it'd be great if MAPREDUCE-4750 was merged along with this fix.
[jira] [Updated] (MAPREDUCE-5248) Let NNBenchWithoutMR specify the replication factor for its test
[ https://issues.apache.org/jira/browse/MAPREDUCE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5248: -- Hadoop Flags: Reviewed > Let NNBenchWithoutMR specify the replication factor for its test > > > Key: MAPREDUCE-5248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client, test >Affects Versions: 3.0.0 >Reporter: Erik Paulson >Priority: Minor > Attachments: MAPREDUCE-5248.txt > > Original Estimate: 1h > Remaining Estimate: 1h > > The NNBenchWithoutMR test creates files with a replicationFactorPerFile > hard-coded to 1. It'd be nice to be able to specify that on the commandline. > Also, it'd be great if MAPREDUCE-4750 was merged along with this fix.
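For readers following MAPREDUCE-5248, the request amounts to reading the replication factor from the command line instead of fixing it at 1. The sketch below is only an illustration of that idea and is not taken from the attached MAPREDUCE-5248.txt patch: the class name ReplicationArgSketch, the method parseReplication, and the flag spelling -replicationFactorPerFile are all assumptions made for this example.

```java
// Hypothetical sketch; not the code from MAPREDUCE-5248.txt.
final class ReplicationArgSketch {
    // Default matches the value the issue says is currently hard-coded.
    static final short DEFAULT_REPLICATION = 1;

    // Returns the value that follows "-replicationFactorPerFile" (an assumed
    // flag name), or the default when the flag is absent.
    static short parseReplication(String[] args) {
        short replication = DEFAULT_REPLICATION;
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-replicationFactorPerFile")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(
                        "missing value for -replicationFactorPerFile");
                }
                // Consume the flag's value and skip past it.
                replication = Short.parseShort(args[++i]);
                if (replication < 1) {
                    throw new IllegalArgumentException(
                        "replication factor must be >= 1");
                }
            }
        }
        return replication;
    }
}
```

Keeping 1 as the default would leave existing invocations unchanged, which seems to be the least surprising choice for a benchmark that others may already script against.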