[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012350#comment-14012350 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

I can work on a patch for this. Is there general agreement that better logging for this class would be ideal?
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011240#comment-14011240 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

Sure, I can try those. In general, what is the contract for a Hadoop file system — should it support any character in a file name? Are there certain escape sequences that have a particular meaning?
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011953#comment-14011953 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

I've confirmed that this is a FileSystem issue: I'm using an alternative filesystem, and our plugin behaves differently than HDFS. So we can go back to the original goal for this JIRA: *When the JobHistoryServer SCANS directories, it should debug log exactly the files which it sees, so that users can clearly see, from the logs alone, whether certain files aren't readable to the JHS.*
[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5902:
--------------------------------

    Description:
1) JobHistoryServer sometimes skips over certain history files, and ignores serving them as completed.
2) In addition to skipping these files, the JobHistoryServer doesn't effectively log which files are being skipped, and why.

So in addition to determining why certain types of files are skipped (file name length doesn't appear to be the reason; rather, it appears to be that % characters throw the JobHistoryServer filter off), we should log completed .jhist files which are available in the mr-history/tmp directory yet are skipped for some reason.

*Regarding the actual bug: skipping completed jhist files*

We will need an author of the JobHistoryServer, I think, to chime in on what types of paths for jobs are actually valid. It appears that at least some characters, if in a job name, will make the JobHistoryServer skip recognition of a completed jhist file.

*Regarding logging*

It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}

*Reproducing*

I was able to reproduce this bug by writing a custom mapreduce job with a job name which contained % characters. I have also seen this with a version of the Mahout ParallelALSFactorizationJob, which includes - characters in its name, which wind up getting replaced by %2D at some stage in the job pipeline.

  was:
1) JobHistoryServer sometimes skips over certain history files, and ignores serving them as completed.
2) In addition to skipping these files, the JobHistoryServer doesn't effectively log which files are being skipped, and why.

So in addition to determining why certain types of files are skipped (file name length doesn't appear to be the reason; rather, it appears to be that % characters throw the JobHistoryServer filter off), we should log completed .jhist files which are available in the mr-history/tmp directory yet are skipped for some reason.

** Regarding the actual bug : Skipping completed jhist files **

We will need an author of the JobHistoryServer, I think, to chime in on what types of paths for jobs are actually valid. It appears that at least some characters, if in a job name, will make the JobHistoryServer skip recognition of a completed jhist file.

** Regarding logging **

It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}

** Reproducing **

I was able to reproduce this bug by writing a custom mapreduce job with a job name which contained % characters. I have also seen this with a version of the Mahout ParallelALSFactorizationJob, which includes - characters in its name, which wind up getting replaced by %2D at some stage in the job pipeline.
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009793#comment-14009793 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

This is an identical jira for the web front end, so I think these should be linked, as they are pretty similar and happening in the same component, although at different parts of the stack.
[jira] [Commented] (MAPREDUCE-5805) Unable to parse launch time from job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010737#comment-14010737 ]

jay vyas commented on MAPREDUCE-5805:
-------------------------------------

Any possible relation of this to MAPREDUCE-5902?

Unable to parse launch time from job history file
-------------------------------------------------

Key: MAPREDUCE-5805
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5805
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 2.3.0
Reporter: Fengdong Yu
Assignee: Akira AJISAKA
Fix For: 2.4.0
Attachments: MAPREDUCE-5805.patch

When a job completes, there are WARN complaints in the log:

{code}
2014-03-19 13:31:10,036 WARN org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils: Unable to parse launch time from job history file job_1395204058904_0003-1395206473646-root-test_one_word-1395206966214-4-2-SUCCEEDED-root.test-queue-1395206480070.jhist : java.lang.NumberFormatException: For input string: queue
{code}

Because there is a (-) in the queue name 'test-queue', we split the job history file name by (-) and take the ninth item as the job start time. From FileNameIndexUtils.java:

{code}
private static final int JOB_START_TIME_INDEX = 9;
{code}

But there is another potential issue: if the job name (test_one_word in this case) also includes a '-', all of the fields are misparsed.
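As a quick stand-alone illustration of the failure mode described above (this demo class is hypothetical, not part of the patch), splitting the file name on '-' shifts every field once the queue name contains a dash:

{noformat}
// Hypothetical demo, not from the patch: shows why reading a fixed index
// out of a '-'-split history file name fails when the queue name itself
// contains a '-'.
public class SplitDemo {
  public static void main(String[] args) {
    String name = "job_1395204058904_0003-1395206473646-root-test_one_word"
        + "-1395206966214-4-2-SUCCEEDED-root.test-queue-1395206480070.jhist";
    String[] parts = name.split("-");
    // JOB_START_TIME_INDEX = 9, but "root.test-queue" was split in two,
    // so parts[9] is "queue" rather than the start-time timestamp.
    System.out.println(parts[9]); // prints: queue
    try {
      Long.parseLong(parts[9]);
    } catch (NumberFormatException e) {
      System.out.println(e); // the exception quoted in the WARN log above
    }
  }
}
{noformat}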
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007441#comment-14007441 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

FYI, a concrete example: these paths, whose job names seem to have been truncated at some point (i.e. {{ItemRatingVectorsMappe}} is clearly missing an "r"), are not getting picked up by the JobHistoryServer.

{noformat}
└── tom
    ├── job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0010_conf.xml
    ├── job_1400794299637_0010.summary
    ├── job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0011_conf.xml
    ├── job_1400794299637_0011.summary
    ├── job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0012_conf.xml
    └── job_1400794299637_0012.summary
{noformat}
[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5902:
--------------------------------

    Summary: JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name. (was: JobHistoryServer (HistoryFileManager) needs more debug logs.)
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007959#comment-14007959 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

After further investigation, it appears that files with {{%}} escape characters in them aren't picked up by the JobHistoryServer. I'd like the opinion of one of the JobHistoryServer authors to confirm/deny whether job names are indeed allowed to include {{%}} signs in them, i.e. {{name%-myName}}. Has anyone else seen this before? I'd be somewhat surprised if I was the only person who has run into it; I can't imagine it's a configuration error of any sort.

The files below appear to be stuck in mr-history purgatory: neither are they detectable as completed jobs from a REST request ({{curl http://10.1.4.138:19888/ws/v1/history/mapreduce/jobs | python -mjson.tool}}) to the JobHistoryServer API, *nor* are they ever moved to {{/mr-history/done/}}.

{noformat}
/mr-history/tmp/tom/job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0017-1400814057680-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400814090466-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0016-140087454-tom-select+count%28*%29+from+bps_cleaned%28Stage%2D1%29-1400874621636-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0023-1400894507822-tom-name%252dname-1400894528285-1-1-SUCCEEDED-default.jhist
{noformat}
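For illustration (a hypothetical demo class, not JHS code): the job-name component of these file names is percent-encoded, so a '-' in a job name lands on disk as %2D, and an already-encoded name can end up encoded a second time, as in the {{name%252dname}} file above:

{noformat}
// Hypothetical demo: shows the percent-encoding visible in the file names
// above. A '-' is stored as %2D; double-encoding yields %252D.
import java.net.URLDecoder;

public class DecodeDemo {
  public static void main(String[] args) throws Exception {
    System.out.println(URLDecoder.decode(
        "ParallelALSFactorizationJob%2DItemRatingVectorsMappe", "UTF-8"));
    // -> ParallelALSFactorizationJob-ItemRatingVectorsMappe
    System.out.println(URLDecoder.decode("name%252dname", "UTF-8"));
    // -> name%2dname (still encoded once; a second decode gives name-name)
  }
}
{noformat}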
[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5902:
--------------------------------

    Description:
1) JobHistoryServer sometimes skips over certain history files, and ignores serving them as completed.
2) In addition to skipping these files, the JobHistoryServer doesn't effectively log which files are being skipped, and why.

So in addition to determining why certain types of files are skipped (file name length doesn't appear to be the reason; rather, it appears to be that % characters throw the JobHistoryServer filter off), we should log completed .jhist files which are available in the mr-history/tmp directory yet are skipped for some reason.

** Regarding the actual bug : Skipping completed jhist files **

We will need an author of the JobHistoryServer, I think, to chime in on what types of paths for jobs are actually valid. It appears that at least some characters, if in a job name, will make the JobHistoryServer skip recognition of a completed jhist file.

** Regarding logging **

It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}

** Reproducing **

I was able to reproduce this bug by writing a custom mapreduce job with a job name which contained % characters. I have also seen this with a version of the Mahout ParallelALSFactorizationJob, which includes - characters in its name, which wind up getting replaced by %2D at some stage in the job pipeline.

  was:
With the JobHistoryServer, it appears that it's possible sometimes to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing some in the actual source, and turning on DEBUG level logging, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()). It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5902:
--------------------------------

    Summary: JobHistoryServer (HistoryFileManager) needs more debug logs. (was: JobHistoryServer needs more debug logs.)
[jira] [Created] (MAPREDUCE-5902) JobHistoryServer needs more debug logs.
jay vyas created MAPREDUCE-5902:
-----------------------------------

Summary: JobHistoryServer needs more debug logs.
Key: MAPREDUCE-5902
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Reporter: jay vyas

With the JobHistoryServer, it appears that it's possible sometimes to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing some in the actual source, and turning on DEBUG level logging, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()). It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.

{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
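As a minimal sketch of the guarded logging being requested (assuming the class's existing commons-logging {{LOG}} field; the exact message wording is illustrative, not from a patch):

{noformat}
// Sketch only: assumes HistoryFileManager's existing commons-logging LOG
// field; message wording is illustrative.
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    boolean isFile = fileStatus.isFile();
    boolean accepted = pathFilter.accept(filePath);
    if (isFile && accepted) {
      jhStatusList.add(fileStatus);
    } else if (LOG.isDebugEnabled()) {
      // Guarded log: say exactly why the entry was skipped.
      LOG.debug("Skipping " + filePath + " (isFile=" + isFile
          + ", acceptedByFilter=" + accepted + ")");
    }
  }
  if (LOG.isDebugEnabled()) {
    LOG.debug("Scanned " + path + ": " + jhStatusList.size() + " files matched");
  }
  return jhStatusList;
}
{noformat}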
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006775#comment-14006775 ]

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

FYI, I categorized this as a bug because, without debug logs, it is impossible to trace certain issues which occur during file collection into the done/ directory, and it is probably an implicit requirement that we should be able to know why certain files would be excluded from being collected.
[jira] [Updated] (MAPREDUCE-5894) Make critical YARN properties first class citizens in the build.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5894:
--------------------------------

    Description:
We recently found, when deploying hadoop 2.2 with hadoop 2.0 values, that {noformat}mapreduce_shuffle{noformat} changed to {noformat}mapreduce.shuffle{noformat}. There are likewise many similar examples of parameters which become deprecated over time. See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

I suggest we:
1) Have a list of all mandatory *current* parameters stored in the code, and also,
2) a list of deprecated ones.
3) Then, have the build *automatically fail* if a parameter in the mandatory list is NOT accessed (a sketch of such a check follows below).

This would (a) make it so that unit testing of parameters does not regress and (b) force all updates to the code which change a parameter name to also include an update to the deprecated parameter list before the build passes.

  was:
We recently found, when deploying hadoop 2.2 with hadoop 2.0 values, that {noformat}mapreduce_shuffle{noformat} changed to {noformat}mapreduce.shuffle{noformat}. There are likewise many similar examples of parameters which become deprecated over time. See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

I suggest we:
1) Put the *set of parameters which are deprecated* over time into a java class which ships directly with the code, maybe even as a static list inside of Configuration() itself, with *optional extended parameters read from a configurable parameter*, so that ecosystem users (i.e. like HBase, or alternative file systems) can add their own deprecation info.
2) Have this list *checked on yarn daemon startup*, so that unused parameters which are *obviously artifacts are flagged immediately* by the daemon failing immediately.
3) Have a list of all mandatory *current* parameters stored in the code, and also a list of deprecated ones. Then, have the build *automatically fail* if a parameter in the mandatory list is NOT accessed. This would (a) make it so that unit testing of parameters does not regress and (b) force all updates to the code which change a parameter name to also include an update to the deprecated parameter list before the build passes.
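A minimal sketch of what such a check could look like (the test class and the MANDATORY_KEYS list are hypothetical and would be maintained alongside the code; YarnConfiguration and Configuration.addDeprecation()/isDeprecated() are existing Hadoop APIs):

{noformat}
// Sketch only: TestMandatoryParameters and MANDATORY_KEYS are hypothetical.
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Assert;
import org.junit.Test;

public class TestMandatoryParameters {
  // Hypothetical mandatory list (1); the deprecated list (2) would live beside it.
  private static final List<String> MANDATORY_KEYS = Arrays.asList(
      "yarn.nodemanager.aux-services");

  @Test
  public void mandatoryKeysHaveValues() {
    Configuration conf = new YarnConfiguration(); // loads yarn-default.xml
    for (String key : MANDATORY_KEYS) {
      // Failing here fails the build when a mandatory parameter goes missing.
      Assert.assertNotNull("Mandatory parameter missing: " + key, conf.get(key));
    }
  }

  @Test
  public void deprecationsAreRegistered() {
    // Hypothetical key pair: a registered deprecation is resolved
    // transparently while warning the operator about the old name.
    Configuration.addDeprecation("my.old.param", "my.new.param");
    Assert.assertTrue(Configuration.isDeprecated("my.old.param"));
  }
}
{noformat}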
[jira] [Created] (MAPREDUCE-5572) Provide alternative logic for getPos() implementation in custom RecordReader
jay vyas created MAPREDUCE-5572:
-----------------------------------

Summary: Provide alternative logic for getPos() implementation in custom RecordReader
Key: MAPREDUCE-5572
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5572
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: examples
Affects Versions: 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.1.3, 1.2.2
Reporter: jay vyas
Priority: Minor

The custom RecordReader class defines getPos() as follows:

{noformat}
long currentOffset = currentStream == null ? 0 : currentStream.getPos();
...
{noformat}

This is meant to prevent errors when the underlying stream is null, but it isn't guaranteed to work: the RawLocalFileSystem, for example, currently will close the underlying file stream once it is consumed, and the call will thus throw a NullPointerException when trying to access the null stream. This is only seen when running in the context where the MapTask class, which is only relevant in the mapred.* API, calls getPos() twice in tandem, before and after reading a record. This custom record reader should be guarded, or else eliminated, since it assumes something which is not in the FileSystem contract: that getPos() will always return an integral value.
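A minimal sketch of one possible guard (the class wrapper is only there to make the fragment self-contained; "currentStream" mirrors the record reader's field):

{noformat}
// Sketch only: a position-caching guard for the getPos() described above.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class GuardedPosReader {
  private FSDataInputStream currentStream; // inner stream may be nulled on close
  private long lastPos = 0;

  public long getPos() {
    try {
      if (currentStream != null) {
        lastPos = currentStream.getPos(); // remember the last good offset
      }
    } catch (IOException e) {
      // stream error: fall back to the last good offset
    } catch (NullPointerException e) {
      // inner stream already consumed/closed (the RawLocalFileSystem case)
    }
    return lastPos;
  }
}
{noformat}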
[jira] [Updated] (MAPREDUCE-5572) Provide alternative logic for getPos() implementation in custom RecordReader of mapred implementation of MultiFileWordCount
[ https://issues.apache.org/jira/browse/MAPREDUCE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5572:
--------------------------------

    Description:
The custom RecordReader class in MultiFileWordCount (MultiFileLineRecordReader) has been replaced in newer examples with a better implementation which uses CombineFileInputFormat and doesn't feature this bug. However, this bug nevertheless still exists in 1.x versions of MultiFileWordCount, which rely on the mapred API. The older MultiFileWordCount implementation defines getPos() as follows:

{noformat}
long currentOffset = currentStream == null ? 0 : currentStream.getPos();
...
{noformat}

This is meant to prevent errors when the underlying stream is null, but it isn't guaranteed to work: the RawLocalFileSystem, for example, currently will close the underlying file stream once it is consumed, and the call will thus throw a NullPointerException when trying to access the null stream. This is only seen when running in the context where the MapTask class, which is only relevant in the mapred.* API, calls getPos() twice in tandem, before and after reading a record. This custom record reader should be guarded, or else eliminated, since it assumes something which is not in the FileSystem contract: that getPos() will always return an integral value.

  was:
The custom RecordReader class defines getPos() as follows:

{noformat}
long currentOffset = currentStream == null ? 0 : currentStream.getPos();
...
{noformat}

This is meant to prevent errors when the underlying stream is null, but it isn't guaranteed to work: the RawLocalFileSystem, for example, currently will close the underlying file stream once it is consumed, and the call will thus throw a NullPointerException when trying to access the null stream. This is only seen when running in the context where the MapTask class, which is only relevant in the mapred.* API, calls getPos() twice in tandem, before and after reading a record. This custom record reader should be guarded, or else eliminated, since it assumes something which is not in the FileSystem contract: that getPos() will always return an integral value.

    Summary: Provide alternative logic for getPos() implementation in custom RecordReader of mapred implementation of MultiFileWordCount (was: Provide alternative logic for getPos() implementation in custom RecordReader)
[jira] [Created] (MAPREDUCE-5511) Multifilewc and the mapred.* API: Is the use of getPos() valid?
jay vyas created MAPREDUCE-5511:
-----------------------------------

Summary: Multifilewc and the mapred.* API: Is the use of getPos() valid?
Key: MAPREDUCE-5511
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5511
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: examples
Reporter: jay vyas
Priority: Minor

The MultiFileWordCount class in the hadoop examples libraries uses a record reader which switches between files. This behaviour can cause the RawLocalFileSystem to break in a concurrent environment because of the way buffering works (in RawLocalFileSystem, switching between streams results in a temporarily null inner stream, and that inner stream is called by the getPos() implementation in the custom RecordReader for MultiFileWordCount). There are basically two ways to handle this:

1) Wrap the getPos() implementation in the object returned by open() in the RawLocalFileSystem to cache the value of getPos() every time it is called, so that calls to getPos() can return a valid long even if the underlying stream is null (see the sketch after this issue). OR
2) Update the RecordReader in multifilewc to not rely on the inner input stream, and cache the position / return 0 if the stream cannot return a valid value.

The final question here is: is the RecordReader for MultiFileWordCount doing the right thing? Or is it breaking the contract of getPos()... and really, what SHOULD getPos() return if the underlying stream has already been consumed?
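A minimal sketch of option 1 above (the wrapper class is hypothetical; FSDataInputStream is the real type returned by open()):

{noformat}
// Hypothetical sketch of option 1: a wrapper the file system could return
// from open(), caching getPos() so it stays valid after the inner stream
// is temporarily nulled out.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class PosCachingStream extends FSDataInputStream {
  private long cachedPos = 0;

  public PosCachingStream(FSDataInputStream in) throws IOException {
    super(in);
  }

  @Override
  public long getPos() {
    try {
      cachedPos = super.getPos(); // refresh the cache while the stream is live
    } catch (IOException e) {
      // stream error: keep the cached value
    } catch (NullPointerException e) {
      // inner stream temporarily null (RawLocalFileSystem): keep cached value
    }
    return cachedPos;
  }
}
{noformat}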
[jira] [Updated] (MAPREDUCE-5511) Multifilewc and the mapred.* API: Is the use of getPos() valid?
[ https://issues.apache.org/jira/browse/MAPREDUCE-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5511:
--------------------------------

    Affects Version/s: 1.0.0, 1.2.0
[jira] [Commented] (MAPREDUCE-5511) Multifilewc and the mapred.* API: Is the use of getPos() valid?
[ https://issues.apache.org/jira/browse/MAPREDUCE-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768433#comment-13768433 ]

jay vyas commented on MAPREDUCE-5511:
-------------------------------------

Another note: the newer implementations of multifilewordcount in mapreduce.*, which don't provide a RecordReader.getPos() implementation, don't have this problem. So this really is also related to support for the multifilewordcount class. With new filesystem implementations which mapreduce can work on top of, it is important to define the expected semantics of getPos() for FSInputStreams.
[jira] [Created] (MAPREDUCE-5165) Create MiniMRCluster version which uses the mapreduce package.
jay vyas created MAPREDUCE-5165:
-----------------------------------

Summary: Create MiniMRCluster version which uses the mapreduce package.
Key: MAPREDUCE-5165
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5165
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: jay vyas
Priority: Minor

The MiniMRCluster class references some older mapred.* classes. It could be recreated in the mapreduce package to use the Configuration class instead of JobConf, which would make it simpler to use and integrate with new FS implementations and test harnesses that use new Configuration (not JobConf) objects to drive tests. This could be done many ways: 1) using inheritance, or else 2) by copying the code directly. The appropriate implementation depends on the following:
1) Is it okay for mapreduce.* classes to depend on mapred.* classes?
2) Is the mapred MiniMRCluster implementation going to be deprecated or eliminated anytime?
3) What is the future of the JobConf class, which has been deprecated and then undeprecated?
[jira] [Commented] (MAPREDUCE-5165) Create MiniMRCluster version which uses the mapreduce package.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635378#comment-13635378 ]

jay vyas commented on MAPREDUCE-5165:
-------------------------------------

I have just found that this rehashes contents from another JIRA, https://issues.apache.org/jira/browse/MAPREDUCE-3169... and there is in fact a MiniMRYarnCluster in the mapreduce package. So, then: is MiniMRCluster an artifact of the MR1 days that will be less used once MR2 takes over? Is MiniMRYarnCluster a generic version of MiniMRCluster which will one day obviate the implementation-specific MiniMRCluster altogether for the Hadoop MapReduce jobs that are implemented in the MR2 YARN framework?
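For reference, a minimal usage sketch of the mapreduce-package cluster mentioned above (the demo class is hypothetical; MiniMRYarnCluster's init/start/stop/getConfig lifecycle comes from the YARN service API):

{noformat}
// Hypothetical demo class: MiniMRYarnCluster driven by a plain
// Configuration rather than a JobConf.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;

public class MiniClusterDemo {
  public static void main(String[] args) throws Exception {
    MiniMRYarnCluster cluster = new MiniMRYarnCluster("MiniClusterDemo");
    cluster.init(new Configuration()); // no JobConf needed
    cluster.start();
    Configuration conf = cluster.getConfig(); // hand this to test jobs
    // ... submit MapReduce jobs against conf here ...
    cluster.stop();
  }
}
{noformat}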
[jira] [Updated] (MAPREDUCE-5165) Create MiniMRCluster version which uses the mapreduce package.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated MAPREDUCE-5165:
--------------------------------

    Description:
The MiniMRCluster class references some older mapred.* classes (as per comments below, however, there is MiniMRYarnCluster, which may aim to replace it). It could be recreated in the mapreduce package to use the Configuration class instead of JobConf, which would make it simpler to use and integrate with new FS implementations and test harnesses that use new Configuration (not JobConf) objects to drive tests. This could be done many ways: 1) using inheritance, or else 2) by copying the code directly. The appropriate implementation depends on the following:
1) Is it okay for mapreduce.* classes to depend on mapred.* classes?
2) Is the mapred MiniMRCluster implementation going to be deprecated or eliminated anytime?
3) What is the future of the JobConf class, which has been deprecated and then undeprecated?

Note that this is all intimately linked to the role that MiniMRYarnCluster will play. Relevant classes:

{noformat}
.//hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientCluster.java
.//hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientClusterFactory.java
.//hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRCluster.java
.//hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRYarnClusterAdapter.java
.//hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java
{noformat}

  was:
The MiniMRCluster class references some older mapred.* classes. It could be recreated in the mapreduce package to use the Configuration class instead of JobConf, which would make it simpler to use and integrate with new FS implementations and test harnesses that use new Configuration (not JobConf) objects to drive tests. This could be done many ways: 1) using inheritance, or else 2) by copying the code directly. The appropriate implementation depends on the following:
1) Is it okay for mapreduce.* classes to depend on mapred.* classes?
2) Is the mapred MiniMRCluster implementation going to be deprecated or eliminated anytime?
3) What is the future of the JobConf class, which has been deprecated and then undeprecated?
[jira] [Commented] (MAPREDUCE-5165) Create MiniMRCluster version which uses the mapreduce package.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635448#comment-13635448 ]

jay vyas commented on MAPREDUCE-5165:
-------------------------------------

+1 to close, but the deprecation story is somewhat tricky; not sure how to improve it. Maybe just a wiki page update to http://wiki.apache.org/hadoop/HowToContribute to explain the changes to MiniMRCluster would be in order here, or something.