[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009793#comment-14009793 ]
jay vyas commented on MAPREDUCE-5902:
-------------------------------------

This is an identical JIRA for the web front end, so I think these should be linked, as they are very similar and occur in the same component, although at different parts of the stack.

> JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up
> jobs with % characters in the name.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5902
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: jay vyas
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> 1) JobHistoryServer sometimes skips over certain history files and does not
> serve them as completed.
> 2) In addition to skipping these files, the JobHistoryServer doesn't
> effectively log which files are being skipped, and why.
>
> So, in addition to determining why certain types of files are skipped (file
> name length doesn't appear to be the reason; rather, it appears that %
> characters throw the JobHistoryServer filter off), we should log completed
> .jhist files that are available in the mr-history/tmp directory yet are
> skipped for some reason.
>
> *Regarding the actual bug: Skipping completed jhist files*
> We will need an author of the JobHistoryServer, I think, to chime in on what
> types of paths for jobs are actually valid. It appears that at least some
> characters, if present in a job name, will make the JobHistoryServer fail to
> recognize a completed jhist file.
>
> *Regarding logging*
> It would be extremely useful, then, to have a couple of guarded logs at this
> level of the code, so that we can see, in the log folders, why files are
> being filtered out, i.e. whether it is due to filtering or visibility.
> {noformat}
> private static List<FileStatus> scanDirectory(Path path, FileContext fc,
>     PathFilter pathFilter) throws IOException {
>   path = fc.makeQualified(path);
>   List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
>   RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
>   while (fileStatusIter.hasNext()) {
>     FileStatus fileStatus = fileStatusIter.next();
>     Path filePath = fileStatus.getPath();
>     if (fileStatus.isFile() && pathFilter.accept(filePath)) {
>       jhStatusList.add(fileStatus);
>     }
>   }
>   return jhStatusList;
> }
> {noformat}
>
> *Reproducing*
> I was able to reproduce this bug by writing a custom MapReduce job with a job
> name that contained % characters. I have also seen this with a version of
> the Mahout ParallelALSFactorizationJob, which includes "-" characters in its
> name; these wind up getting replaced by "%2D" at some later stage in the
> job pipeline.
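
For illustration only, the kind of guarded logging requested above could look roughly like the sketch below. It is a self-contained stand-in, not the Hadoop code: the Entry class, the scan method, and the example filter that rejects names containing % are all hypothetical, and java.util.logging is used in place of whatever logger the JobHistoryServer actually uses. The point is only that each skipped entry gets a debug message stating which check rejected it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedScanSketch {
    private static final Logger LOG =
        Logger.getLogger(GuardedScanSketch.class.getName());

    /** Hypothetical stand-in for FileStatus: a name plus an is-file flag. */
    static final class Entry {
        final String name;
        final boolean isFile;

        Entry(String name, boolean isFile) {
            this.name = name;
            this.isFile = isFile;
        }
    }

    /**
     * Mirrors the shape of scanDirectory, but splits the combined condition
     * so that each reason for skipping an entry is logged separately.
     */
    static List<Entry> scan(List<Entry> entries, Predicate<String> filter) {
        List<Entry> accepted = new ArrayList<>();
        for (Entry entry : entries) {
            if (!entry.isFile) {
                // Guard so the message string is only built when FINE is enabled.
                if (LOG.isLoggable(Level.FINE)) {
                    LOG.fine("Skipping " + entry.name + ": not a regular file");
                }
                continue;
            }
            if (!filter.test(entry.name)) {
                if (LOG.isLoggable(Level.FINE)) {
                    LOG.fine("Skipping " + entry.name + ": rejected by filter");
                }
                continue;
            }
            accepted.add(entry);
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Assumed filter for the demo only: accept .jhist names without '%'.
        Predicate<String> filter =
            name -> name.endsWith(".jhist") && !name.contains("%");
        List<Entry> entries = Arrays.asList(
            new Entry("job_1_0001.jhist", true),
            new Entry("job_1_0002-my%25job.jhist", true),
            new Entry("subdir", false));
        for (Entry e : scan(entries, filter)) {
            System.out.println("accepted: " + e.name);
        }
    }
}
```

With the log level for this class set to FINE, the two skipped entries would each produce a line naming the entry and the check that rejected it, which is the information the issue asks for.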