[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-29 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012350#comment-14012350
 ] 

jay vyas commented on MAPREDUCE-5902:
-

I can work on a patch for this. Is there general agreement that better 
logging for this class would be ideal?

 JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up 
 jobs with % characters in the name.
 -

 Key: MAPREDUCE-5902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: jay vyas
   Original Estimate: 1h
  Remaining Estimate: 1h

 1) JobHistoryServer sometimes skips over certain history files and does not 
 serve them as completed.
 2) In addition to skipping these files, the JobHistoryServer doesn't 
 log which files are being skipped, or why.
 So, in addition to determining why certain types of files are skipped (file 
 name length doesn't appear to be the reason; rather, it appears that % 
 characters throw the JobHistoryServer filter off), we should log completed 
 .jhist files which are available in the mr-history/tmp directory yet are 
 skipped for some reason.
 *Regarding the actual bug: Skipping completed jhist files*
 We will need an author of the JobHistoryServer, I think, to chime in on what 
 types of paths for jobs are actually valid.  It appears that at least some 
 characters, if present in a job name, will make the JobHistoryServer fail to 
 recognize a completed jhist file.
 *Regarding logging*
 It would be extremely useful, then, to have a couple of guarded logs at this 
 level of the code, so that we can see, in the log folders, why files are 
 being filtered out, i.e. whether it is due to filtering or visibility.
 {noformat}
   private static List<FileStatus> scanDirectory(Path path, FileContext fc,
       PathFilter pathFilter) throws IOException {
     path = fc.makeQualified(path);
     List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
     RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
     while (fileStatusIter.hasNext()) {
       FileStatus fileStatus = fileStatusIter.next();
       Path filePath = fileStatus.getPath();
       if (fileStatus.isFile() && pathFilter.accept(filePath)) {
         jhStatusList.add(fileStatus);
       }
     }
     return jhStatusList;
   }
 {noformat}
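 As a rough illustration of the guarded logging requested for the scan loop, 
 here is a minimal standalone sketch. It is not a patch against 
 HistoryFileManager: java.nio.file and java.util.logging stand in for Hadoop's 
 FileContext and logging classes so that the example runs on its own, and the 
 class name ScanDirectorySketch and the wording of the log messages are 
 assumptions made only for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the scan loop; java.nio.file replaces FileContext.
class ScanDirectorySketch {
  private static final Logger LOG =
      Logger.getLogger(ScanDirectorySketch.class.getName());

  static List<Path> scanDirectory(Path dir, Predicate<Path> pathFilter)
      throws IOException {
    List<Path> accepted = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        boolean isFile = Files.isRegularFile(entry);
        if (isFile && pathFilter.test(entry)) {
          accepted.add(entry);
        } else if (LOG.isLoggable(Level.FINE)) {
          // Guarded: the message is only built when debug logging is enabled.
          LOG.fine("Skipping " + entry
              + (isFile ? ": rejected by path filter" : ": not a regular file"));
        }
      }
    }
    return accepted;
  }
}
```

 With debug (FINE) logging enabled, every skipped entry is reported together 
 with the reason, which separates "seen but filtered out" from "never 
 listed at all".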
 *Reproducing*
 I was able to reproduce this bug by writing a custom MapReduce job with a job 
 name that contained % characters.  I have also seen this with a version of 
 the Mahout ParallelALSFactorizationJob, which includes - characters in its 
 name; these wind up getting replaced by %2D at some later stage in the 
 job pipeline.
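 To make the %2D observation concrete, here is a small sketch of how percent 
 encoding treats these characters. It uses only java.net.URLEncoder and 
 java.net.URLDecoder; the class JobNameEncodingSketch, its two helper methods, 
 and the extra step that rewrites "-" as %2D are assumptions for illustration 
 and are not taken from the JobHistoryServer code. It shows one way a name 
 containing a raw % could fail to decode, not a confirmed cause of this bug.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helpers; not the JobHistoryServer's actual encoding code.
class JobNameEncodingSketch {
  // URL-encode the name, then escape "-" as %2D.
  // URLEncoder itself leaves "-" unchanged, hence the explicit replace.
  static String encode(String jobName) throws UnsupportedEncodingException {
    return URLEncoder.encode(jobName, "UTF-8").replace("-", "%2D");
  }

  // Returns null when the name is not valid percent-encoded text.
  static String decodeOrNull(String fileName)
      throws UnsupportedEncodingException {
    try {
      return URLDecoder.decode(fileName, "UTF-8");
    } catch (IllegalArgumentException e) {
      return null; // e.g. a raw % followed by non-hex characters
    }
  }
}
```

 Under these assumptions, encode("als-job") yields "als%2Djob" and 
 encode("50%done") yields "50%25done", both of which decode back to the 
 original name. A file name that carries a raw "50%done" cannot be decoded, 
 which is the kind of case where a debug log line at scan time would help.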



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-28 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011203#comment-14011203
 ] 

Ravi Prakash commented on MAPREDUCE-5902:
-

Hi Jay! Is % the only special character which causes the jhist files to not be 
picked up? Could you please try the symbols mentioned in this comment:
https://issues.apache.org/jira/browse/HDFS-13?focusedCommentId=12535371&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12535371



[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-28 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011240#comment-14011240
 ] 

jay vyas commented on MAPREDUCE-5902:
-

Sure, I can try those.

In general, what is the contract for a Hadoop file system: should it support 
any character in a file name? Are there certain escape sequences that have a 
particular meaning?



[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-28 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011953#comment-14011953
 ] 

jay vyas commented on MAPREDUCE-5902:
-

I've confirmed that this is a FileSystem issue: I'm using an alternative 
filesystem, and our plugin behaves differently than HDFS.  We can go back to 
the original goal for this JIRA:

*When the JobHistoryServer scans directories, it should debug-log exactly the 
files it sees, so that users can clearly see, just from looking at the logs, 
whether certain files aren't readable by the JHS.*



[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-27 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009793#comment-14009793
 ] 

jay vyas commented on MAPREDUCE-5902:
-

This is an identical JIRA for the web front end, so I think these should be 
linked, as they are pretty similar and occur in the same component, 
although at different parts of the stack.
