[
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508468#comment-16508468
]
Wangda Tan commented on MAPREDUCE-7101:
---------------------------------------
Reported offline by [~deepesh]:
After removing the timestamp check completely, we saw an NPE when trying to get
the job:
{code:java}
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:324)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:302)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:440)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:637)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184)
at org.apache.hadoop.mapreduce.tools.CLI.getJob(CLI.java:530)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:268)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274){code}
I think there are some corner cases to take care of when getting the job status.
Since both Rohith and I are quite busy with other work, as discussed offline,
[~asuresh] could you help take a look at this issue? I can help with reviews.
> Revisit behavior of JHS scan file behavior
> ------------------------------------------
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Wangda Tan
> Priority: Critical
>
> Currently, the JHS scans the directory only if the modification time of the
> *directory* has changed:
> {code}
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
>     <... omitted some logics ...>
>     // reset scanTime before scanning happens
>     scanTime = System.currentTimeMillis();
>     Path p = fs.getPath();
>     try {
>       scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time
> is updated whenever a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues
> with truncated modification times, and HADOOP-12837 mentions that on S3 the
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly
> on different file systems.
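
To make the concern in the description concrete, here is a minimal sketch of one possible scan policy. It is not the JHS implementation and not a proposed patch. The class `ScanPolicy`, the method `shouldScan`, and the parameter `maxScanIntervalMs` are hypothetical names introduced only for illustration. The idea is to keep the modification-time check as a fast path, but also force a rescan once a maximum interval has elapsed, so that a file system that always reports 0 for the directory (as described for S3) still gets scanned periodically.

```java
/**
 * Hypothetical sketch: decide whether a directory should be rescanned when
 * its modification time may be unreliable. Not taken from Hadoop.
 */
final class ScanPolicy {
    // Assumed tuning knob: longest time we go without a scan.
    private final long maxScanIntervalMs;
    private long lastModTime = -1L;   // last directory mod time we acted on
    private long lastScanTime = -1L;  // -1 means "never scanned"

    ScanPolicy(long maxScanIntervalMs) {
        this.maxScanIntervalMs = maxScanIntervalMs;
    }

    /**
     * Returns true if a scan should run now, and records the scan.
     * dirModTime is whatever the file system reports for the directory;
     * nowMs is the current time in milliseconds.
     */
    synchronized boolean shouldScan(long dirModTime, long nowMs) {
        boolean modTimeChanged = dirModTime != lastModTime;
        boolean neverScanned = lastScanTime < 0;
        // Fallback for file systems whose directory mod time never changes.
        boolean stale = !neverScanned
            && nowMs - lastScanTime >= maxScanIntervalMs;
        if (modTimeChanged || neverScanned || stale) {
            lastModTime = dirModTime;
            lastScanTime = nowMs;
            return true;
        }
        return false;
    }
}
```

With this shape, a caller would pass in the reported directory modification time and the current time on each check. The trade-off is extra listing calls on file systems where the modification time is in fact reliable, so the interval would need to be chosen with that cost in mind.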