[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
[ https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646575#comment-14646575 ]

Ivan Mitic commented on MAPREDUCE-6357:
---
Thanks [~cotedm], please feel free to take it up.

MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
--
Key: MAPREDUCE-6357
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic

After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled.

{code}
2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433)
    at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91)
    at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69)
    at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}

As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized. In this case, the user read thru the MultipleOutputs docs and was assuming that everything will be working fine, as there are blog posts saying that MultipleOutputs does handle output commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
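The condition described in the issue ("a path that resolves outside of the job output-dir") can be illustrated with a small stand-alone sketch. This is not Hadoop code and not how MultipleOutputs decides anything internally; the class name, method name, the host `nn` and the example paths are all hypothetical, and plain java.net.URI resolution is used only to approximate how a baseOutputPath relates to the output directory.

```java
import java.net.URI;

/** Illustrative helper only; NOT part of Hadoop. All names here are hypothetical. */
final class OutputPathCheck {

    /**
     * Returns true when baseOutputPath, resolved against the job output
     * directory, still lies inside that directory. Paths for which this is
     * false are the ones the issue says bypass output committing.
     */
    static boolean staysInsideOutputDir(String jobOutputDir, String baseOutputPath) {
        // A trailing slash makes URI.resolve treat the output dir as a directory.
        String dir = jobOutputDir.endsWith("/") ? jobOutputDir : jobOutputDir + "/";
        URI outDir = URI.create(dir).normalize();
        URI resolved = outDir.resolve(baseOutputPath).normalize();
        return resolved.toString().startsWith(outDir.toString());
    }

    public static void main(String[] args) {
        String outDir = "hdfs://nn/user/hadoop/job-out"; // hypothetical job output dir

        // Relative name: stays under the output dir.
        System.out.println(staysInsideOutputDir(outDir, "block/part"));                   // true
        // Absolute path elsewhere: resolves outside the output dir.
        System.out.println(staysInsideOutputDir(outDir, "/user/hadoop/some/path/block")); // false
        // Relative path that climbs out of the output dir.
        System.out.println(staysInsideOutputDir(outDir, "../elsewhere/block"));           // false
    }
}
```

Under these assumptions, only the first call describes output that would be covered by the committer; the other two correspond to the failure mode in the stack trace above, where a retried task attempt finds the file already created by an earlier attempt.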
[jira] [Created] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
Ivan Mitic created MAPREDUCE-6357:
-
Summary: MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute
Key: MAPREDUCE-6357
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic

After spending the afternoon debugging a user job where reduce tasks were failing on retry with the below exception, I think it would be worthwhile to add a note in the MultipleOutputs.write() documentation, saying that absolute paths may cause improper execution of tasks on retry or when MR speculative execution is enabled.

{code}
2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433)
    at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91)
    at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69)
    at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}

As discussed in MAPREDUCE-3772, when the baseOutputPath passed to MultipleOutputs.write() is an absolute path (or more precisely a path that resolves outside of the job output-dir), the concept of output committing is not utilized. In this case, the user read thru the MultipleOutputs docs and was assuming that everything will be working fine, as there are blog posts saying that MultipleOutputs does handle output commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176456#comment-14176456 ] Ivan Mitic commented on MAPREDUCE-5911: --- Hi Bruno, it should be ok not to include a test case with this change, it's a minor fix to the examples. Will commit the patch shortly. Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
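The pattern in the issue description, where a subclass overrides {{checkOutputSpecs}} and thereby drops the inherited existence check, can be shown with simplified stand-ins. The classes below are hypothetical models, not the Hadoop {{FileOutputFormat}} or {{TeraOutputFormat}}; they use java.nio instead of the Hadoop FileSystem API, and they illustrate only the reported problem, not the fix discussed on this Jira.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Demo driver; all classes in this sketch are hypothetical stand-ins. */
final class OverrideDemo {

    /** Returns true if the format refuses to write into the given directory. */
    static boolean rejects(BaseOutputFormat format, Path dir) {
        try {
            format.checkOutputSpecs(dir);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Path existing = Paths.get("."); // the working directory always exists
        System.out.println(rejects(new BaseOutputFormat(), existing));        // true
        System.out.println(rejects(new SpecializedOutputFormat(), existing)); // false
    }
}

/** Models a base format that enforces "output directory must not exist yet". */
class BaseOutputFormat {
    void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
    }
}

/** Models a subclass whose override forgets to repeat the existence check. */
class SpecializedOutputFormat extends BaseOutputFormat {
    @Override
    void checkOutputSpecs(Path outputDir) throws IOException {
        // Only the "is it set" validation survives; no super call, no existence check.
        if (outputDir == null) {
            throw new IOException("Output directory not set");
        }
    }
}
```

In this model the specialized format silently accepts an existing directory, which mirrors the behavior the description attributes to terasort.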
[jira] [Updated] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5911: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk, branch-2 and branch-2.6. Thank you Bruno for the contribution! Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Fix For: 2.6.0 Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176570#comment-14176570 ] Ivan Mitic commented on MAPREDUCE-5911: --- Thank you [~jira.shegalov] for bringing this up. You are right, this won't work with the default partitioner. Sorry I wasn't aware of MAPREDUCE-4879. Let me take another look and see whether to revert the change that went in or go with your patch as an addendum. Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Fix For: 2.6.0 Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic reopened MAPREDUCE-5911: --- Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5911: -- Fix Version/s: (was: 2.6.0) Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176581#comment-14176581 ] Ivan Mitic commented on MAPREDUCE-5911: --- OK, I am going to revert the change given that it does not work and resolve this Jira as a duplicate of MAPREDUCE-4879. Let's iterate further on the other Jira. Thanks again Gera for catching this. Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic resolved MAPREDUCE-5911. --- Resolution: Duplicate I reverted the patch from trunk, branch-2 and branch-2.6. Resolving this Jira as a dupe of MAPREDUCE-4879, let's iterate on the right fix there. Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5911: -- Assignee: Bruno P. Kinoshita (was: Ivan Mitic) Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5911: -- Status: Patch Available (was: Open) Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
[ https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176210#comment-14176210 ] Ivan Mitic commented on MAPREDUCE-5911: --- Hi Bruno, thanks for contributing the patch! Looks good, +1. Will commit when it comes back with +1 from Jenkins. Terasort TeraOutputFormat does not check for output directory existance --- Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Bruno P. Kinoshita Priority: Minor Attachments: HADOOP-5911.patch The enforcement that the directory must not yet exist is implemented in {{FileOutputFormat#checkOutputSpecs}} by throwing {{FileAlreadyExistsException}}. However, terasort uses a specialized output format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}. The subclass overrides {{checkOutputSpecs}}, but does not re-implement the existence check and throw {{FileAlreadyExistsException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existance
Ivan Mitic created MAPREDUCE-5911: - Summary: Terasort TeraOutputFormat does not check for output directory existance Key: MAPREDUCE-5911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Reporter: Ivan Mitic Assignee: Ivan Mitic Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5512) TaskTracker hung after failed reconnect to the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792356#comment-13792356 ]

Ivan Mitic commented on MAPREDUCE-5512:
---
Thanks Chris for the review, will commit the patch shortly.

TaskTracker hung after failed reconnect to the JobTracker
-
Key: MAPREDUCE-5512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5512
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker
Affects Versions: 1.3.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Attachments: hadoop-tasktracker-RD00155DD09100.log, MAPREDUCE-5512.branch-1.patch, tt_Hung.txt

TaskTracker hung after failed reconnect to the JobTracker. This is the problematic piece of code:

{code}
this.distributedCacheManager = new TrackerDistributedCacheManager(
    this.fConf, taskController);
this.distributedCacheManager.startCleanupThread();

this.jobClient = (InterTrackerProtocol)
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Object>() {
          public Object run() throws IOException {
            return RPC.waitForProxy(InterTrackerProtocol.class,
                InterTrackerProtocol.versionID,
                jobTrackAddr, fConf);
          }
        });
{code}

In case RPC.waitForProxy() throws, TrackerDistributedCacheManager cleanup thread will never be stopped, and given that it is a non daemon thread it will keep TT up forever.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (MAPREDUCE-5512) TaskTracker hung after failed reconnect to the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic resolved MAPREDUCE-5512.
---
Resolution: Fixed
Fix Version/s: 1.3.0, 1-win

Fix committed to branch-1 and branch-1-win.

TaskTracker hung after failed reconnect to the JobTracker
-
Key: MAPREDUCE-5512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5512
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker
Affects Versions: 1.3.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Fix For: 1-win, 1.3.0
Attachments: hadoop-tasktracker-RD00155DD09100.log, MAPREDUCE-5512.branch-1.patch, tt_Hung.txt

TaskTracker hung after failed reconnect to the JobTracker. This is the problematic piece of code:

{code}
this.distributedCacheManager = new TrackerDistributedCacheManager(
    this.fConf, taskController);
this.distributedCacheManager.startCleanupThread();

this.jobClient = (InterTrackerProtocol)
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Object>() {
          public Object run() throws IOException {
            return RPC.waitForProxy(InterTrackerProtocol.class,
                InterTrackerProtocol.versionID,
                jobTrackAddr, fConf);
          }
        });
{code}

In case RPC.waitForProxy() throws, TrackerDistributedCacheManager cleanup thread will never be stopped, and given that it is a non daemon thread it will keep TT up forever.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5512) TaskTracker hung after failed reconnect to the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5512:
--
Attachment: MAPREDUCE-5512.branch-1.patch

Attaching the patch. My proposal for the fix is to make the dist cache cleanup thread a daemon. Based on a scan thru the code I think it should be safe to make this change.

For the unit test, I added a test that validates the list of non-daemon threads. This is a more general test case but I think it will serve well to protect the codebase against regressions in this area. I was not able to come up with a nice way to simulate the condition from this bug without adding a test hook in the production code, so I moved away from this approach (we would have to start JT, stop JT, start JT again which would tell TT to reinit, and then stop JT, but the last JT stop must have the right timing and run before TT#initialize() executes).

Slightly orthogonally, looking at the list of threads I had to whitelist, there might be some other candidate threads that could be made daemons, but I'd prefer not to make this change in the context of this Jira.

TaskTracker hung after failed reconnect to the JobTracker
-
Key: MAPREDUCE-5512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5512
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Attachments: hadoop-tasktracker-RD00155DD09100.log, MAPREDUCE-5512.branch-1.patch, tt_Hung.txt

TaskTracker hung after failed reconnect to the JobTracker. This is the problematic piece of code:

{code}
this.distributedCacheManager = new TrackerDistributedCacheManager(
    this.fConf, taskController);
this.distributedCacheManager.startCleanupThread();

this.jobClient = (InterTrackerProtocol)
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Object>() {
          public Object run() throws IOException {
            return RPC.waitForProxy(InterTrackerProtocol.class,
                InterTrackerProtocol.versionID,
                jobTrackAddr, fConf);
          }
        });
{code}

In case RPC.waitForProxy() throws, TrackerDistributedCacheManager cleanup thread will never be stopped, and given that it is a non daemon thread it will keep TT up forever.

--
This message was sent by Atlassian JIRA (v6.1#6144)
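The two ideas in the patch comment, marking a background cleanup thread as a daemon and validating the set of non-daemon threads in a test, can be sketched in plain Java. This is a minimal illustration rather than the attached patch: the class name, the method names and the thread name `hypothetical-cache-cleanup` are assumptions, and the loop body merely stands in for real cleanup work.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch; not the TaskTracker or TrackerDistributedCacheManager code. */
final class DaemonThreadDemo {

    /** Collects the names of all live non-daemon threads in this JVM. */
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Starts a periodic "cleanup" thread; the daemon flag must be set before start(). */
    static Thread startCleanupThread(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(1000); // placeholder for periodic cleanup work
                }
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        }, "hypothetical-cache-cleanup");
        t.setDaemon(daemon);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        Thread cleanup = startCleanupThread(true);
        // A daemon thread does not appear among the threads that keep the JVM alive.
        System.out.println(liveNonDaemonThreadNames().contains(cleanup.getName())); // false
        // main returns here and the JVM exits even though the cleanup loop never ends.
    }
}
```

With `startCleanupThread(false)` the same program would not exit after `main` returns, which is the hang described in this issue when initialization fails after the cleanup thread has already been started.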
[jira] [Created] (MAPREDUCE-5387) Implement Signal.TERM on Windows
Ivan Mitic created MAPREDUCE-5387: - Summary: Implement Signal.TERM on Windows Key: MAPREDUCE-5387 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 1-win, 2.1.0-beta Reporter: Ivan Mitic Assignee: Ivan Mitic Signal.TERM is currently not supported by Hadoop on the Windows platform. Tracking Jira for the problem. A couple of things to keep in mind: - Support for process groups (JobObjects on Windows) - Solution should work for both java and other streaming Hadoop apps -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5387) Implement Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708079#comment-13708079 ]

Ivan Mitic commented on MAPREDUCE-5387:
---
Copy-pasting [~cnauroth]'s comment from MAPREDUCE-5330:

{quote}
I came across similar issues while working on the YARN nodemanager changes for Windows. Bikas, I agree that this logic doesn't exactly match the meaning of SIGTERM. To match SIGTERM, we really need a way for one process to signal another process with some graceful shutdown message, and a way for the other process to trigger custom code when it receives that message. Unfortunately, I'm not aware of anything in the Windows API that provides an exact match. Therefore, the logic in this patch seems to be the closest approximation that's feasible right now.

To elaborate on this, TerminateProcess immediately kills the target process, and there is no way for that process to trap the call and run custom clean-up code.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686714(v=vs.85).aspx

This is much different from Unix signals, which allow the target process to install signal handlers to respond gracefully to things like SIGTERM.

There also seems to be some support for programmatically sending Ctrl-C to a process and installing a custom handler to respond to it. This would be SetConsoleCtrlHandler and GenerateConsoleCtrlEvent. I've heard anecdotally that this can be used to create a rough approximation of Unix signals, but I haven't tried it myself.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx

Aside from that, the only other option seems to be for Windows applications to roll their own custom IPC protocol (i.e. one process sends another a custom graceful shutdown message over a named pipe). It might be worth pursuing one of these solutions in the long term for absolute correctness, but these approaches will require a lot more coding and testing.
{quote}

Implement Signal.TERM on Windows

Key: MAPREDUCE-5387
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0, 1-win, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic

Signal.TERM is currently not supported by Hadoop on the Windows platform. Tracking Jira for the problem. A couple of things to keep in mind:
- Support for process groups (JobObjects on Windows)
- Solution should work for both java and other streaming Hadoop apps

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5330) JVM manager should not forcefully kill the process on Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708081#comment-13708081 ] Ivan Mitic commented on MAPREDUCE-5330: --- Chris, Bikas, Xi, I filed a new Jira MAPREDUCE-5387 to investigate possible ways to implement Signal.TERM on Windows. I have already spent time investigating this some time ago, will try to come up with a proposal in the near term. Chris' summary from above gives a good overview of some possible options (I copied it into the new Jira). JVM manager should not forcefully kill the process on Signal.TERM on Windows Key: MAPREDUCE-5330 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win Environment: Windows Reporter: Xi Fang Assignee: Xi Fang Fix For: 1-win Attachments: MAPREDUCE-5330.patch In MapReduce, we sometimes kill a task's JVM before it naturally shuts down if we want to launch other tasks (look in JvmManager$JvmManagerForType.reapJvm). This behavior means that if the map task process is in the middle of doing some cleanup/finalization after the task is done, it might be interrupted/killed without giving it a chance. In the Microsoft's Hadoop Service, after a Map/Reduce task is done and during closing file systems in a special shutdown hook, we're typically uploading storage (ASV in our context) usage metrics to Microsoft Azure Tables. So if this kill happens these metrics get lost. The impact is that for many MR jobs we don't see accurate metrics reported most of the time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5387) Implement Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5387: -- Issue Type: Improvement (was: Bug) Implement Signal.TERM on Windows Key: MAPREDUCE-5387 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 3.0.0, 1-win, 2.1.0-beta Reporter: Ivan Mitic Assignee: Ivan Mitic Signal.TERM is currently not supported by Hadoop on the Windows platform. Tracking Jira for the problem. A couple of things to keep in mind: - Support for process groups (JobObjects on Windows) - Solution should work for both java and other streaming Hadoop apps -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2351) mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690055#comment-13690055 ] Ivan Mitic commented on MAPREDUCE-2351: --- bq. Could someone please backport the newly attached patch to branch-1-win? Chelsey, I committed the patch to branch-1 only, as we'll be merging all branch-1 changes to branch-1-win in a day or so, and your patch will be picked up. mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI Key: MAPREDUCE-2351 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2351 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.0, 1-win, 1.3.0 Reporter: Tom White Assignee: Tom White Fix For: 0.23.0, 1.3.0 Attachments: HADOOP-472.branch-1-win.3.patch, MAPREDUCE-2351.branch-1-win.patch, MAPREDUCE-2351.patch Currently, mapred.job.tracker.history.completed.location is resolved relative to the default filesystem. If not set it defaults to history/done in the local log directory. There is no way to set it to another local filesystem location (with a file:// URI) or an arbitrary Hadoop filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
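The behavior requested in the description, honoring an explicit filesystem URI in the configured location and falling back to the default filesystem only when none is given, can be sketched with java.net.URI. This is a hedged illustration, not the committed patch: the class and method names, the default filesystem `hdfs://nn:8020` and the example locations are hypothetical.

```java
import java.net.URI;

/** Illustrative sketch only; NOT the JobTracker or JobHistory code. */
final class HistoryLocationDemo {

    /**
     * Returns the scheme a configured location would be served from:
     * its own scheme when it is a full URI, else the default filesystem's.
     */
    static String effectiveScheme(String location, String defaultFsUri) {
        String scheme = URI.create(location).getScheme();
        return scheme != null ? scheme : URI.create(defaultFsUri).getScheme();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://nn:8020"; // hypothetical default filesystem

        // No scheme: resolved relative to the default filesystem.
        System.out.println(effectiveScheme("history/done", defaultFs));                        // hdfs
        // Explicit file:// URI: should stay on the local filesystem.
        System.out.println(effectiveScheme("file:///var/log/hadoop/history/done", defaultFs)); // file
    }
}
```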
[jira] [Commented] (MAPREDUCE-2351) mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689787#comment-13689787 ]

Ivan Mitic commented on MAPREDUCE-2351:

Thanks Chelsey for doing the backport. I verified that the new test passes on both Windows and Linux. +1 on the patch. Will commit shortly.
[jira] [Updated] (MAPREDUCE-2351) mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-2351:
Attachment: MAPREDUCE-2351.branch-1-win.patch

Chelsey, the name of your patch does not seem valid. You should name it based on the Apache Jira id. Attaching the same patch with the right name.
[jira] [Updated] (MAPREDUCE-2351) mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-2351:
Affects Version/s: 1.3.0, 1-win, 0.23.0
[jira] [Updated] (MAPREDUCE-2351) mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-2351:
Fix Version/s: 1.3.0

I committed the backport patch to branch-1. Thank you Chelsey for the contribution!
[jira] [Updated] (MAPREDUCE-5330) Killing M/R JVM's leads to metrics not being uploaded
[ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5330:
Target Version/s: 1-win

Killing M/R JVM's leads to metrics not being uploaded
Key: MAPREDUCE-5330
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1-win
Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Attachments: MAPREDUCE-5330.patch

In MapReduce, we sometimes kill a task's JVM before it naturally shuts down if we want to launch other tasks (look in JvmManager$JvmManagerForType.reapJvm). This behavior means that if the map task process is in the middle of doing some cleanup/finalization after the task is done, it might be interrupted/killed without being given a chance to finish. In Microsoft's Hadoop Service, after a Map/Reduce task is done and while file systems are being closed in a special shutdown hook, we typically upload storage (ASV in our context) usage metrics to Microsoft Azure Tables. If this kill happens, these metrics get lost. The impact is that for many MR jobs we don't see accurate metrics reported most of the time.
[jira] [Updated] (MAPREDUCE-5330) JVM manager should not forcefully kill the process on Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5330:
Summary: JVM manager should not forcefully kill the process on Signal.TERM on Windows (was: Killing M/R JVM's leads to metrics not being uploaded)
[jira] [Commented] (MAPREDUCE-5330) JVM manager should not forcefully kill the process on Signal.TERM on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687387#comment-13687387 ]

Ivan Mitic commented on MAPREDUCE-5330:

Thanks Xi for the patch, looks good to me, +1
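The behaviour the issue summary asks for (request termination first, kill only as a last resort) can be sketched as follows. This is not the code from MAPREDUCE-5330.patch: GracefulTerminator and its three callbacks are hypothetical stand-ins for whatever the JVM manager uses to signal, wait on, and kill a task process, and the grace period is an assumed parameter.

```java
import java.util.function.LongPredicate;

/** Hypothetical sketch for illustration; not the JvmManager implementation. */
final class GracefulTerminator {

    /**
     * Asks a process to stop and force-kills it only if it is still running
     * after the grace period.
     *
     * @param requestStop  sends the polite termination request (the Signal.TERM equivalent)
     * @param exitedWithin waits up to the given milliseconds and reports whether the process exited
     * @param forceKill    kills the process without letting its shutdown hooks run
     * @param graceMillis  how long to wait before giving up on a clean exit
     * @return true if the process exited on its own, false if it had to be killed
     */
    static boolean terminate(Runnable requestStop, LongPredicate exitedWithin,
                             Runnable forceKill, long graceMillis) {
        requestStop.run();
        if (exitedWithin.test(graceMillis)) {
            // Shutdown hooks (for example a metrics upload) had time to finish.
            return true;
        }
        forceKill.run();
        return false;
    }
}
```

Keeping the three steps behind callbacks leaves the platform-specific part open, which matters here because how a polite termination request is delivered on Windows is exactly what the issue is about.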
[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684565#comment-13684565 ]

Ivan Mitic commented on MAPREDUCE-5224:

I already +1ed on the latest patch, will commit shortly.

JobTracker should allow the system directory to be in non-default FS
Key: MAPREDUCE-5224
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
Fix For: 1-win
Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, MAPREDUCE-5224.4.patch, MAPREDUCE-5224.5.patch, MAPREDUCE-5224.patch

JobTracker today expects the system directory to be in the default file system:

    if (fs == null) {
      fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
        public FileSystem run() throws IOException {
          return FileSystem.get(conf);
        }});
    }
    ...
    public String getSystemDir() {
      Path sysDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
      return fs.makeQualified(sysDir).toString();
    }

In a cloud environment like Azure, the default file system is set to ASV (Windows Azure Blob Storage), but we would still like the system directory to be in DFS. We should change JobTracker to allow that.
[jira] [Resolved] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic resolved MAPREDUCE-5224.
Resolution: Fixed
Target Version/s: 1-win
Hadoop Flags: Reviewed

Fix committed to branch-1-win. Thank you Xi for the contribution!
[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5224:
Affects Version/s: 1-win
[jira] [Commented] (MAPREDUCE-5259) TestTaskLog fails on Windows because of path separators missmatch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682389#comment-13682389 ]

Ivan Mitic commented on MAPREDUCE-5259:

Thanks Chris for the review and commit!

TestTaskLog fails on Windows because of path separators missmatch
Key: MAPREDUCE-5259
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5259
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Fix For: 3.0.0, 2.1.0-beta
Attachments: MAPREDUCE-5259.patch

Test failure:
{noformat}
Running org.apache.hadoop.mapred.TestTaskLog
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.516 sec FAILURE!
testTaskLog(org.apache.hadoop.mapred.TestTaskLog) Time elapsed: 409 sec FAILURE!
junit.framework.AssertionFailedError: null
    at junit.framework.Assert.fail(Assert.java:47)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at junit.framework.Assert.assertTrue(Assert.java:27)
    at org.apache.hadoop.mapred.TestTaskLog.testTaskLog(TestTaskLog.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
{noformat}
[jira] [Commented] (MAPREDUCE-3540) saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
[ https://issues.apache.org/jira/browse/MAPREDUCE-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678899#comment-13678899 ]

Ivan Mitic commented on MAPREDUCE-3540:

This Jira seems outdated now that Hadoop can be compiled on Windows without Cygwin. Should we resolve this Jira?

saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
Key: MAPREDUCE-3540
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3540
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 0.24.0, trunk
Reporter: Alejandro Abdelnur
Fix For: 0.24.0
Attachments: MAPREDUCE-3540-121001.patch, MAPREDUCE-3540.Nov12.patch, MAPREDUCE-3540.patch

{code}
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (generate-version) on project hadoop-yarn-common: Command execution failed. Cannot run program scripts\saveVersion.sh (in directory C:\cygwin\home\tucu\src\hadoop\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common): CreateProcess error=2, The system cannot find the file specified - [Help 1]
[ERROR]
{code}
[jira] [Updated] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5278:
Summary: Distributed cache is broken when JT staging dir is not on the default FS (was: Perf: Distributed cache is broken when JT staging dir is not on the default FS)

Distributed cache is broken when JT staging dir is not on the default FS
Key: MAPREDUCE-5278
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distributed-cache
Affects Versions: 1-win
Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Fix For: 1-win
Attachments: MAPREDUCE-5278.patch

Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is set to point to HDFS, even when another file system (e.g. the Amazon S3 file system or the Windows ASV file system) is the default file system. For ASV, this config was chosen for a few reasons:
1. To prevent leaking the storage account credentials to the user's storage account;
2. It uses HDFS for the transient job files, which is good for two reasons:
   a) it does not flood the user's storage account with irrelevant data/files
   b) it leverages HDFS locality for small files

However, this approach conflicts with how distributed cache caching works, completely negating the feature's functionality. When files are added to the distributed cache (through the files/archives/libjars Hadoop generic options), they are copied to the job tracker staging dir only if they reside on a file system different than the jobtracker's. Later on, this path is used as a key to cache the files locally on the tasktracker's machine, and to avoid localization (download/unzip) of the distributed cache files if they are already localized. In this configuration the caching is completely disabled, and we always end up copying dist cache files to the job tracker's staging dir first and localizing them on the task tracker machine second. This is especially bad for Oozie scenarios, as Oozie uses the dist cache to populate Hive/Pig jars throughout the cluster.
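Why the staging copy defeats the local cache can be seen in a toy model. The sketch below is not TaskTracker code: LocalizationCache, its localize method, and any paths passed to it are hypothetical, and it assumes only what the description states, namely that the local cache is keyed by the path the task tracker is asked to localize.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a task tracker's local cache, keyed by source path. Hypothetical. */
final class LocalizationCache {
    private final Map<String, String> localCopies = new HashMap<>();
    private int downloads = 0;

    /** Returns the local copy for a source path, downloading only on a cache miss. */
    String localize(String sourcePath) {
        return localCopies.computeIfAbsent(sourcePath, path -> {
            downloads++; // stands in for the download/unzip step
            return "/local/cache/" + downloads;
        });
    }

    /** Number of cache misses so far. */
    int downloads() {
        return downloads;
    }
}
```

Two jobs that reference the same jar at its original location share one key and trigger a single download. If each job first copies the jar into its own staging directory, the keys differ from job to job and every job pays for a fresh localization.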
[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678901#comment-13678901 ]

Ivan Mitic commented on MAPREDUCE-5278:

Thanks Xi for posting the patch! +1 on the proposal, I have largely reviewed this already and tested it out E2E. A couple of additional comments below:
1. You'll also have to provide a trunk compatible patch for the new functionality
2. TestMRWithDistributedCache#DistributedCacheCheckerJTStagingOnNondefaultFS: I would add the validation that localized dist cache entries are properly added to the classpath (below check).
{code}
// Check the class loaders
LOG.info("Java Classpath: " + System.getProperty("java.class.path"));
ClassLoader cl = Thread.currentThread().getContextClassLoader();
// Both the file and the archive were added to classpath, so both
// should be reachable via the class loader.
TestCase.assertNotNull(cl.getResource("distributed.jar.inside2"));
TestCase.assertNotNull(cl.getResource("distributed.jar.inside3"));
TestCase.assertNull(cl.getResource("distributed.jar.inside4"));
{code}
It would be really good to get feedback on the approach from some more senior MR folks.
[jira] [Resolved] (MAPREDUCE-5277) Job history completed location cannot be on a file system other than default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic resolved MAPREDUCE-5277.
Resolution: Duplicate

Just realized that this is a dupe of MAPREDUCE-2351. Will reopen the original Jira and attach the branch-1 compatible backport patch.

Job history completed location cannot be on a file system other than default
Key: MAPREDUCE-5277
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5277
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 1-win
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Attachments: MAPREDUCE-5277.branch-1-win.patch

mapred.job.tracker.history.completed.location should be configurable to a location on any available file system. This can come in handy for cases where HDFS is not the only file system in use.
[jira] [Commented] (MAPREDUCE-3540) saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
[ https://issues.apache.org/jira/browse/MAPREDUCE-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668457#comment-13668457 ]

Ivan Mitic commented on MAPREDUCE-3540:

Hi Anoop, you no longer need to run from Cygwin shell to be able to compile and run Hadoop on Windows (this is post HADOOP-8562 merge). Check BUILDING.txt for instructions on how to compile natively on Windows.
[jira] [Created] (MAPREDUCE-5277) Job history completed location cannot be on a file system other than default
Ivan Mitic created MAPREDUCE-5277:

Summary: Job history completed location cannot be on a file system other than default
Key: MAPREDUCE-5277
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5277
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 1-win
Reporter: Ivan Mitic
Assignee: Ivan Mitic

mapred.job.tracker.history.completed.location should be configurable to a location on any available file system. This can come in handy for cases where HDFS is not the only file system in use.
[jira] [Updated] (MAPREDUCE-5277) Job history completed location cannot be on a file system other than default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5277:
Attachment: MAPREDUCE-5277.branch-1-win.patch

Attaching the patch.
[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667999#comment-13667999 ]

Ivan Mitic commented on MAPREDUCE-5224:

Thanks Xi for taking the time to address all comments! Latest patch looks good to me, +1

bq. There is no need to use the default file system for the jobhistory.

There is another (orthogonal) bug here. Job history completed location also assumes the default FS, which is not correct. This should be a separate Jira. I filed a Jira on this: MAPREDUCE-5277
[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668000#comment-13668000 ]

Ivan Mitic commented on MAPREDUCE-5224:

PS. I verified that the new test passes on Linux and on Windows.
[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665677#comment-13665677 ]

Ivan Mitic commented on MAPREDUCE-5224:
---

Thanks Xi, you're almost there! A few additional comments below. Once you address those, +1 from me.

1. You'll have to impersonate the MR owner when you're querying for the systemDirFs (same routine as with defaultFs). Sorry, I missed this when I initially reviewed the patch. After:
{code}
if (defaultFs == null) {
  defaultFs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws IOException {
      return FileSystem.get(conf);
    }});
}
{code}
add the following:
{code}
if (systemDirFs == null) {
  systemDirFs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws IOException {
      Path sysDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
      return FileSystem.get(sysDir.toUri(), conf);
    }});
}
{code}
Once you implement the above, you should be able to simplify getSystemDir() by assuming that systemDirFs is not null. This will allow you to get rid of the IOException (comment #2 from my initial review).

2. Nit: JobTracker.java: It seems that a tab slipped in:
{code}
if (systemDirFs.exists(restartFile)) {
  systemDirFs.delete(tmpRestartFile, false); // delete the tmp file
} else if (systemDirFs.exists(tmpRestartFile)) {
{code}
I also see some invalid indentation:
{code}
// disable recovery if this is a restart
shouldRecover = false;
{code}
Please correct.

3. Please remove the try/catch from TestJobTrackerWithNonDefaultFS#tearDown since it can mask a problem. Instead, you can add IOException to the throws clause of the method:
{code}
public void tearDown() throws IOException {
{code}

4. TestJobTrackerWithNonDefaultFS#testSystemDir: No need for the try/catch block in the test, please remove it. The test will fail if any of its asserts fail.

5. Can you also please change TestJobTrackerWithNonDefaultFS#MAPRED_SYS_DIR to the following:
{code}
private final String MAPRED_SYS_DIR =
    System.getProperty("test.build.data", "/tmp") + "/mapred/system";
{code}
The guideline is for all local test files to go under the test.build.data folder.

6. Nit: TestJobTrackerWithNonDefaultFS: You can use assertTrue here instead:
{code}
assertEquals("Check if the system dir exists",
    FileSystem.get(sysDirPathURL, conf).exists(sysDirPath), true);
{code}
Btw, when you’re using assertEquals, you should place the expected value as the first arg, and the test value as the second arg. For example, assertEquals(true, fs.get().exists()).

JobTracker should allow the system directory to be in non-default FS

Key: MAPREDUCE-5224
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
Fix For: 1-win
Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, MAPREDUCE-5224.patch

JobTracker today expects the system directory to be in the default file system:

if (fs == null) {
  fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws IOException {
      return FileSystem.get(conf);
    }});
}
...
public String getSystemDir() {
  Path sysDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
  return fs.makeQualified(sysDir).toString();
}

In a cloud environment like Azure, the default file system is set to ASV (Windows Azure Blob Storage), but we would still like the system directory to be in DFS. We should change JobTracker to allow that.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
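The reason for resolving the file system from the system directory's own URI, rather than from the configuration alone, can be shown without Hadoop: a path that carries a scheme should be served by that scheme's file system, while a bare path falls back to the default one. The following is only a minimal sketch of that selection using java.net.URI. The class name, the method name, and both example URIs are hypothetical, and this is not Hadoop's implementation.

```java
import java.net.URI;

public class SystemDirFsSketch {
    // Returns the scheme of the file system that would serve 'systemDir':
    // the path's own scheme when it has one, otherwise the default FS scheme.
    static String schemeFor(String systemDir, URI defaultFs) {
        String scheme = URI.create(systemDir).getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical default FS (blob storage) and system dir values.
        URI defaultFs = URI.create("asv://container@account/");
        System.out.println(
            schemeFor("hdfs://namenode:8020/tmp/hadoop/mapred/system", defaultFs));
        System.out.println(schemeFor("/tmp/hadoop/mapred/system", defaultFs));
    }
}
```

With a fully qualified mapred.system.dir the system directory can live on DFS even when the default file system is blob storage, which is the behavior the issue asks for.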
[jira] [Commented] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662407#comment-13662407 ] Ivan Mitic commented on MAPREDUCE-5191: --- Thanks Hitesh! TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 3.0.0 Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662675#comment-13662675 ]

Ivan Mitic commented on MAPREDUCE-5224:
---

Thanks Xi for addressing the comments!

bq. For getFilesystemName(), what does fs stand for in this context, default fs or systemDir's file system. I guess it denotes the latter one. Right?

Right, I also see it as the systemDir's file system.

JobTracker should allow the system directory to be in non-default FS

Key: MAPREDUCE-5224
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
Fix For: 1-win
Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.patch

JobTracker today expects the system directory to be in the default file system:

if (fs == null) {
  fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws IOException {
      return FileSystem.get(conf);
    }});
}
...
public String getSystemDir() {
  Path sysDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
  return fs.makeQualified(sysDir).toString();
}

In a cloud environment like Azure, the default file system is set to ASV (Windows Azure Blob Storage), but we would still like the system directory to be in DFS. We should change JobTracker to allow that.
[jira] [Created] (MAPREDUCE-5259) TestTaskLog fails on Windows because of path separators missmatch
Ivan Mitic created MAPREDUCE-5259: - Summary: TestTaskLog fails on Windows because of path separators missmatch Key: MAPREDUCE-5259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5259 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Test failure: {noformat} Running org.apache.hadoop.mapred.TestTaskLog Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.516 sec FAILURE! testTaskLog(org.apache.hadoop.mapred.TestTaskLog) Time elapsed: 409 sec FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.mapred.TestTaskLog.testTaskLog(TestTaskLog.java:54) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5259) TestTaskLog fails on Windows because of path separators missmatch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5259: -- Attachment: MAPREDUCE-5259.patch Attaching the patch. The fix is to use File.separatorChar instead of the hardcoded Unix file separator ('/'). TestTaskLog fails on Windows because of path separators missmatch - Key: MAPREDUCE-5259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5259 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5259.patch Test failure: {noformat} Running org.apache.hadoop.mapred.TestTaskLog Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.516 sec FAILURE! testTaskLog(org.apache.hadoop.mapred.TestTaskLog) Time elapsed: 409 sec FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.mapred.TestTaskLog.testTaskLog(TestTaskLog.java:54) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat} -- This message is automatically generated by JIRA. 
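The patch for MAPREDUCE-5259 is not reproduced in this thread, so the following is only a minimal sketch of the kind of failure described, assuming the test compared a platform-rendered path against a string built with a hardcoded '/'. The class name, the helper, and the path components are hypothetical.

```java
import java.io.File;

public class SeparatorSketch {
    // Hypothetical helper: joins path components with the given separator.
    static String join(char separator, String... parts) {
        return String.join(String.valueOf(separator), parts);
    }

    public static void main(String[] args) {
        // Path as the platform renders it (backslashes on Windows).
        String actual =
            new File(new File("userlogs", "attempt_0"), "syslog").getPath();

        // Hardcoded Unix separator: matches only where File.separatorChar is '/'.
        String hardcoded = join('/', "userlogs", "attempt_0", "syslog");

        // Platform separator: matches on every platform.
        String portable = join(File.separatorChar, "userlogs", "attempt_0", "syslog");

        System.out.println("hardcoded matches: " + actual.equals(hardcoded));
        System.out.println("portable matches:  " + actual.equals(portable));
    }
}
```

On a Unix-like system both comparisons succeed, which is why a check of this shape only fails on Windows.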
[jira] [Updated] (MAPREDUCE-5259) TestTaskLog fails on Windows because of path separators missmatch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5259: -- Status: Patch Available (was: Open) TestTaskLog fails on Windows because of path separators missmatch - Key: MAPREDUCE-5259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5259 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5259.patch Test failure: {noformat} Running org.apache.hadoop.mapred.TestTaskLog Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.516 sec FAILURE! testTaskLog(org.apache.hadoop.mapred.TestTaskLog) Time elapsed: 409 sec FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.mapred.TestTaskLog.testTaskLog(TestTaskLog.java:54) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661425#comment-13661425 ]

Ivan Mitic commented on MAPREDUCE-5191:
---

bq. How about just creating a file under target/ with the name of the test as filename?

Thanks Hitesh, totally makes sense. Will attach the updated patch.

TestQueue#testQueue fails with timeout on Windows

Key: MAPREDUCE-5191
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.patch

The test always times out on my machine after 5 seconds, with the stack below:
{code}
testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR!
java.lang.Exception: test timed out after 5000 milliseconds
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330)
    at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319)
    at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117)
    at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
    at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
    at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
    at java.security.SecureRandom.next(SecureRandom.java:455)
    at java.util.Random.nextLong(Random.java:284)
    at java.io.File.generateFile(File.java:1682)
    at java.io.File.createTempFile(File.java:1791)
    at java.io.File.createTempFile(File.java:1828)
    at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221)
    at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53)
{code}
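The suggestion quoted in the comment above (a file under target/ named after the test) avoids File.createTempFile, which is where the stack shows the time being spent in SecureRandom seeding. The updated patch is not shown in this thread, so the following is only a sketch of that idea. The class name, helper signature, and the ".xml" suffix are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class FixedTestFileSketch {
    // Hypothetical helper: writes 'contents' to <baseDir>/<testName>.xml.
    // The file name is fixed, so no random suffix has to be generated.
    static File writeFile(File baseDir, String testName, String contents)
            throws IOException {
        if (!baseDir.isDirectory() && !baseDir.mkdirs()) {
            throw new IOException("Could not create " + baseDir);
        }
        File f = new File(baseDir, testName + ".xml");
        // Files.write creates the file or truncates an existing one.
        Files.write(f.toPath(), contents.getBytes(StandardCharsets.UTF_8));
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = writeFile(new File("target"), "testQueue", "<queues/>");
        System.out.println(f.getPath());
        f.delete(); // clean up the example file
    }
}
```

A fixed name also makes leftovers from a failed run easy to find, at the cost of requiring that tests sharing a directory use distinct names.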
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5191: -- Attachment: MAPREDUCE-5191.3.patch TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5191: -- Status: Patch Available (was: Open) TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.3.patch, MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661272#comment-13661272 ]

Ivan Mitic commented on MAPREDUCE-5224:
---

Thanks Xi for the patch! I think this is close. I have some comments below which should be easy to address.

1. JobTracker.java: Lines should not exceed 80 chars per Hadoop coding guidelines.
{code}
FSDataOutputStream out = FileSystem.create(systemDirFs, tmpRestartFile, filePerm);
{code}
Same comment for other changes in the patch.

2. I really don't think it is necessary to introduce IOException to so many methods and interfaces for the reasons in this Jira. I would change JobTracker#getSystemDir() to fall back to the default value and log a warning in case {{FileSystem.get()}} throws.

3. Should JobTracker#RecoveryManager#checkAndAddJob() use systemDirFs?

4. I would rename the {{JobTracker#fs}} local member to {{defaultFs}} to signify its meaning and avoid possible confusion in the future. I actually don’t think you need to keep both defaultFs and systemDirFs as members. The only other place where you need defaultFs is {{JobHistory#initDone}} and you should be able to query for it locally.

5. Let's rename the test to TestJobTrackerWithNonDefaultFS.

6. What is the expected behavior for TestSysDirOnNonDefaultFS when your code changes are not applied? It looks like the setUp step is failing. I would prefer if we could have the test case fail instead.

7. TestSysDirOnNonDefaultFS.java: Please add a more verbose comment on the intent of the test.
{code}
/**
 * Class to test jobtracker's system dir
 */
{code}

8. TestSysDirOnNonDefaultFS.java: Why not let setUp throw the IOException in case of an error?

9. TestSysDirOnNonDefaultFS.java: Please use the JUnit assertEquals method to validate that the expected and the retrieved values are equal.

10. TestSysDirOnNonDefaultFS.java: Can we also add validation that the mapred system dir is created in the right place by checking for its existence?

11. It would be good to understand if there are some changes needed to get the equivalent functionality in YARN. I would be fine with addressing this via a separate Jira.

JobTracker should allow the system directory to be in non-default FS

Key: MAPREDUCE-5224
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
Fix For: 1-win
Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.patch

JobTracker today expects the system directory to be in the default file system:

if (fs == null) {
  fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws IOException {
      return FileSystem.get(conf);
    }});
}
...
public String getSystemDir() {
  Path sysDir = new Path(conf.get("mapred.system.dir", "/tmp/hadoop/mapred/system"));
  return fs.makeQualified(sysDir).toString();
}

In a cloud environment like Azure, the default file system is set to ASV (Windows Azure Blob Storage), but we would still like the system directory to be in DFS. We should change JobTracker to allow that.
[jira] [Commented] (MAPREDUCE-5210) Job submission has strict permission validation
[ https://issues.apache.org/jira/browse/MAPREDUCE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651574#comment-13651574 ]

Ivan Mitic commented on MAPREDUCE-5210:
---

bq. FileSystem should have an API to check ownership.

I like the idea of exposing a FileSystem API for checking the ownership, something like {{FileSystem#isOwnedByUser(String username…)}}. We had a problem with this check on Windows with many tests that use the local file system. Check out HADOOP-8457 to see what we did in branch-1-win.

Just for completeness :), another, 3rd option is to have S3 implement the setPermissions/setOwner FileSystem APIs. We ended up doing this with our Azure FileSystem implementation to be able to run MR on top of it.

Job submission has strict permission validation

Key: MAPREDUCE-5210
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5210
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Amareshwari Sriramadasu
Assignee: samar

The following code in JobSubmissionFiles.java mandates strict permissions on job submission:
{noformat}
if (fs.exists(stagingArea)) {
  FileStatus fsStatus = fs.getFileStatus(stagingArea);
  String owner = fsStatus.getOwner();
  if (!(owner.equals(currentUser) || owner.equals(realUser))) {
    throw new IOException("The ownership on the staging directory " +
        stagingArea + " is not as expected. " +
        "It is owned by " + owner + ". The directory must " +
        "be owned by the submitter " + currentUser + " or " +
        "by " + realUser);
  }
{noformat}
For file systems such as S3, which have no concept of permissions, a user can never submit a job with the staging area in S3.
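The ownership-check API discussed above is only a proposal in this thread, so its final shape is unknown. As a rough sketch of the idea, the check could live behind the file system abstraction so that a store without ownership can answer differently. Everything below is hypothetical plain Java: the Fs interface is a stand-in and is not Hadoop's FileSystem class, and the method signature is an assumption.

```java
import java.util.Arrays;

public class OwnershipSketch {
    // Hypothetical stand-in for a file system; not Hadoop's FileSystem class.
    interface Fs {
        String ownerOf(String path);

        // Default check: the owner must be one of the accepted users,
        // mirroring the staging-directory validation quoted above.
        default boolean isOwnedByUser(String path, String... users) {
            return Arrays.asList(users).contains(ownerOf(path));
        }
    }

    // A store with no ownership concept overrides the check to accept anyone.
    static final class PermissionlessFs implements Fs {
        @Override public String ownerOf(String path) { return ""; }
        @Override public boolean isOwnedByUser(String path, String... users) {
            return true;
        }
    }

    public static void main(String[] args) {
        Fs strict = path -> "alice"; // every path is owned by "alice"
        Fs lenient = new PermissionlessFs();
        System.out.println(strict.isOwnedByUser("/staging", "bob", "carol"));
        System.out.println(lenient.isOwnedByUser("/staging", "bob", "carol"));
    }
}
```

The caller then asks the file system instead of comparing owner strings itself, which keeps the strict behavior for file systems that do track ownership.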
[jira] [Commented] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646670#comment-13646670 ]

Ivan Mitic commented on MAPREDUCE-50:
---

Hi Steve, Vinod,

I've run into a problem similar to this one. In my case, JobTracker started failing jobs because the network topology resolution started failing for a single node in the cluster:
{code}
2013-04-27 08:33:08,204 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:3205)
    at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:550)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:734)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4214)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
{code}
What happens is that some input split blocks are located on the datanode with the same IP/hostname as the TT. As a side effect, this causes many of the customer jobs to fail during initialization. NN, on the other hand, has fallback logic that defaults to /default-rack, and this inconsistency actually makes the problem more severe :)
{code}
2013-04-27 04:36:47,185 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The resolve call returned null! Using /default-rack for host [100.64.34.3]
2013-04-27 04:36:47,185 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/100.64.34.3:50010
{code}
In terms of the fix, my proposal would be to add the same fallback logic to the JobTracker. In our case, we actually had a network topology script that worked fine for a year or so, and it has now started failing for a single node for a reason we cannot explain yet.

Let me know what you think. I'll take up this Jira if you don't mind.

NPE in heartbeat when the configured topology script doesn't exist

Key: MAPREDUCE-50
URL: https://issues.apache.org/jira/browse/MAPREDUCE-50
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Vinod Kumar Vavilapalli
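The fallback proposed above can be sketched in isolation. This is only an illustration of the idea, assuming the resolver may return null for a host it cannot place. The class name and the resolver parameter are hypothetical placeholders and not the JobTracker's actual resolution API; the "/default-rack" value and the warning text are taken from the NameNode log in the comment.

```java
import java.util.function.Function;

public class RackFallbackSketch {
    // Same default the NameNode log above falls back to.
    static final String DEFAULT_RACK = "/default-rack";

    // 'resolver' stands in for the topology script lookup; it is a
    // hypothetical placeholder that may return null for an unknown host.
    static String resolveRack(Function<String, String> resolver, String host) {
        String rack = resolver.apply(host);
        if (rack == null) {
            System.err.println("The resolve call returned null! Using "
                + DEFAULT_RACK + " for host " + host);
            return DEFAULT_RACK;
        }
        return rack;
    }

    public static void main(String[] args) {
        // Resolver that fails for a single node, as in the incident described.
        Function<String, String> flaky =
            h -> h.equals("100.64.34.3") ? null : "/rack1";
        System.out.println(resolveRack(flaky, "100.64.34.3"));
        System.out.println(resolveRack(flaky, "100.64.34.4"));
    }
}
```

Falling back instead of dereferencing a null result keeps one unresolvable node from failing job initialization, and matches what the NameNode already does for the same host.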
[jira] [Assigned] (MAPREDUCE-50) NPE in heartbeat when the configured topology script doesn't exist
[ https://issues.apache.org/jira/browse/MAPREDUCE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic reassigned MAPREDUCE-50: --- Assignee: Ivan Mitic NPE in heartbeat when the configured topology script doesn't exist -- Key: MAPREDUCE-50 URL: https://issues.apache.org/jira/browse/MAPREDUCE-50 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.3 Reporter: Vinod Kumar Vavilapalli Assignee: Ivan Mitic -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644662#comment-13644662 ] Ivan Mitic commented on MAPREDUCE-5191: --- bq. Is the increased timeout meant to go on testQueue instead of test2Queue? That's absolutely right... I overlooked, thanks Chris! TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. 
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5191: -- Attachment: MAPREDUCE-5191.2.patch Attaching the updated patch. TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5191.2.patch, MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5177) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/MAPREDUCE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5177: -- Status: Patch Available (was: Open) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: MAPREDUCE-5177 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5177 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5177.commonutils.2.patch, MAPREDUCE-5177.commonutils.patch Move to using common utils described in HADOOP-9413 that work well cross-platform. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5177) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/MAPREDUCE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5177: -- Attachment: MAPREDUCE-5177.commonutils.2.patch Attaching the updated patch. Should be good now :) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: MAPREDUCE-5177 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5177 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5177.commonutils.2.patch, MAPREDUCE-5177.commonutils.patch Move to using common utils described in HADOOP-9413 that work well cross-platform. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
Ivan Mitic created MAPREDUCE-5191: - Summary: TestQueue#testQueue fails with timeout on Windows Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5191: -- Attachment: MAPREDUCE-5191.patch Attaching the patch. The fix is to increase the timeout from 5 to 10 seconds. First, I timed the call to {{File.createTempFile}} and it consistently took ~5 seconds on my box. After that, I looked this up online, and it turned out to be a [known issue|http://stackoverflow.com/questions/2608763/why-does-first-call-to-java-io-file-createtempfilestring-string-file-take-5-se]. TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code}
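For illustration only, a minimal sketch of how the {{File.createTempFile}} call mentioned in the comment could be timed. This is not taken from the attached patch; the class name CreateTempFileTiming and the method timeCreateTempFileMs are hypothetical. On a JVM affected by the slow seed generation visible in the stack trace, the first call would be expected to account for most of the delay, but actual numbers depend on the machine.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CreateTempFileTiming {

    /** Times a single File.createTempFile call and returns the elapsed milliseconds. */
    static long timeCreateTempFileMs() {
        long start = System.nanoTime();
        File tmp;
        try {
            // Hypothetical prefix and suffix; the real test uses its own names.
            tmp = File.createTempFile("timing", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        tmp.delete(); // clean up the file created only for the measurement
        return elapsedMs;
    }

    public static void main(String[] args) {
        // The first call may include one-time initialization; the second usually does not.
        System.out.println("first call:  " + timeCreateTempFileMs() + " ms");
        System.out.println("second call: " + timeCreateTempFileMs() + " ms");
    }
}
```

Comparing the first and second measurement gives a rough idea of whether a fixed test timeout (5 seconds in the failing test) is likely to be hit by initialization cost alone.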
[jira] [Updated] (MAPREDUCE-5191) TestQueue#testQueue fails with timeout on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5191: -- Status: Patch Available (was: Open) TestQueue#testQueue fails with timeout on Windows - Key: MAPREDUCE-5191 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5191 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5191.patch Test times out on my machine after 5 seconds always on the below stack: {code} testQueue(org.apache.hadoop.mapred.TestQueue) Time elapsed: 5009 sec ERROR! java.lang.Exception: test timed out after 5000 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte(SeedGenerator.java:330) at sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(SeedGenerator.java:319) at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:117) at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114) at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171) at java.security.SecureRandom.nextBytes(SecureRandom.java:433) at java.security.SecureRandom.next(SecureRandom.java:455) at java.util.Random.nextLong(Random.java:284) at java.io.File.generateFile(File.java:1682) at java.io.File.createTempFile(File.java:1791) at java.io.File.createTempFile(File.java:1828) at org.apache.hadoop.mapred.TestQueue.writeFile(TestQueue.java:221) at org.apache.hadoop.mapred.TestQueue.testQueue(TestQueue.java:53) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5177) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/MAPREDUCE-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5177: -- Attachment: MAPREDUCE-5177.commonutils.patch Attaching the patch. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: MAPREDUCE-5177 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5177 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5177.commonutils.patch Move to using common utils described in HADOOP-9413 that work well cross-platform.
[jira] [Created] (MAPREDUCE-5177) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
Ivan Mitic created MAPREDUCE-5177: - Summary: Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: MAPREDUCE-5177 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5177 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Move to using common utils described in HADOOP-9413 that work well cross-platform.
[jira] [Commented] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631430#comment-13631430 ] Ivan Mitic commented on MAPREDUCE-5066: --- Thanks for the review Arun! Sounds good, let me prepare the updated patch. JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.2.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.4.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
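As a rough illustration of the timeout proposed in the issue description, the sketch below sets connect and read timeouts on a plain java.net.HttpURLConnection. It is not the code from the attached patches and does not use the Hadoop JobEndNotifier classes; the class NotificationTimeoutSketch, its methods, and the /notify path are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.URI;

public class NotificationTimeoutSketch {

    /**
     * Calls the notification URL once. Returns true on HTTP 200 and false on any
     * I/O failure, including connect and read timeouts.
     */
    static boolean notifyOnce(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMs); // bound the time spent connecting
            conn.setReadTimeout(timeoutMs);    // bound the time spent waiting for a response
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            return false; // a SocketTimeoutException ends up here as well
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Demo helper: notifies a local socket that accepts connections but never answers. */
    static boolean notifySilentServer(int timeoutMs) {
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/notify";
            return notifyOnce(url, timeoutMs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean delivered = notifySilentServer(500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("delivered=" + delivered + " after " + elapsedMs + " ms");
    }
}
```

Without the two timeout calls, the request against the silent server would block indefinitely, which is the situation the issue describes for a single notification thread.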
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.branch-1-win.5.patch JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.2.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.4.patch, MAPREDUCE-5066.branch-1-win.5.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.3.patch Attaching updated patches. Arun, let me know if this looks good. Thanks JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.2.patch, MAPREDUCE-5066.3.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.4.patch, MAPREDUCE-5066.branch-1-win.5.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back.
[jira] [Updated] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5056: -- Resolution: Not A Problem Status: Resolved (was: Patch Available) ProcfsBasedProcessTree was recently removed from the mapreduce project via MAPREDUCE-5077. Resolving this Jira as not a problem. Yarn's TestProcfsBasedProcessTree already passes on Windows. TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header - Key: MAPREDUCE-5056 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5056.trunk.2.patch, MAPREDUCE-5056.trunk.3.patch, MAPREDUCE-5056.trunk.patch Test fails on the below assertion: Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE! testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE! 
junit.framework.AssertionFailedError: Process-tree dump doesn't start with a proper header at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree.testProcessTreeDump(TestProcfsBasedProcessTree.java:564) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
[jira] [Commented] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622851#comment-13622851 ] Ivan Mitic commented on MAPREDUCE-5066: --- Arun, did you get a chance to take a look at my latest branch-1/branch-2 patches? Please check my comment above for some context. Thx! JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.2.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.4.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back.
[jira] [Commented] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622852#comment-13622852 ] Ivan Mitic commented on MAPREDUCE-5056: --- Bikas, can you please take a look at the latest patch? Thx! TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header - Key: MAPREDUCE-5056 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5056.trunk.2.patch, MAPREDUCE-5056.trunk.3.patch, MAPREDUCE-5056.trunk.patch Test fails on the below assertion: Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE! testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE! junit.framework.AssertionFailedError: Process-tree dump doesn't start with a proper header at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree.testProcessTreeDump(TestProcfsBasedProcessTree.java:564) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at 
junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622978#comment-13622978 ] Ivan Mitic commented on MAPREDUCE-5056: --- Thanks Chris for the comment. Bikas, this also makes the test consistent with the implementation {{ProcfsBasedProcessTree#getProcessTreeDump}}. TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header - Key: MAPREDUCE-5056 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5056.trunk.2.patch, MAPREDUCE-5056.trunk.3.patch, MAPREDUCE-5056.trunk.patch Test fails on the below assertion: Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE! testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE! 
junit.framework.AssertionFailedError: Process-tree dump doesn't start with a proper header at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree.testProcessTreeDump(TestProcfsBasedProcessTree.java:564) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.patch MAPREDUCE-5066.branch-1.patch MAPREDUCE-5066.branch-1-win.3.patch
Hi Arun, I am attaching branch-1-win/branch-1 and branch-2 compatible patches. A few notes on the patches:
- Fixed a test verification issue in branch-1-win.3.patch
- branch-1 and branch-1-win patches are fully compatible (and equivalent)
- The branch-2 codebase changed significantly and I made a best effort to find the appropriate forward patch. There are two implementations of the JobEndNotifier: mapred#JobEndNotifier (based on the one from branch-1 but simplified) and mapreduce.v2.app#JobEndNotifier (new implementation). The former is used in the LocalJobRunner and the latter in the MR AppMaster. In my patch I did the following:
*a.* Applied the bugfixes to the current state of mapred#JobEndNotifier and included the corresponding unit tests
*b.* Given that mapreduce.v2.app#JobEndNotifier already sets the timeout to 5 seconds, I did the same in mapred#JobEndNotifier. In other words, I did not introduce a config knob that would make the timeout configurable. My reasoning was that in branch-1, people might see a 5 second timeout as a regression and might want to change it to a different value. In trunk, given that the timeout is already set to 5 seconds, this should be fine until proven otherwise. Please advise if you think this is needed.
JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
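The issue description also floats the idea of a configurable number of workers processing the notification queue. The following is only a sketch of that idea using a standard java.util.concurrent fixed thread pool, not anything from the attached patches; the class NotificationWorkersSketch and its method are hypothetical, and a latch stands in for a notification URL whose server never responds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NotificationWorkersSketch {

    /**
     * Queues one notification that stays blocked, then one quick notification.
     * Returns true if the quick one completes within waitMs while the first is stuck.
     */
    static boolean quickNotificationCompletes(int workers, long waitMs) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch release = new CountDownLatch(1);   // stands in for an unresponsive server
        CountDownLatch quickDone = new CountDownLatch(1); // signals the quick notification finished
        try {
            // First notification: blocks until released, like a call with no timeout.
            pool.submit(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Second notification: finishes immediately once a worker picks it up.
            pool.submit(quickDone::countDown);
            return quickDone.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // unblock the stuck task so the pool can shut down
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 worker:  quick notification done = "
                + quickNotificationCompletes(1, 300));
        System.out.println("2 workers: quick notification done = "
                + quickNotificationCompletes(2, 300));
    }
}
```

With a single worker the quick notification waits behind the blocked one, which mirrors the stuck queue described in the issue; a second worker lets it through. A per-call timeout addresses the same problem more directly, which may be why the description is unsure the extra workers are needed.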
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.patch JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: (was: MAPREDUCE-5066.patch) JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Status: Patch Available (was: Open) JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 1-win, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: (was: MAPREDUCE-5066.branch-1.patch) JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.2.patch MAPREDUCE-5066.branch-1-win.4.patch Audit warning fix, missing the Apache header in the new test file. JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.2.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.4.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606136#comment-13606136 ] Ivan Mitic commented on MAPREDUCE-4987: --- bq. These tests were running pretty close to the timeouts in my environment, even on Mac. Here is a new patch that increases the timeouts. Thanks, I verified that the test now passes, +1 on the patch TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks --- Key: MAPREDUCE-4987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, nodemanager Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4987.1.patch, MAPREDUCE-4987.2.patch On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
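The symlink-length behaviour described above can be sidestepped by resolving the link before asking for a size. The following is only a sketch and not the content of the attached patch: it assumes a Java 7+ runtime with java.nio.file available (the report concerns Java 6), and the class and method names are hypothetical. Creating a symlink may also require extra privileges on Windows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop or of the MAPREDUCE-4987 patch.
final class SymlinkLengthSketch {

    /** Returns the length of the file that a path ultimately points to. */
    static long targetLength(Path path) {
        try {
            // toRealPath() resolves symbolic links, so the size is read from
            // the link target rather than from the link entry itself.
            return Files.size(path.toRealPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a 5-byte file and a symlink to it, then measures through the link. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("symlink-sketch");
            Path target = Files.write(dir.resolve("target.txt"), new byte[] {1, 2, 3, 4, 5});
            Path link = Files.createSymbolicLink(dir.resolve("link.txt"), target);
            return targetLength(link);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A test written this way asserts on the target's length regardless of what the platform reports for the link itself.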
[jira] [Commented] (MAPREDUCE-5078) TestMRAppMaster fails on Windows due to mismatched path separators
[ https://issues.apache.org/jira/browse/MAPREDUCE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605523#comment-13605523 ] Ivan Mitic commented on MAPREDUCE-5078: --- Thanks Chris, patch looks good, +1 TestMRAppMaster fails on Windows due to mismatched path separators -- Key: MAPREDUCE-5078 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5078 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-5078.1.patch The failing test is {{TestMRAppMaster#testMRAppMasterForDifferentUser}}. There is an assertion about the AM staging directory, but the expected value is constructed with a mix of forward and back slashes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
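One way to make an assertion like the one described above independent of the separator is to normalize both sides before comparing. This is a minimal sketch under assumptions: the helper names are hypothetical, the paths in the usage are made up, and the attached patch may well take a different approach (for example building the expected value through Hadoop's own path classes). It is only appropriate where paths are known not to contain a literal backslash as part of a file name.

```java
// Hypothetical test helper, not taken from the MAPREDUCE-5078 patch.
final class PathSeparatorSketch {

    /** Rewrites every backslash as a forward slash. */
    static String toForwardSlashes(String path) {
        return path.replace('\\', '/');
    }

    /** Compares two path strings while ignoring which separator each one uses. */
    static boolean samePath(String expected, String actual) {
        return toForwardSlashes(expected).equals(toForwardSlashes(actual));
    }

    public static void main(String[] args) {
        // Mixed separators on one side, forward slashes on the other.
        System.out.println(samePath("tmp\\staging/user/.staging", "tmp/staging/user/.staging"));
    }
}
```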
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605755#comment-13605755 ] Ivan Mitic commented on MAPREDUCE-4987: --- Thanks Chris, patch looks good overall, +1. I noticed that TestMRJobs fails with a timeout on my box. Does it consistently succeed for you? I see that the timeouts are set quite high (5 minutes). This is non-blocking; I'll take a look when I get a chance, just thought I'd ask. TestFileUtil passes fine. TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks --- Key: MAPREDUCE-4987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, nodemanager Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4987.1.patch On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0.
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605803#comment-13605803 ] Ivan Mitic commented on MAPREDUCE-4987: --- bq. Is there a particular test within the suite that is timing out consistently for you? I tried to remove all timeouts from the test and it is passing now. Let me find the exact test case that is causing problems. TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks --- Key: MAPREDUCE-4987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, nodemanager Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4987.1.patch On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605921#comment-13605921 ] Ivan Mitic commented on MAPREDUCE-4987: --- Hi Chris, I played around with the test a bit, and the following tests fail because of the timeout on my box: testRandomWriter, testFailingMapper, testSleepJobWithSecurityOn. TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks --- Key: MAPREDUCE-4987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, nodemanager Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4987.1.patch On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605988#comment-13605988 ] Ivan Mitic commented on MAPREDUCE-4987: --- bq. I played around with the test a bit, and the following tests fail because of the timeout on my box: testRandomWriter, testFailingMapper, testSleepJobWithSecurityOn. Increasing the test timeouts by a factor of 2 made the tests pass on my box. TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks --- Key: MAPREDUCE-4987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 Project: Hadoop Map/Reduce Issue Type: Bug Components: distributed-cache, nodemanager Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4987.1.patch On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0.
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.branch-1-win.patch Attaching the branch-1 compatible patch. A few notes:
- Introduced missing unit tests for the JobEndNotifier that cover most of its functionality
- Added a test case that targets the problem from the Jira
- Fixed a bug in how the retry count is computed (we had an extra retry attempt previously)
JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1-win.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back.
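The retry-count fix itself is in the attached patch and is not reproduced in this message. As a generic illustration of the kind of off-by-one it describes, here is a small sketch. It assumes, purely for the example, that the configured value means the total number of attempts; the class and method names are hypothetical and are not JobEndNotifier code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; the real semantics are defined by the patch.
final class RetryCountSketch {

    /**
     * Runs the action until it reports success or maxAttempts is reached,
     * and returns how many attempts were actually made.
     */
    static int runWithAttempts(int maxAttempts, BooleanSupplier action) {
        int attempts = 0;
        // Using "<" keeps the total at maxAttempts; "<=" would allow one extra attempt.
        while (attempts < maxAttempts) {
            attempts++;
            if (action.getAsBoolean()) {
                break; // success, stop retrying
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // An action that always fails uses up every allowed attempt.
        System.out.println(runWithAttempts(3, () -> false));
    }
}
```

Returning the attempt count makes the boundary easy to assert on in a unit test, which is where an extra attempt would otherwise go unnoticed.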
[jira] [Commented] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604764#comment-13604764 ] Ivan Mitic commented on MAPREDUCE-5066: --- bq. Job notification also exists in 2.x which may face the same set of issues. Thanks Hitesh, it should be straightforward to rebase the patch for the 2.x branch. Will do so once the current patch is reviewed. JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1-win.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back.
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Status: Patch Available (was: Open) JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 1-win, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1-win.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Status: Open (was: Patch Available) JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.3-alpha, 1-win, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1-win.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5066: -- Attachment: MAPREDUCE-5066.branch-1-win.2.patch Minor patch update, factoring common unittest code into utility methods. JobTracker should set a timeout when calling into job.end.notification.url -- Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.patch In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4885) streaming tests have multiple failures on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602101#comment-13602101 ] Ivan Mitic commented on MAPREDUCE-4885: --- Thanks for addressing the comments Chris, +1, patch looks good to me streaming tests have multiple failures on Windows - Key: MAPREDUCE-4885 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4885 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, test Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4885.1.patch, MAPREDUCE-4885.2.patch There are multiple test failures due to Queue configuration missing child queue names for root. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
Ivan Mitic created MAPREDUCE-5066: - Summary: JobTracker should set a timeout when calling into job.end.notification.url Key: MAPREDUCE-5066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1-win, 1.3.0 Reporter: Ivan Mitic Assignee: Ivan Mitic In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows). I propose we introduce a configurable timeout option and set a default to a reasonably small value. If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point). I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
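To make the proposal concrete, the sketch below shows what bounding a notification call could look like with the standard java.net.HttpURLConnection timeouts. It is not the JobEndNotifier code and not the eventual patch: the class name, method names and the TIMED_OUT sentinel are hypothetical, and the timeout is a plain parameter because no configuration key is named in this issue. The demo method stands in for "a server that will not respond" by opening a local listening socket that never answers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

// Hypothetical sketch; names are illustrative, not Hadoop APIs.
final class NotificationTimeoutSketch {

    /** Sentinel returned when the call did not complete within the timeout. */
    static final int TIMED_OUT = -1;

    /** Calls the URL and returns the HTTP status code, or TIMED_OUT. */
    static int notifyWithTimeout(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
        conn.setConnectTimeout(timeoutMillis); // bounds the TCP connect
        conn.setReadTimeout(timeoutMillis);    // bounds the wait for a response
        try {
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            // The caller's thread is released instead of blocking indefinitely.
            return TIMED_OUT;
        } finally {
            conn.disconnect();
        }
    }

    /** Points the notifier at a local socket that accepts connections but never replies. */
    static int demoAgainstSilentServer() {
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            return notifyWithTimeout(url, 500);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoAgainstSilentServer() == TIMED_OUT ? "timed out" : "responded");
    }
}
```

With only a single thread draining the notification queue, a bounded call like this limits how long one unresponsive endpoint can delay the notifications queued behind it.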
[jira] [Commented] (MAPREDUCE-4885) streaming tests have multiple failures on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600911#comment-13600911 ] Ivan Mitic commented on MAPREDUCE-4885: --- Patch looks really good Chris. Just one minor comment, you are missing Apache headers in the newly added cmd scripts, otherwise +1 from me streaming tests have multiple failures on Windows - Key: MAPREDUCE-4885 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4885 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, test Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: MAPREDUCE-4885.1.patch There are multiple test failures due to Queue configuration missing child queue names for root. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
Ivan Mitic created MAPREDUCE-5056:

Summary: TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
Key: MAPREDUCE-5056
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ivan Mitic
Assignee: Ivan Mitic

Test fails on the below assertion:

Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE!
junit.framework.AssertionFailedError: Process-tree dump doesn't start with a proper header
    at junit.framework.Assert.fail(Assert.java:47)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree.testProcessTreeDump(TestProcfsBasedProcessTree.java:564)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated MAPREDUCE-5056: -- Attachment: MAPREDUCE-5056.trunk.patch Attaching the patch. An easy one, the test fails to match the process dump header because of a line ending (test expects: \n, and it is: \r\n). TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header - Key: MAPREDUCE-5056 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: MAPREDUCE-5056.trunk.patch Test fails on the below assertion: Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE! testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE! junit.framework.AssertionFailedError: Process-tree dump doesn't start with a proper header at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree.testProcessTreeDump(TestProcfsBasedProcessTree.java:564) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at 
junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
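The line-ending mismatch mentioned in the update above (the test expects \n, the dump contains \r\n) can be illustrated with a small comparison helper. This is a sketch under assumptions rather than the content of MAPREDUCE-5056.trunk.patch: the helper names and the header text are made up for the example, and the real fix may instead build the expected header with the platform line separator.

```java
// Hypothetical helper; the header string used in main is illustrative only.
final class LineEndingSketch {

    /** Converts Windows line endings (\r\n) to Unix line endings (\n). */
    static String normalizeLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    /** Checks that a dump starts with the header, ignoring \r\n versus \n differences. */
    static boolean startsWithHeader(String dump, String header) {
        return normalizeLineEndings(dump).startsWith(normalizeLineEndings(header));
    }

    public static void main(String[] args) {
        String dump = "HEADER LINE\r\nrow 1\r\n"; // as produced on Windows
        String header = "HEADER LINE\n";         // as hard-coded in a test
        System.out.println(dump.startsWith(header));        // raw comparison
        System.out.println(startsWithHeader(dump, header)); // normalized comparison
    }
}
```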
[jira] [Updated] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5056:
    Status: Patch Available (was: Open)

TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
    Key: MAPREDUCE-5056
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 3.0.0
    Reporter: Ivan Mitic
    Assignee: Ivan Mitic
    Attachments: MAPREDUCE-5056.trunk.patch

Test fails on the below assertion:

Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE!
junit.framework.AssertionFailedError: Process-tree dump doesn't start with a proper header
    at junit.framework.Assert.fail(Assert.java:47)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree.testProcessTreeDump(TestProcfsBasedProcessTree.java:564)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599045#comment-13599045 ]

Ivan Mitic commented on MAPREDUCE-5056:

Just to add: I am not sure how valuable it is to run TestProcfsBasedProcessTree on Windows, given that we actually use WindowsBasedProcessTree there. An alternative is to skip the test entirely if ProcfsBasedProcessTree is not available.

TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
    Key: MAPREDUCE-5056
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 3.0.0
    Reporter: Ivan Mitic
    Assignee: Ivan Mitic
    Attachments: MAPREDUCE-5056.trunk.patch

Test fails on the below assertion:

Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE!
[jira] [Updated] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5056:
    Attachment: MAPREDUCE-5056.trunk.2.patch

Attaching a slightly better patch.

TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
    Key: MAPREDUCE-5056
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 3.0.0
    Reporter: Ivan Mitic
    Assignee: Ivan Mitic
    Attachments: MAPREDUCE-5056.trunk.2.patch, MAPREDUCE-5056.trunk.patch

Test fails on the below assertion:

Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE!
[jira] [Commented] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599090#comment-13599090 ]

Ivan Mitic commented on MAPREDUCE-5056:

bq. I am in favor of disabling the test with if(Shell.Linux).

Thanks, Bikas, for the quick response. I agree and will attach the new patch shortly.

TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
    Key: MAPREDUCE-5056
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 3.0.0
    Reporter: Ivan Mitic
    Assignee: Ivan Mitic
    Attachments: MAPREDUCE-5056.trunk.2.patch, MAPREDUCE-5056.trunk.patch

Test fails on the below assertion:

Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE!
[jira] [Updated] (MAPREDUCE-5056) TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
[ https://issues.apache.org/jira/browse/MAPREDUCE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5056:
    Attachment: MAPREDUCE-5056.trunk.3.patch

Attaching the updated patch. A few notes:
- I kept the previous fix for newline to achieve symmetry with the implementation from ProcfsBasedProcessTree#getProcessTreeDump
- I changed ProcfsBasedProcessTree#isAvailable to use the existing API to check if running on Linux (not to duplicate the code)

TestProcfsBasedProcessTree fails on Windows with Process-tree dump doesn't start with a proper header
    Key: MAPREDUCE-5056
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5056
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 3.0.0
    Reporter: Ivan Mitic
    Assignee: Ivan Mitic
    Attachments: MAPREDUCE-5056.trunk.2.patch, MAPREDUCE-5056.trunk.3.patch, MAPREDUCE-5056.trunk.patch

Test fails on the below assertion:

Running org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec FAILURE!
testProcessTreeDump(org.apache.hadoop.mapreduce.util.TestProcfsBasedProcessTree) Time elapsed: 0 sec FAILURE!
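The platform guard discussed in this thread (run the procfs-based test only on Linux, skip it elsewhere) can be sketched roughly as below. This is a minimal illustration, not the attached patch: the class ProcfsTestGuard and its methods are hypothetical, and the check reads the os.name system property as an assumed stand-in for whatever existing API the patch actually reuses.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not taken from the Hadoop patch. */
final class ProcfsTestGuard {

    /** Returns true when the given os.name value denotes Linux (assumed check). */
    static boolean isLinux(String osName) {
        return osName != null
            && osName.toLowerCase(Locale.ROOT).startsWith("linux");
    }

    /**
     * Runs the procfs-dependent body only when osName denotes Linux.
     * Returns true if the body ran, false if it was skipped.
     */
    static boolean runIfLinux(String osName, Runnable body) {
        if (!isLinux(osName)) {
            return false; // skip: /proc is not available on this platform
        }
        body.run();
        return true;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        boolean ran = runIfLinux(os,
            () -> System.out.println("running procfs-based checks"));
        System.out.println(ran ? "test ran" : "test skipped on " + os);
    }
}
```

Passing the OS name in as a parameter, rather than reading it inside the check, keeps the guard testable on any platform.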
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574097#comment-13574097 ]

Ivan Mitic commented on MAPREDUCE-4987:

Thanks for reporting this, Chris.

bq. But symlinks CAN be used for functional purposes i.e linking to libraries etc. ?

Hi Vinod. I believe we'll have to port the branch-1-win semantics to trunk to properly support symlinks on both Java 6 and Java 7 on Windows. Yes, symlinks can be created to point to folders and files; however, Java 6 does not interpret them correctly. We've seen many issues with symlinks on Java 6, and the only option that worked fine (and was signed off on) is to do a file copy in the Java 6 case. HADOOP-9061 talks about some of these problems.

bq. If so we can just do the platform check in the test-case.

We also initially thought this would be fine (you can check through the branch-1-win history :)). However, the real problem comes when someone tries to access the symlink through Java APIs. For example, File#length on a symlink returns zero, which means that RLFS does not work on top of symlinks. Additionally, File#renameTo on a symlink renames the target file instead of the symlink (really strange, I know :)).

Hope this helps.

TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
    Key: MAPREDUCE-4987
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: distributed-cache, nodemanager
    Affects Versions: trunk-win
    Reporter: Chris Nauroth

On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0.
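The copy-instead-of-symlink fallback mentioned in the comment can be sketched as follows. This is only an illustration under stated assumptions: LinkOrCopy and its preferCopy flag are hypothetical names, the code uses the java.nio.file API (available from Java 7 onward) rather than anything from branch-1-win, and the caller is assumed to decide when a copy is required.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not the branch-1-win implementation. */
final class LinkOrCopy {

    /**
     * Makes the content of target available at link.
     * Copies the file when preferCopy is true or when a symlink cannot be
     * created; otherwise creates a symbolic link.
     * Returns true if a copy was made, false if a symlink was created.
     */
    static boolean linkOrCopy(Path target, Path link, boolean preferCopy)
            throws IOException {
        if (!preferCopy) {
            try {
                Files.createSymbolicLink(link, target);
                return false;
            } catch (UnsupportedOperationException | IOException | SecurityException e) {
                // Symlink not possible here (for example, missing privilege);
                // fall through to a plain copy.
            }
        }
        Files.copy(target, link);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("linkorcopy");
        Path target = dir.resolve("target.txt");
        Files.write(target, "hello".getBytes(StandardCharsets.UTF_8));
        Path link = dir.resolve("link.txt");

        boolean copied = linkOrCopy(target, link, true);
        // With a real copy, File#length reflects the actual content size.
        System.out.println("copied=" + copied
            + " length=" + link.toFile().length());
    }
}
```

With a copy in place, length and rename operations act on an ordinary file, which sidesteps the symlink behavior described in the comment at the cost of extra disk space.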
[jira] [Commented] (MAPREDUCE-4396) Make LocalJobRunner work with private distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528577#comment-13528577 ]

Ivan Mitic commented on MAPREDUCE-4396:

This was fixed with HADOOP-8734 in branch-1-win. Maybe just integrate the same patch into branch-1?

Make LocalJobRunner work with private distributed cache
    Key: MAPREDUCE-4396
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4396
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: client
    Affects Versions: 1.0.3
    Reporter: Luke Lu
    Assignee: Yu Gao
    Priority: Minor
    Attachments: mapreduce-4396-branch-1.patch, test-afterpatch.result, test-beforepatch.result, test-patch.result

Some LocalJobRunner-related unit tests fail if the user directory permissions and/or umask are too restrictive.
[jira] [Updated] (MAPREDUCE-4768) yarn cmd line scripts for windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-4768:
    Status: Patch Available (was: Open)

yarn cmd line scripts for windows
    Key: MAPREDUCE-4768
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4768
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: trunk-win
    Reporter: Ivan Mitic
    Assignee: Ivan Mitic
    Attachments: MAPREDUCE-4768.branch-trunk-win.scripts.patch

Jira tracking addition of windows equivalents for yarn, yarn-config and yarn-env shell scripts.