[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403670#comment-13403670 ]

Ivan Mitic commented on MAPREDUCE-4322:
---

FYI, I opened MAPREDUCE-4386 for better abstractions around different shells.

> Fix command-line length abort issues on Windows
> ---
>
> Key: MAPREDUCE-4322
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Environment: Windows, downstream applications with long aggregate classpaths
> Reporter: John Gordon
> Assignee: Ivan Mitic
> Attachments: MAPREDUCE-4322-branch-1-win(2).patch, MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win(4).patch, MAPREDUCE-4322-branch-1-win(5).patch, MAPREDUCE-4322-branch-1-win.patch
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> When a task is started on the tasktracker, it creates a small batch file to
> invoke java and runs that batch. Within the batch file, the invocation of
> Java currently has -classpath ${CLASSPATH} inline to the command. That line
> often exceeds 8000 characters. This is OK for most Linux distributions
> because the line limit env variable is often set much higher than this.
> However, on Windows this causes cmd to abort execution. This surfaces in
> Hadoop as an unknown failure mode for the task.
> I think the easiest and most natural way to fix this is to push the
> -classpath option into a config file to take the longest variable part of the
> line and put it somewhere that scales better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402576#comment-13402576 ]

Bikas Saha commented on MAPREDUCE-4322:
---

Thanks for including all comments! +1. lgtm.
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402489#comment-13402489 ]

Bikas Saha commented on MAPREDUCE-4322:
---

That's exactly what I am saying too :) The test is trying to cover both cases, but the result is implicit right now: we only know that both paths are covered because we know the code. By checking only for sb.toString(), the test itself does not make that explicit.

There is nothing to hardcode. Unless I am reading the test code incorrectly, we have already defined List setup and List cmd. In the exception message, along with checking for sb.toString(), we could also check for setup[0] and cmd[0]. That way it's explicit that 2 different paths are being covered.
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402444#comment-13402444 ]

Ivan Mitic commented on MAPREDUCE-4322:
---

3. Oh, thanks for clarifying. My thinking was that, from the user's perspective, we are outputting the actual command that exceeded the limit. Whether it is the setup or the command is not as relevant. In the unit tests, since I know the code, I want to cover all cases, so I'm testing both. I am leaning toward keeping the code as is, given that I wouldn't want a hardcoded dependency on what is in the exception message. Let me know if you feel strongly about this.
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402419#comment-13402419 ]

Bikas Saha commented on MAPREDUCE-4322:
---

3. My main concern is that we are not differentiating between the first failure, which is due to a bad setup string, and the second one, which is due to a bad cmd string. Since the code adds the exact failed command to the exception, we could look for "setup" in the first case and "command" in the second case, in addition to sb.toString(). I should have been clearer. I didn't literally mean "setup.toString()" because it's a list :)
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401895#comment-13401895 ]

Ivan Mitic commented on MAPREDUCE-4322:
---

Thanks Bikas!

1. Agree, fixed
2. Actually, we cannot do that, as {{TaskLog.MAX_CMD_LINE_LENGTH}} is {{MAX_INT}} on non-Windows platforms.
3. Hmm, I want sb.toString(). Basically, I want to verify that the long command is part of the exception message. {{setup.toString()}} wouldn't work as there would be multiple lines in the command, and the exception message only contains the first problematic one. Make sense?
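Point 2 above can be illustrated with a small standalone sketch. It is not code from the patch: the class and method names are hypothetical, the 8192 value is simply the literal from the test loop under review, and {{MAX_INT}} is read here as Integer.MAX_VALUE. It shows why a test cannot build a line "one longer than the limit" when the limit is the largest int.

```java
/** Hypothetical, standalone illustration; not taken from TaskLog or TestTaskLog. */
final class LimitSketch {
  private LimitSketch() {}

  /** Assumed Windows limit, borrowed from the literal in the reviewed test loop. */
  static final int ASSUMED_WINDOWS_LIMIT = 8192;

  /** Per-platform limit: bounded on Windows, effectively unbounded elsewhere. */
  static int limitFor(boolean windows) {
    return windows ? ASSUMED_WINDOWS_LIMIT : Integer.MAX_VALUE;
  }

  /** Builds a line that is exactly one character longer than the given limit. */
  static String lineLongerThan(int limit) {
    if (limit == Integer.MAX_VALUE) {
      // A Java String cannot hold more than Integer.MAX_VALUE characters,
      // so no test input can exceed the non-Windows limit.
      throw new IllegalArgumentException("limit is not reachable");
    }
    StringBuilder sb = new StringBuilder(limit + 1);
    for (int i = 0; i <= limit; ++i) {
      sb.append('a');
    }
    return sb.toString();
  }
}
```

Under these assumptions, a test that wants to exercise the failure path has to pick a concrete length itself (or run only on Windows) rather than derive it from the platform limit.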
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401890#comment-13401890 ]

Bikas Saha commented on MAPREDUCE-4322:
---

1) TaskLog.java
Wouldn't renaming these from bash* to shell* be better? Since these are private members, the renaming would not cause much grief when merging the code.
{code}
private static final String bashCommand = (Shell.WINDOWS) ? "cmd" : "bash";
private static final String bashCommandSufix = (Shell.WINDOWS) ? "/c" : "-c";
private static final String bashCommandNullOutput = (Shell.WINDOWS) ? "< nul" : "< /dev/null";
{code}

2) TestTaskLog.java
Could you please replace 8192 with TaskLog.MAX_CMD_LINE_LENGTH?
{code}
for (int i = 0; i < 8192; ++i) {
{code}

3) TestTaskLog.java
In these 2 places you really mean "setup.toString()" and "cmd.toString()" instead of sb.toString(), right?
{code}
assertTrue(ex.getMessage().contains(sb.toString()));
{code}

The current refactoring makes it cleaner. Agree on a separate jira for a better implementation.
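The rename suggested in item 1 might look roughly like the following. This is a standalone sketch rather than the patch: the class name, the boolean parameter, and the wrap helper are hypothetical, and it deliberately avoids Hadoop's Shell class so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, standalone illustration of shell-neutral naming. */
final class ShellNames {
  private ShellNames() {}

  /** Name of the shell executable: cmd on Windows, bash elsewhere. */
  static String shellCommand(boolean windows) {
    return windows ? "cmd" : "bash";
  }

  /** Flag that tells the shell to run the string that follows as a command. */
  static String shellCommandSuffix(boolean windows) {
    return windows ? "/c" : "-c";
  }

  /**
   * Name mirrors the reviewed field; the redirection itself makes the
   * command read its standard input from the null device.
   */
  static String shellCommandNullOutput(boolean windows) {
    return windows ? "< nul" : "< /dev/null";
  }

  /** Wraps a command string as [shell, flag, command]. */
  static List<String> wrap(String command, boolean windows) {
    List<String> result = new ArrayList<String>();
    result.add(shellCommand(windows));
    result.add(shellCommandSuffix(windows));
    result.add(command);
    return result;
  }
}
```

With names like these, callers no longer read "bash" in code paths that actually launch cmd on Windows.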
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401864#comment-13401864 ]

John Gordon commented on MAPREDUCE-4322:

+1 this fixed a lot of downstream tests in pig and hive on Windows.
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401821#comment-13401821 ]

Ivan Mitic commented on MAPREDUCE-4322:
---

Thanks for your feedback Bikas!

1. Fixed
2. I added a check to verify that the problematic command is in the output exception message
3. Fixed
4. Fixed
5. I did some work to remove many of the Shell.WINDOWS forks. We might be able to improve on this further by exposing some of this functionality from Shell.java, although this would be a separate patch. Thoughts?
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401709#comment-13401709 ]

Bikas Saha commented on MAPREDUCE-4322:
---

1) TaskLog.java
{code}
if (s.length() > MAX_CMD_LINE_LENGTH) {
  throw new IOException("Command line length exceeds the OS limit " + MAX_CMD_LINE_LENGTH);
}
{code}
Can you add something to the exception message about the actual command that is bad? It will help in debugging which command is bad and will also help with the next comment.

2) TestTaskLog.java
In the test, on the face of it, there seems to be no difference between the 2 times that captureOutAndError is called. Verifying that setup failed in the first case and cmd failed in the second case will help differentiate them.

3) TestTaskLog.java
It would be good to actually use TaskLog.MAX_CMD_LINE_LENGTH so that if we change it, the test captures that.

4) TestTaskLog.java
Why not directly call buildCommandLine(), the function we are actually testing, instead of captureOutAndError()? buildCommandLine() should be visible in the test because it would be in the same package.

5) Would it be possible to refactor TaskLog.buildCommandLine() to reduce the number of Shell.WINDOWS forks? It is getting hard to understand and error prone. For example, the following code adds a new command line (exec setsid) to the script, but its length gets included with the length of the actual cmd in the last check for MAX_CMD_LINE_LENGTH. That happens only on Linux, where it does not matter, but it makes the code harder to read and incorrect.
{code}
if (tailLength > 0) {
  mergedCmd.append("(");
} else if (ProcessTree.isSetsidAvailable && useSetSid && !Shell.WINDOWS) {
  mergedCmd.append("exec setsid "); // <=== this is a new command line
} else {
  if (!Shell.WINDOWS) mergedCmd.append("exec ");
}
// ...
// add real cmd line
// ...
if (mergedCmd.length() - prevLength > MAX_CMD_LINE_LENGTH) {
  throw new IOException("Command line length exceeds the OS limit " + MAX_CMD_LINE_LENGTH);
}
{code}
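Items 1 and 5 above could be addressed together by checking each script line on its own and naming the offending line in the exception. The following is a minimal sketch under that assumption, not the code from the attached patches; the class name, method name, and message wording are hypothetical, and the limit is passed in rather than read from TaskLog.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical, standalone illustration of a per-line length check. */
final class LineLengthCheck {
  private LineLengthCheck() {}

  /**
   * Throws if any single line is longer than maxLength. The message names
   * the first offending line so the failure can be traced to one command.
   */
  static void checkLineLengths(List<String> lines, int maxLength)
      throws IOException {
    for (String line : lines) {
      if (line.length() > maxLength) {
        throw new IOException("Command line length " + line.length()
            + " exceeds the OS limit " + maxLength + ": " + line);
      }
    }
  }
}
```

Because every line is measured separately, prefixes such as "exec setsid " are counted with the line they actually belong to, and a test can assert that the message contains the specific line it expected to fail.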
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293058#comment-13293058 ]

Bikas Saha commented on MAPREDUCE-4322:
---

Could you please add a test that verifies long command lines and the other checks you have added?
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293016#comment-13293016 ]

Bikas Saha commented on MAPREDUCE-4322:
---

Sounds good. LGTM.
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293011#comment-13293011 ]

Ivan Mitic commented on MAPREDUCE-4322:
---

bq. TaskLog.java - Any special reasons to perform the command line length check multiple times instead of once at the end of buildCommandLine()?

There are multiple lines that we want to execute as part of the taskjvm.cmd, and I am checking the length of every line. An example taskjvm.cmd is the following:
{code}
set HADOOP_CLIENT_OPTS=...
set SHELL="cmd"...
...
set CLASSPATH=...
C:\...\jre\bin\java ...
{code}

bq. the advantage with the -classpath argument was isolation of the classpath to the specific spawned JVM. But by changing the classpath env var we risk changing it for every spawned process too. Maybe that's not much of a problem.

I thought of this as well. As we are starting a separate bash/cmd for every task, this will only apply to that task.

bq. What if CLASSPATH is already set on the machine? Will this append to it or override it? From the code it looks like generating the classpath list will pick up the parent classpath. So if CLASSPATH env var is already set then it will be part of classpath list via the parent jvm (TaskTracker jvm). So even if the taskjvm.cmd sets the CLASSPATH it will be a superset of any existing CLASSPATH env var. Can you please verify this by having a pre-existing CLASSPATH set?

Thanks, I just checked, and we do not include the system-level CLASSPATH. However, the setting itself seems to be exclusive: if you pass the classpath via {{-classpath}}, the CLASSPATH environment variable is ignored. I just tested this with a sample app that prints {{System.getProperty("java.class.path")}}. It generally makes sense to be specific in this case and not to include the system setting, as this can cause problems with resolution. Also, there are ways Hadoop users can specify custom classpaths if needed. Agree?
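The sample app mentioned in the last answer was not attached to the issue. A minimal equivalent might look like the sketch below; the class and method names are hypothetical, and only System.getProperty("java.class.path") comes from the comment itself.

```java
/** Hypothetical stand-in for the sample app described in the comment. */
final class PrintClassPath {
  private PrintClassPath() {}

  /** Returns the class path the running JVM was actually started with. */
  static String effectiveClassPath() {
    return System.getProperty("java.class.path");
  }

  public static void main(String[] args) {
    System.out.println(effectiveClassPath());
  }
}
```

Running it once with only the CLASSPATH environment variable set and once with an explicit -classpath argument as well should show which of the two the JVM used; the comment above reports that the explicit argument wins.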
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292072#comment-13292072 ] Bikas Saha commented on MAPREDUCE-4322: ---
Sorry, I forgot the review comments themselves.
1) TaskLog.java - Is there any special reason to perform the command-line length check multiple times instead of once at the end of buildCommandLine()?
2) The advantage of the -classpath argument was that it isolated the classpath to the specific spawned JVM. By changing the CLASSPATH env var instead, we risk changing it for every spawned process too. Maybe that's not much of a problem.
3) What if CLASSPATH is already set on the machine? Will this append to it or override it? From the code, it looks like generating the classpath list will pick up the parent classpath. So if the CLASSPATH env var is already set, it will be part of the classpath list via the parent JVM (the TaskTracker JVM). So even if taskjvm.cmd sets CLASSPATH, the result will be a superset of any existing CLASSPATH env var. Can you please verify this by testing with a pre-existing CLASSPATH set?
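For illustration only, review points 1 and 3 above could be sketched roughly as follows. This is not the code from the patch: the class name CommandLengthSketch, the helpers joinAndCheck() and childClasspath(), and the 8191-character constant (the documented cmd.exe command-line limit) are assumptions made for this example, and the real buildCommandLine() in TaskLog.java has a different signature.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop patch.
class CommandLengthSketch {
    // Assumption: the documented cmd.exe limit of 8191 characters per command line.
    static final int MAX_CMD_LENGTH = 8191;

    /** Joins the parts and validates the total length once, after the line is complete. */
    static String joinAndCheck(List<String> parts, int maxLength) {
        String line = String.join(" ", parts);
        if (line.length() > maxLength) {
            throw new IllegalArgumentException(
                "Command line is " + line.length() + " characters; limit is " + maxLength);
        }
        return line;
    }

    /**
     * Builds the child's CLASSPATH: task entries first, then whatever the
     * parent already had, so the result is a superset of the inherited value.
     */
    static String childClasspath(List<String> taskEntries, String inherited) {
        List<String> all = new ArrayList<>(taskEntries);
        if (inherited != null && !inherited.isEmpty()) {
            all.add(inherited);
        }
        return String.join(File.pathSeparator, all);
    }

    public static void main(String[] args) {
        String cp = childClasspath(Arrays.asList("job.jar", "lib/dep.jar"),
                                   System.getenv("CLASSPATH"));

        // The classpath travels in the environment, so the command line stays short.
        String line = joinAndCheck(Arrays.asList("java", "-Xmx200m", "Child"), MAX_CMD_LENGTH);

        // The variable is set only in the environment handed to the child, but any
        // process that child spawns inherits it too (the concern in review point 2).
        ProcessBuilder pb = new ProcessBuilder("java", "-Xmx200m", "Child");
        pb.environment().put("CLASSPATH", cp);

        System.out.println(line);
        System.out.println(pb.environment().get("CLASSPATH"));
    }
}
```

Checking once on the finished string keeps the limit in a single place, and appending the inherited value rather than replacing it is what makes the child's CLASSPATH a superset of a pre-existing one.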
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292033#comment-13292033 ] Ivan Mitic commented on MAPREDUCE-4322: ---
bq. Has this been run on a Linux/Unix platform to make sure things are not broken?
Thanks for reviewing the patch, Bikas. Yes, I did a test run on Linux before posting the patch.
[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292029#comment-13292029 ] Bikas Saha commented on MAPREDUCE-4322: ---
Has this been run on a Linux/Unix platform to make sure things are not broken?