[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-28 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403670#comment-13403670
 ] 

Ivan Mitic commented on MAPREDUCE-4322:
---

FYI, I opened MAPREDUCE-4386 for better abstractions around different shells.

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win(4).patch, 
 MAPREDUCE-4322-branch-1-win(5).patch, MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h

 When a task is started on the tasktracker, it creates a small batch file to 
 invoke java and runs that batch.  Within the batch file, the invocation of 
 Java currently has -classpath ${CLASSPATH} inline to the command.  That line 
 often exceeds 8000 characters.  This is ok for most Linux distributions 
 because the command-line length limit there is usually much higher than this.  
 However, on Windows this causes cmd to abort execution.  This surfaces in 
 Hadoop as an unknown failure mode for the task.
 I think the easiest and most natural way to fix this is to push the 
 -classpath option into a config file to take the longest variable part of the 
 line and put it somewhere that scales better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402419#comment-13402419
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

3. My main concern is that we are not differentiating that the first failure is 
due to a bad setup string while the second one is due to a bad cmd string. 
Since the code is adding the exact failed command into the exception we could 
look for setup in the first case and command in the second case in addition 
to sb.toString(). I should have been more clear. I didn't literally mean 
setup.toString() because it's a list :)

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win(4).patch, 
 MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-27 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402444#comment-13402444
 ] 

Ivan Mitic commented on MAPREDUCE-4322:
---

3. Oh, thanks for clarifying. My thinking was, from the user's perspective, we 
are outputting the actual command that exceeded the limit. Whether it is setup 
or command, it is not as relevant. In unit tests, since I know the code, I want 
to cover all cases, so I'm testing both. I am leaning toward keeping the code 
as is, given that I wouldn't want to have a hardcoded dependency on what is in 
the exception message. Let me know if you feel strongly about this.

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win(4).patch, 
 MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402489#comment-13402489
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

That's exactly what I am saying too :) The test is trying to cover both cases, 
but the result is kind of implicit right now because we know both paths are 
being covered. However, by checking only for sb.toString(), the test itself 
does not make that explicit. There is nothing to hardcode. Unless I am 
reading the test code incorrectly, we have already defined List<String> setup 
and List<String> cmd. In the exception message, along with checking for 
sb.toString(), we could also check for setup[0] and cmd[0]. That way it's 
explicit that 2 different paths are being covered.
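
The suggestion above, making both failure paths visible in the test, could be sketched along these lines. This is only an illustration under stated assumptions: `TwoPathCheck`, `check`, and `failureMessage` are hypothetical names standing in for the code under test, and the limit of 20 is arbitrary. The idea is to give the setup list and the cmd list different long strings, so the exception message shows which list triggered the failure without depending on the message wording.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the code under test; this is not TaskLog.java.
final class TwoPathCheck {

    /** Throws if a setup line or the joined command exceeds limit, quoting the offender. */
    static void check(List<String> setup, List<String> cmd, int limit) throws IOException {
        for (String line : setup) {
            if (line.length() > limit) {
                throw new IOException("Line exceeds limit " + limit + ": " + line);
            }
        }
        String joined = String.join(" ", cmd);
        if (joined.length() > limit) {
            throw new IOException("Line exceeds limit " + limit + ": " + joined);
        }
    }

    /** Returns the exception message, or null when the check passes. */
    static String failureMessage(List<String> setup, List<String> cmd, int limit) {
        try {
            check(setup, cmd, limit);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Distinct long strings make it visible which list triggered the failure.
        String longSetup = String.join("", Collections.nCopies(50, "s"));
        String longCmd = String.join("", Collections.nCopies(50, "c"));
        List<String> shortLines = Arrays.asList("echo ok");

        // First path: the setup list carries the long line.
        String first = failureMessage(Arrays.asList(longSetup), shortLines, 20);
        // Second path: the cmd list carries the long line.
        String second = failureMessage(shortLines, Arrays.asList(longCmd), 20);

        System.out.println(first.contains(longSetup));
        System.out.println(second.contains(longCmd));
    }
}
```

Because the assertions look only for the offending input string, they do not hardcode the wording of the exception message.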

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win(4).patch, 
 MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402576#comment-13402576
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

Thanks for including all comments! +1. lgtm.

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win(4).patch, 
 MAPREDUCE-4322-branch-1-win(5).patch, MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401709#comment-13401709
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

1)TaskLog.java
{code}
if (s.length() > MAX_CMD_LINE_LENGTH) {
  throw new IOException("Command line length exceeds the OS limit " +
      MAX_CMD_LINE_LENGTH);
}
{code}
Can you add something to the exception message about the actual command that is 
bad? It will help in debugging which command is bad and also help in the next 
comment.

2)TestTaskLog.java
In the test, on the face of it, there seems to be no difference in the 2 times 
that captureOutAndError is called. Verifying that setup failed in the first 
case and cmd failed in the second case will help differentiate.

3)TestTaskLog.java
Would be good to actually use TaskLog.MAX_CMD_LINE_LENGTH so that if we change 
it then the test captures that.

4)TestTaskLog.java
Why not directly call buildCommandLine() - the function we are actually testing 
instead of captureOutAndError()? buildCommandLine() should be visible in the 
test because it would be in the same package.

5)Would it be possible to refactor TaskLog.buildCommandLine() to reduce the 
number of Shell.WINDOWS forks? It is getting hard to understand and error 
prone. e.g. the following code adds a new command line (exec setsid) to the 
script, but its length would get included with the length of the actual cmd in 
the last check for MAX_CMD_LINE_LENGTH. That happens on Linux, where it does 
not matter, but it makes the code hard to read and technically incorrect.
{code}
if (tailLength > 0) {
  mergedCmd.append("(");
} else if (ProcessTree.isSetsidAvailable && useSetSid
    && !Shell.WINDOWS) {
  mergedCmd.append("exec setsid "); // <=== this is a new command line
} else {
  if (!Shell.WINDOWS)
    mergedCmd.append("exec ");
}
// ...
// add real cmd line
// ...
if (mergedCmd.length() - prevLength > MAX_CMD_LINE_LENGTH) {
  throw new IOException("Command line length exceeds the OS limit "
      + MAX_CMD_LINE_LENGTH);
}
{code}
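
One way to cut down on the Shell.WINDOWS forks discussed in item 5 is to move the platform differences behind a small interface. The sketch below is an assumption-laden illustration, not the patch and not Shell.java: `ShellFlavor`, `WindowsFlavor`, `UnixFlavor`, and `CommandBuilder` are hypothetical names, and the Windows limit of 8191 is an assumed value. It measures only the real command against the limit, so the length of a prefix such as `exec setsid ` is not mixed into the check.

```java
import java.io.IOException;

// Hypothetical abstraction; none of these names come from Shell.java or TaskLog.java.
interface ShellFlavor {
    String executable();                  // "cmd" or "bash"
    String commandFlag();                 // flag that introduces the command string
    String execPrefix(boolean useSetsid); // text placed before the real command
    int maxLineLength();                  // longest command line the shell accepts
}

final class WindowsFlavor implements ShellFlavor {
    public String executable() { return "cmd"; }
    public String commandFlag() { return "/c"; }
    public String execPrefix(boolean useSetsid) { return ""; }
    public int maxLineLength() { return 8191; } // assumed value for illustration
}

final class UnixFlavor implements ShellFlavor {
    public String executable() { return "bash"; }
    public String commandFlag() { return "-c"; }
    public String execPrefix(boolean useSetsid) {
        return useSetsid ? "exec setsid " : "exec ";
    }
    public int maxLineLength() { return Integer.MAX_VALUE; }
}

final class CommandBuilder {
    /** Checks the real command alone against the limit, then adds the prefix. */
    static String build(ShellFlavor shell, boolean useSetsid, String realCmd)
            throws IOException {
        if (realCmd.length() > shell.maxLineLength()) {
            throw new IOException("Command line length exceeds the OS limit "
                + shell.maxLineLength() + ": " + realCmd);
        }
        return shell.execPrefix(useSetsid) + realCmd;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(build(new UnixFlavor(), true, "java Main"));
        System.out.println(build(new WindowsFlavor(), true, "java Main"));
    }
}
```

With this shape the builder itself contains no platform test; the caller picks one flavor up front.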


 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-26 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401821#comment-13401821
 ] 

Ivan Mitic commented on MAPREDUCE-4322:
---

Thanks for your feedback Bikas!

1. Fixed
2. I added a check to verify that the problematic command is in the output 
exception message
3. Fixed
4. Fixed
5. I did some work to remove many of the Shell.WINDOWS forks. We might be able 
to further improve on this by exposing some of this functionality from 
Shell.java, although this would be a separate patch. Thoughts?

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-26 Thread John Gordon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401864#comment-13401864
 ] 

John Gordon commented on MAPREDUCE-4322:


+1 this fixed a lot of downstream tests in pig and hive on Windows.

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401890#comment-13401890
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

1) TaskLog.java
Wouldn't renaming these from bash* to shell* be better? Since these are private 
members, the renaming would not cause much grief upon merging the code.
{code}
  private static final String bashCommand = (Shell.WINDOWS) ? "cmd" : "bash";
  private static final String bashCommandSufix = (Shell.WINDOWS) ? "/c" : "-c";
  private static final String bashCommandNullOutput = 
  (Shell.WINDOWS) ?  nul :  /dev/null;
{code}
2) TestTaskLog.java
Could you please replace 8192 with TaskLog.MAX_CMD_LINE_LENGTH
{code}
for (int i = 0; i < 8192; ++i) {
{code}
3) TestTaskLog.java
For the 2 places you really mean setup.toString() and cmd.toString() 
instead of sb.toString() right?
{code}
assertTrue(ex.getMessage().contains(sb.toString()));
{code}

The current refactoring makes it cleaner. Agree on separate jira for a better 
implementation.

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-26 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401895#comment-13401895
 ] 

Ivan Mitic commented on MAPREDUCE-4322:
---

Thanks Bikas!

1. Agree, fixed
2. Actually, we cannot do that, as {{TaskLog.MAX_CMD_LINE_LENGTH}} is 
{{MAX_INT}} on non-Windows platforms.
3. Hmm, I want sb.toString(). Basically, I want to verify that the long command 
is part of the exception message. {{setup.toString()}} wouldn't work as there 
would be multiple lines in the command, and the exception message only contains 
the first problematic one. Make sense?
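
The point in item 2 can be illustrated with a small sketch. It assumes a limit of 8191 on Windows (a value chosen for the example, not taken from the patch) and uses the hypothetical names `LimitDemo`, `maxCmdLineLength`, and `exceeds`. Since `String.length()` returns an `int`, no string can be longer than `Integer.MAX_VALUE`, so with that limit the length check can never fail and a test cannot size its input from the constant on non-Windows platforms.

```java
import java.util.Arrays;

// Hypothetical illustration; the real constant lives in TaskLog.java.
final class LimitDemo {

    /** Platform-dependent limit: 8191 is an assumed Windows value. */
    static int maxCmdLineLength(boolean windows) {
        return windows ? 8191 : Integer.MAX_VALUE;
    }

    /** Same shape as the length check quoted in the review. */
    static boolean exceeds(String s, int limit) {
        return s.length() > limit;
    }

    public static void main(String[] args) {
        // An 8192-character command, matching the loop bound used in the test.
        char[] buf = new char[8192];
        Arrays.fill(buf, 'x');
        String longCommand = new String(buf);

        System.out.println(exceeds(longCommand, maxCmdLineLength(true)));
        System.out.println(exceeds(longCommand, maxCmdLineLength(false)));
    }
}
```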

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win(2).patch, 
 MAPREDUCE-4322-branch-1-win(3).patch, MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-11 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293011#comment-13293011
 ] 

Ivan Mitic commented on MAPREDUCE-4322:
---

bq. TaskLog.java - Any special reasons to perform the command line length check 
multiple times instead of once at the end of buildCommandLine()?
There are multiple lines that we want to execute as part of the taskjvm.cmd, 
and I am checking the length of every line. An example taskjvm.cmd looks like 
the following:
{code}
set HADOOP_CLIENT_OPTS=...
set SHELL=cmd...
...
set CLASSPATH=...
C:\...\jre\bin\java ...
{code}
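
A per-line check of this kind could look roughly like the sketch below. It is an illustration only: `ScriptLineCheck`, `firstLineOver`, and `repeat` are hypothetical helpers, the limit of 8191 is an assumed value, and `org.example.Child` is a placeholder class name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of TaskLog.java.
final class ScriptLineCheck {

    // Assumed limit for illustration; the patch defines its own constant.
    static final int ASSUMED_LIMIT = 8191;

    /** Returns the index of the first line longer than limit, or -1 if none. */
    static int firstLineOver(List<String> lines, int limit) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).length() > limit) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a string of n copies of c. */
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
            "set HADOOP_CLIENT_OPTS=-Xmx128m",
            "set CLASSPATH=" + repeat('a', 9000), // too long on its own
            "java org.example.Child");
        // Every line is checked separately, so the long "set" line is caught
        // even though the final java line is short.
        System.out.println(firstLineOver(script, ASSUMED_LIMIT));
    }
}
```

Checking each line separately matters because moving the classpath into a `set CLASSPATH=` line shortens the java invocation but can still leave one line over the limit.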

bq. the advantage with the -classpath argument was isolation of the classpath 
to the specific spawned JVM. But by changing the classpath env var we risk 
changing it for every spawned process too. Maybe that's not much of a problem.
I thought of this as well. As we are starting a separate bash/cmd for every 
task, this will only apply to that task.

bq. What if CLASSPATH is already set on the machine? Will this append to it or 
override it? From the code it looks like generating the classpath list will 
pick up the parent classpath. So if CLASSPATH env var is already set then it 
will be part of classpath list via the parent jvm (TaskTracker jvm). So even if 
the taskjvm.cmd sets the CLASSPATH it will be a superset of any existing 
CLASSPATH env var. Can you please verify this by having a pre-existing 
CLASSPATH set?
Thanks, I just checked, and we do not include the system level CLASSPATH. 
However, the setting itself seems to be exclusive: if you pass the classpath via 
{{-classpath}}, the CLASSPATH environment variable is ignored. Just tested this 
out with a sample app that prints {{System.getProperty("java.class.path")}}. It 
generally makes sense to be specific in this case, and not to include the 
system setting as this can generally cause problems with resolution. Also, 
there are ways Hadoop users can specify custom classpaths if needed. Agree?
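
A sample app of the kind mentioned above can be very small. The following is a sketch, with `PrintClasspath` and `effectiveClasspath` as hypothetical names; it only prints the classpath the JVM actually ended up with.

```java
// Hypothetical sample app; prints the classpath the running JVM is using.
final class PrintClasspath {

    /** Returns the value of the standard java.class.path system property. */
    static String effectiveClasspath() {
        return System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        System.out.println(effectiveClasspath());
    }
}
```

Running it once with only the CLASSPATH environment variable set, and once with an explicit `-classpath` argument as well, shows which of the two the JVM used.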

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293016#comment-13293016
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

Sounds good. LGTM.

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-11 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293058#comment-13293058
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

Could you please add a test that verifies long command lines and the other 
checks you have added?

 Fix command-line length abort issues on Windows
 ---

 Key: MAPREDUCE-4322
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4322
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
 Environment: Windows, downstream applications with long aggregate 
 classpaths
Reporter: John Gordon
Assignee: Ivan Mitic
 Attachments: MAPREDUCE-4322-branch-1-win.patch

   Original Estimate: 12h
  Remaining Estimate: 12h





[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-08 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292029#comment-13292029
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

Has this been run on a Linux/Unix platform to make sure things are not broken?






[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-08 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292033#comment-13292033
 ] 

Ivan Mitic commented on MAPREDUCE-4322:
---

bq. Has this been run on a Linux/Unix platform to make sure things are not 
broken?
Thanks for reviewing the patch, Bikas. Yes, I did a test run on Linux before 
posting the patch.






[jira] [Commented] (MAPREDUCE-4322) Fix command-line length abort issues on Windows

2012-06-08 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292072#comment-13292072
 ] 

Bikas Saha commented on MAPREDUCE-4322:
---

Sorry. I forgot the review comments themselves.
1) TaskLog.java - Is there any special reason to perform the command-line length 
check multiple times instead of once at the end of buildCommandLine()?
2) The advantage of the -classpath argument was that it isolated the classpath to 
the specific spawned JVM. By changing the CLASSPATH env var instead, we risk 
changing it for every spawned process too. Maybe that's not much of a problem.
3) What if CLASSPATH is already set on the machine? Will this append to it or 
override it? From the code, it looks like generating the classpath list will 
pick up the parent classpath. So if the CLASSPATH env var is already set, it 
will be part of the classpath list via the parent JVM (the TaskTracker JVM). So 
even if taskjvm.cmd sets CLASSPATH, it will be a superset of any existing 
CLASSPATH env var. Can you please verify this by running with a pre-existing 
CLASSPATH set?
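
To make point 1 concrete, here is a minimal sketch of a builder that assembles the whole command first and checks its length once, at the end. It is not the TaskLog.java code under review: the class name CommandLineLengthCheck and this simplified buildCommandLine signature are hypothetical. The 8191-character constant is the documented cmd.exe limit on Windows XP and later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the TaskLog.java implementation from the patch.
class CommandLineLengthCheck {

    // cmd.exe on Windows XP and later accepts at most 8191 characters per command line.
    static final int WINDOWS_MAX_COMMAND_LENGTH = 8191;

    // Joins the parts with single spaces, then validates the total length once.
    static String buildCommandLine(List<String> parts, int maxLength) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(part);
        }
        // Single check after the full line is assembled.
        if (sb.length() > maxLength) {
            throw new IllegalArgumentException(
                "Command line is " + sb.length()
                    + " characters; the limit is " + maxLength);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("java", "-Xmx200m", "org.example.Child");
        System.out.println(buildCommandLine(parts, WINDOWS_MAX_COMMAND_LENGTH));
    }
}
```

Checking once at the end keeps the limit in one place and produces a single, explicit failure instead of an unexplained cmd abort.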


