[jira] [Commented] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2013-10-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793222#comment-13793222
 ] 

Hadoop QA commented on MAPREDUCE-4490:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607976/MAPREDUCE-4490.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4113//console

This message is automatically generated.

> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> -
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller, tasktracker
>Affects Versions: 0.20.205.0, 1.0.3, 1.2.1
>Reporter: George Datskos
>Assignee: sam liu
>Priority: Critical
>  Labels: patch
> Fix For: 1.2.1
>
> Attachments: MAPREDUCE-4490.patch
>
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
> 1) with more map tasks in a job than there are map slots in the cluster will 
> result in immediate task failures for the second task in each JVM (and then 
> the JVM exits). We have investigated this bug and the root cause is as 
> follows. When using LinuxTaskController, the userlog directory for a task 
> attempt (../userlogs/job/task-attempt) is created only on the first 
> invocation (when the JVM is launched) because userlogs directories are 
> created by the task-controller binary which only runs *once* per JVM. 
> Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
> leading to immediate task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
> logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM 
> as that of the first task 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> ENOENT: No such file or directory
> at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
> at 
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
> at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
> at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
> at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
> smoothly. Then Task27 starts. The directory 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
>  is never created so when mapred.Child tries to write the log.index file for 
> Task27, it fails with ENOENT because the 
> attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, 
> the second task in each JVM is guaranteed to fail (and then the JVM exits) 
> every time when using LinuxTaskController. Note that this problem does not 
> occur when using the DefaultTaskController because the userlogs directories 
> are created for each task (not just for each JVM as with LinuxTaskController).
> For each task, the TaskRunner calls the TaskController's createLogDir method 
> before attempting to write out an index file.
> * DefaultTaskController#createLogDir: creates log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** task-controller binary creates log directory [create_attempt_directories] 
> (but only for the first task)
> Possible Solution: add a new command to task-controller *initialize task* to 
> create attempt directories.  Call that command, with ShellCommandExecutor, in 
> the LinuxTaskController#createLogDir method
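The failure mode above can be reproduced without Hadoop at all. Below is a minimal, self-contained sketch using plain java.nio (not the actual Hadoop or task-controller classes): the attempt directory is created only once per JVM, as the task-controller does, so the second attempt's log.index write fails with the file-not-found condition corresponding to the ENOENT in the stack trace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogDirDemo {

    // Stand-in for the task-controller binary: creates the attempt's userlog
    // directory, but is invoked only once per JVM (for the first task).
    static void createAttemptDir(Path userlogs, String attempt) throws IOException {
        Files.createDirectories(userlogs.resolve(attempt));
    }

    // Stand-in for mapred.Child writing log.index for the current attempt.
    // Returns false on the NoSuchFileException that corresponds to ENOENT.
    static boolean writeLogIndex(Path userlogs, String attempt) {
        try {
            Files.write(userlogs.resolve(attempt).resolve("log.index"),
                    "LOG_DIR placeholder".getBytes());
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path userlogs = Files.createTempDirectory("userlogs");
        // First task of the JVM: its directory was created at JVM launch.
        createAttemptDir(userlogs, "attempt_m_000006_0");
        System.out.println(writeLogIndex(userlogs, "attempt_m_000006_0")); // true
        // Second (reused-JVM) task: its directory was never created.
        System.out.println(writeLogIndex(userlogs, "attempt_m_000027_0")); // false
    }
}
```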



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2013-10-11 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
---

Fix Version/s: 1.2.1
   Labels: patch  (was: )
 Target Version/s: 1.2.1
Affects Version/s: 1.2.1
   Status: Patch Available  (was: Open)

As described in the comments above, the root cause of this issue is that the userlogs directories are created by the task-controller binary, which runs only once per JVM when using LinuxTaskController. The main purpose of the patch is therefore to add a new task-controller command, initialize task, that creates the attempt directories, and to invoke it, with ShellCommandExecutor, from the LinuxTaskController#createLogDir method. The main modifications are:
1. src/c++/task-controller/impl/task-controller.h:
Add a declaration for the new method initialize_task()
2. src/c++/task-controller/impl/task-controller.c:
Implement the new method initialize_task(), which invokes the existing method create_attempt_directories()
3. src/c++/task-controller/impl/main.c:
Allow the new method initialize_task() to be invoked from ShellCommandExecutor
4. src/mapred/org/apache/hadoop/mapred/LinuxTaskController.java:
In createLogDir(), invoke initialize_task() via ShellCommandExecutor so the attempt directory is created before each task is launched
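As a rough illustration of item 4, here is a hedged sketch of the shape such a createLogDir change could take. The real patch shells out to the setuid task-controller binary through ShellCommandExecutor so the directory is created as the job's user; in this self-contained version, ProcessBuilder and a plain mkdir -p stand in, and the command line shown in the comment is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;

public class CreateLogDirSketch {

    // Hypothetical shape of the fix: before each task launches, shell out so
    // the attempt's log directory exists even when the JVM is being reused.
    // The real patch would instead run something like:
    //   new ShellCommandExecutor(new String[] {
    //       taskControllerExe, user, "initializeTask", jobId, attemptId }).execute();
    static void createLogDir(Path attemptLogDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("mkdir", "-p", attemptLogDir.toString()).start();
        if (p.waitFor() != 0) {
            throw new IOException("failed to create " + attemptLogDir);
        }
    }
}
```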

> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> -
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller, tasktracker
>Affects Versions: 1.2.1, 1.0.3, 0.20.205.0
>Reporter: George Datskos
>Assignee: sam liu
>Priority: Critical
>  Labels: patch
> Fix For: 1.2.1
>
> Attachments: MAPREDUCE-4490.patch
>





[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2013-10-11 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
---

Priority: Critical  (was: Major)

> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> -
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller, tasktracker
>Affects Versions: 0.20.205.0, 1.0.3
>Reporter: George Datskos
>Assignee: sam liu
>Priority: Critical
> Attachments: MAPREDUCE-4490.patch
>





[jira] [Created] (MAPREDUCE-5580) OutOfMemoryError in ReduceTask shuffleInMemory

2013-10-11 Thread Kevin Beyer (JIRA)
Kevin Beyer created MAPREDUCE-5580:
--

 Summary: OutOfMemoryError in ReduceTask shuffleInMemory
 Key: MAPREDUCE-5580
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5580
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.20.2
Reporter: Kevin Beyer


I have had several reduce tasks fail during the shuffle phase with the 
following error and stack trace (on CDH 4.1.2):

Error: java.lang.OutOfMemoryError: Java heap space
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1644)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1504)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1339)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1271)

I found many web posts reporting the same problem, and a prior Hadoop issue 
that is already fixed (that one involved an int overflow problem).

The task had 1 GB of Java heap, and the mapred.job.shuffle.input.buffer.percent 
parameter in mapred-site.xml was set to the default of 0.7. This means that 
1 GB * 0.7 = 717 MB of Java heap holds the in-memory map outputs, each no 
bigger than 717 / 4 = 179 MB.
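The sizing arithmetic above can be written out as a quick sketch. The method names here are illustrative, not Hadoop's actual fields; the divide-by-4 reflects the default single-shuffle segment fraction of 0.25.

```java
public class ShuffleMath {

    // In-memory shuffle buffer: heap * mapred.job.shuffle.input.buffer.percent.
    static long maxInMemBytes(long heapBytes, double bufferPercent) {
        return (long) (heapBytes * bufferPercent);
    }

    // Largest map output kept in memory: one quarter of the buffer
    // (the default single-shuffle segment fraction of 0.25).
    static long maxSingleShuffleBytes(long heapBytes, double bufferPercent) {
        return maxInMemBytes(heapBytes, bufferPercent) / 4;
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // 1 GB task heap
        System.out.println(maxInMemBytes(heap, 0.7) / (1024 * 1024));         // 716 MB
        System.out.println(maxSingleShuffleBytes(heap, 0.7) / (1024 * 1024)); // 179 MB
    }
}
```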

We were able to capture a heap dump of one reduce task. The heap contained 8 
byte arrays that were 127 MB each. These byte arrays were all referenced by 
their own DataInputBuffer. Six of the buffers were referenced by the linked 
lists in ReduceTask$ReduceCopier.mapOutputsFilesInMemory. These six byte 
arrays consume 127 MB * 6 = 762 MB of the heap. Curiously, this 762 MB exceeds 
the 717 MB limit. ShuffleRamManager.fullSize = 797966777 bytes = 761 MB, so 
something is a bit off in my original value of 717, but this is not the 
major source of trouble.

There are two more large byte arrays of 127 MB * 2 = 254 MB that are still in 
memory.  These are referenced from DataInputBuffers that are referenced 
indirectly by the static Merger.MergeQueue instance.  

One of these is referenced twice, by the 'key' and 'value' fields of the 
MergeQueue. These fields store the current minimum key and value by pointing 
at the full byte array of the map output plus a range of a few bytes in that 
array. The fields are needed during an active merge, but not once the merge is 
complete. In my heap dump, the 'segments' list has been cleared, so no merge 
is in progress; however, 'key' and 'value' are still set from the last merge 
pass. This pins one in-memory map output in memory, which can be as big as 
0.7 / 4 = 17.5% of the heap with default settings. When a merge pass 
completes, these two fields should be set to null.
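The suggested fix, sketched below. The field and method names mirror the description above, not the actual MergeQueue code: null out the cached key/value views once a merge pass completes, so the backing map-output byte[] can be garbage collected.

```java
public class MergeQueueSketch {
    // These mimic MergeQueue's 'key' and 'value': views into the full
    // byte[] of an in-memory map output (names are illustrative).
    byte[] key;
    byte[] value;

    // Proposed: release the references when the merge pass is done,
    // instead of leaving them set until the next merge starts.
    void onMergePassComplete() {
        key = null;
        value = null;
    }
}
```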

The second byte array is referenced via the MergeQueue.comparator 
RawComparator. In my case, this is a WritableComparator. The retention is most 
likely caused by this method:

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
      buffer.reset(b1, s1, l1);   // parse key1
      key1.readFields(buffer);

      buffer.reset(b2, s2, l2);   // parse key2
      key2.readFields(buffer);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }

    return compare(key1, key2);   // compare them
  }

This causes the comparator to retain a reference to the last 'b2' byte array 
passed into compare(). That byte array could be an in-memory map output, which 
by default is up to 0.7 / 4 = 17.5% of memory. This code could have a 
finally { buffer.clear() } to drop the reference. Alternatively, the API could 
include a reset() call to clear such unnecessary state.
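A self-contained sketch of the finally { buffer.clear() } idea follows. MiniBuffer stands in for Hadoop's DataInputBuffer, and the one-byte "keys" are a toy; only the clearing pattern is the point.

```java
public class ClearingComparator {

    // Minimal stand-in for DataInputBuffer: reset() retains a reference
    // to the caller's array, which is exactly the leak described above.
    static class MiniBuffer {
        private byte[] data;
        void reset(byte[] b, int s, int l) { this.data = b; }
        void clear() { this.data = null; }
        byte[] held() { return data; }
    }

    private final MiniBuffer buffer = new MiniBuffer();

    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            buffer.reset(b1, s1, l1);
            int k1 = b1[s1];              // toy "parse" of key1
            buffer.reset(b2, s2, l2);
            int k2 = b2[s2];              // toy "parse" of key2
            return Integer.compare(k1, k2);
        } finally {
            buffer.clear();               // drop the reference to b2
        }
    }

    byte[] retainedArray() { return buffer.held(); }
}
```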

Given this information, we can see why an OOM error is easy to hit: by default 
70% of the heap is dedicated to map outputs, and 17.5% * 2 = 35% of memory can 
be unaccounted for via the two references described above. Even without 
accounting for any other memory overhead, we already have 70% + 35% = 105% of 
the heap occupied in the unlucky case that these two references point at the 
largest possible in-memory map outputs.

There may be other leaks of these byte arrays, but these were all of the large 
byte arrays in my heap dump. A test that produces many map outputs of 
0.7 / 4 = 17.5% of the reduce task heap can reliably reproduce this problem 
and perhaps surface other unaccounted-for large byte arrays.






[jira] [Moved] (MAPREDUCE-5579) Improve JobTracker web UI

2013-10-11 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers moved HADOOP-10038 to MAPREDUCE-5579:


Affects Version/s: (was: 1.2.2)
   1.2.2
  Key: MAPREDUCE-5579  (was: HADOOP-10038)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Improve JobTracker web UI
> -
>
> Key: MAPREDUCE-5579
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5579
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.2.2
>Reporter: David Chen
> Attachments: jobdetails.png, jobtasks.png, jobtracker.png
>
>
> Users will often need to use the JobTracker web UI to debug or tune their 
> jobs in addition to checking the status of their jobs. The current web UI is 
> cumbersome to navigate. The goal is to make the JobTracker web UI easier to 
> navigate and present the data in a cleaner and more intuitive format.





[jira] [Commented] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793184#comment-13793184
 ] 

Hadoop QA commented on MAPREDUCE-3860:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608100/MAPREDUCE-3860.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 28 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1525 javac 
compiler warnings (more than the trunk's current 1524 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 16 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-rumen 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.mapred.TestClusterMapReduceTestCase

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-rumen 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.mapreduce.v2.TestUberAM
org.apache.hadoop.conf.TestNoDefaultsJobConf

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-tools/hadoop-rumen 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4112//console


> [Rumen] Bring back the removed Rumen unit tests
> ---
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-3860.patch, rumen-test-data.tar.gz
>
>
> MAPREDUCE-3582 did not move some of the Rumen unit tests to the new folder 
> and then MAPREDUCE-3705 deleted those unit tests. These Rumen unit tests need 
> to be brought back:
> TestZombieJob.java
> TestRumenJobTraces.java
> TestRumenFolder.java
> TestRumenAnonymization.java
> TestParsedLine.java
> TestConcurrentRead.java





[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit teststoo

2013-10-11 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-3860:
---

Target Version/s: 3.0.0, 2.3.0
  Status: Patch Available  (was: Open)

> [Rumen] Bring back the removed Rumen unit teststoo
> --
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-3860.patch, rumen-test-data.tar.gz
>
>





[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit tests

2013-10-11 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-3860:
---

Summary: [Rumen] Bring back the removed Rumen unit tests  (was: [Rumen] 
Bring back the removed Rumen unit teststoo)

> [Rumen] Bring back the removed Rumen unit tests
> ---
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-3860.patch, rumen-test-data.tar.gz
>
>





[jira] [Updated] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit teststoo

2013-10-11 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-3860:
---

Attachment: MAPREDUCE-3860.patch
rumen-test-data.tar.gz

Attaching a patch and a tarball with gzipped test data. The QA bot won't be 
able to run the tests.

> [Rumen] Bring back the removed Rumen unit teststoo
> --
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
> Attachments: MAPREDUCE-3860.patch, rumen-test-data.tar.gz
>
>





[jira] [Commented] (MAPREDUCE-5387) Implement Signal.TERM on Windows

2013-10-11 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793117#comment-13793117
 ] 

Andrey Klochkov commented on MAPREDUCE-5387:


Indeed, [YARN-445] is related. Thanks to [~cnauroth] for pointing it out. I 
think I can put up a patch which sends Ctrl+C to all processes in the job 
object and makes Yarn use it as an analog of the TERM signal when running on 
Windows. That would be similar to how Ctrl+Break is handled in [YARN-445].

> Implement Signal.TERM on Windows
> 
>
> Key: MAPREDUCE-5387
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5387
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1-win, 2.1.0-beta
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
>
> Signal.TERM is currently not supported by Hadoop on the Windows platform. 
> Tracking Jira for the problem. 
> A couple of things to keep in mind:
>  - Support for process groups (JobObjects on Windows)
>  - Solution should work for both java and other streaming Hadoop apps





[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-10-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793046#comment-13793046
 ] 

Sandy Ryza commented on MAPREDUCE-5517:
---

The patch looks good to me other than a few stylistic nits:
{code}
+|| (numReduceTasks == 0 && conf.getLong(MRJobConfig.MAP_MEMORY_MB, 
0) <= sysMemSizeForUberSlot));
{code}
This line looks like it's over 80 characters.

{code}
-
+
{code}
Spurious whitespace change

{code}
+
+// enable uber mode of 0 reducer no matter how much memory we assign to 
reducer
+conf = new Configuration();
+   conf.setBoolean(MRJobConfig.JOB_UBERTASK_ENABLE, true);  
+   conf.setInt(MRJobConfig.NUM_REDUCES, 0);   //actual num of 
reducer set to 0   
+   conf.setInt(MRJobConfig.REDUCE_MEMORY_MB, 2048);   
//mapreduce.reduce.memory.mb set to 2048 MB which is larger than 
yarn.app.mapreduce.am.resource.mb(1536 MB by default)  
+   isUber = testUberDecision(conf);
+   Assert.assertTrue(isUber);
{code}
Spaces should be used instead of tabs

> enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb 
> to be less than yarn.app.mapreduce.am.resource.mb
> -
>
> Key: MAPREDUCE-5517
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Priority: Minor
> Attachments: MAPREDUCE_5517_v3.patch.txt
>
>
> Since there is no reducer, the memory allocated to reducer is irrelevant to 
> enable uber mode of a job
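The memory check being discussed reduces to a small predicate. The sketch below is illustrative, not the actual TaskAttemptImpl/uber-decision code: with zero reducers, only the map memory should have to fit under yarn.app.mapreduce.am.resource.mb.

```java
public class UberMemoryCheck {

    // Illustrative version of the uber-mode memory predicate: the reduce
    // memory requirement is ignored when the job has no reduce tasks.
    static boolean memoryFitsForUber(int numReduceTasks, long mapMb,
                                     long reduceMb, long amMb) {
        boolean mapFits = mapMb <= amMb;
        boolean reduceFits = numReduceTasks == 0 || reduceMb <= amMb;
        return mapFits && reduceFits;
    }

    public static void main(String[] args) {
        // 0 reducers: a 2048 MB reduce setting no longer blocks uber mode
        // against the default 1536 MB AM container.
        System.out.println(memoryFitsForUber(0, 1024, 2048, 1536)); // true
        System.out.println(memoryFitsForUber(1, 1024, 2048, 1536)); // false
    }
}
```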





[jira] [Assigned] (MAPREDUCE-3860) [Rumen] Bring back the removed Rumen unit teststoo

2013-10-11 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov reassigned MAPREDUCE-3860:
--

Assignee: Andrey Klochkov

> [Rumen] Bring back the removed Rumen unit teststoo
> --
>
> Key: MAPREDUCE-3860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Andrey Klochkov
>





[jira] [Commented] (MAPREDUCE-3859) CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs

2013-10-11 Thread Mike Roark (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792827#comment-13792827
 ] 

Mike Roark commented on MAPREDUCE-3859:
---

Correction: still an issue in CDH 4.2. The fix is the same as in Sergey's 
comment for 4.1.2: 
https://issues.apache.org/jira/browse/MAPREDUCE-3859?focusedCommentId=13659278&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13659278

> CapacityScheduler incorrectly utilizes extra-resources of queue for 
> high-memory jobs
> 
>
> Key: MAPREDUCE-3859
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3859
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: capacity-sched
>Affects Versions: 1.0.0
>Reporter: Sergey Tryuber
>Assignee: Sergey Tryuber
> Fix For: 1.2.1
>
> Attachments: MAPREDUCE-3859_MR1_fix_and_test.patch.txt, 
> test-to-fail.patch.txt
>
>
> Imagine, we have a queue A with capacity 10 slots and 20 as extra-capacity, 
> jobs which use 3 map slots will never consume more than 9 slots, regardless 
> how many free slots on a cluster.





[jira] [Commented] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792828#comment-13792828
 ] 

Sangjin Lee commented on MAPREDUCE-5186:


Raising the priority. The default value of mapreduce.job.max.split.locations 
effectively renders CombineFileInputFormat dead on arrival on any decent-sized 
cluster. Have others encountered this issue?

> mapreduce.job.max.split.locations causes some splits created by 
> CombineFileInputFormat to fail
> --
>
> Key: MAPREDUCE-5186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Priority: Critical
>
> CombineFileInputFormat can easily create splits that can come from many 
> different locations (during the last pass of creating "global" splits). 
> However, we observe that this often runs afoul of the 
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and with any 
> decent size cluster, CombineFileInputFormat creates splits that are well 
> above this limit.
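One mitigation that has been discussed for this class of failure, sketched here rather than taken from any actual patch, is to cap a split's location list at the configured limit instead of failing the job, since block locations are only a scheduling hint.

```java
import java.util.Arrays;

public class SplitLocations {

    // Truncate a split's hosts to maxLocations. Locations are a locality
    // hint, so dropping the excess only weakens scheduling locality; it
    // does not affect correctness of the job's output.
    static String[] capLocations(String[] locations, int maxLocations) {
        return locations.length <= maxLocations
                ? locations
                : Arrays.copyOf(locations, maxLocations);
    }
}
```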





[jira] [Updated] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-11 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated MAPREDUCE-5186:
---

Priority: Critical  (was: Major)

> mapreduce.job.max.split.locations causes some splits created by 
> CombineFileInputFormat to fail
> --
>
> Key: MAPREDUCE-5186
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Priority: Critical
>
> CombineFileInputFormat can easily create splits that can come from many 
> different locations (during the last pass of creating "global" splits). 
> However, we observe that this often runs afoul of the 
> mapreduce.job.max.split.locations check that's done by JobSplitWriter.
> The default value for mapreduce.job.max.split.locations is 10, and on any 
> decent-sized cluster, CombineFileInputFormat creates splits that are well 
> above this limit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5541) Improved algorithm for whether need speculative task

2013-10-11 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792786#comment-13792786
 ] 

Benoy Antony commented on MAPREDUCE-5541:
-

John,

1. Could you please make these parameters ({ SPECULATIVE_PROGRESS, 
SPECULATIVE_FACTOR }) configurable?
2. Could you please share some test results indicating the improvement?
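Making those two knobs configurable could look roughly like this. The property keys and the decision rule below are assumptions for illustration only, not the actual patch; only the names SPECULATIVE_PROGRESS and SPECULATIVE_FACTOR come from the comment above.

```java
import java.util.Properties;

// Illustrative sketch of configurable speculation thresholds; the property
// keys and the decision rule are assumptions, not the MAPREDUCE-5541 patch.
public class SpeculationConfig {
    static final String PROGRESS_KEY = "mapred.speculative.execution.progress"; // hypothetical key
    static final String FACTOR_KEY   = "mapred.speculative.execution.factor";   // hypothetical key

    final double speculativeProgress;
    final double speculativeFactor;

    SpeculationConfig(Properties conf) {
        // Fall back to hard-coded defaults when the keys are absent.
        speculativeProgress = Double.parseDouble(conf.getProperty(PROGRESS_KEY, "0.2"));
        speculativeFactor   = Double.parseDouble(conf.getProperty(FACTOR_KEY, "1.0"));
    }

    // A task is a speculation candidate only if it has run long enough to
    // report meaningful progress AND is clearly slower than its peers, so a
    // task that just started is not immediately speculated.
    boolean needsSpeculation(double taskProgress, double avgProgress) {
        return taskProgress >= speculativeProgress
            && avgProgress - taskProgress > speculativeFactor * speculativeProgress;
    }

    public static void main(String[] args) {
        SpeculationConfig c = new SpeculationConfig(new Properties());
        System.out.println(c.needsSpeculation(0.1, 0.9)); // false: just started
        System.out.println(c.needsSpeculation(0.3, 0.9)); // true: lagging peers
    }
}
```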




> Improved algorithm for whether need speculative task
> 
>
> Key: MAPREDUCE-5541
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5541
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Affects Versions: 1.2.1
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 1.2.2
>
> Attachments: MAPREDUCE-5541-branch-1.2.patch
>
>
> Most of the time, tasks won't all start running at the same time.
> In this case, hasSpeculativeTask in TaskInProgress does not work very well.
> Sometimes a task has only just started running, yet the scheduler has already 
> decided it needs a speculative task.
> This wastes a lot of resources.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5576) MR AM unregistration should be failed due to UnknownHostException on getting history url

2013-10-11 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792744#comment-13792744
 ] 

Zhijie Shen commented on MAPREDUCE-5576:


bq. IIUC, failing the unregistration fails the application.

Yes, the current design is to fail the application when unregistration 
fails. What I mean is that an error in getting the history url should not fail 
unregistration, because ApplicationMasterProtocol#finishApplicationMaster 
doesn't require the url to finish an application on the RM side. The error in 
getting the history url may be caused by unavailability of the JHS, but it can 
also be caused by misconfiguration (we actually ran into this issue). JHS HA 
will reduce the chance of error, but it will still happen.

bq. If we decide to still go through with this change, we should probably fail 
the application early - before running the job and not after.

Agreed. In fact, when the url is not available, the RM already has logic to log 
something. Maybe we want to enhance that log.

bq. A better approach might be to explicitly log or show that the JHS is 
down/inaccessible/misconfigured and that setting the tracking URL has failed.

I don't want to fail the application. As mentioned above, I don't think it 
makes sense for an error in getting the history url to fail unregistration, 
and ultimately fail the application.

> MR AM unregistration should be failed due to UnknownHostException on getting 
> history url
> 
>
> Key: MAPREDUCE-5576
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5576
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Before RMCommunicator sends the request to the RM to finish the application, it 
> will try to get the JHS url, which may throw UnknownHostException. The 
> current code path skips sending the request to the RM when the exception is 
> raised, which does not sound like reasonable behavior, because the RM's 
> unregistering of an AM is not affected by the tracking URL. The URL can be empty 
> or null. AFAIK, the impact of a null URL is that the URL to redirect users from 
> the RM web page to the JHS will be unavailable, and the job report will not show 
> the URL either. However, that is much better than failing an application because 
> of an UnknownHostException here. Anyway, users can go to the JHS directly to 
> find the application history info.
> Therefore, the reasonable code path here is to catch 
> UnknownHostException and set historyUrl = null
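The proposed behavior can be sketched as follows: swallow the resolution failure and fall back to a null history url so the finish request still goes out. The method, port, and path here are illustrative stand-ins; the real logic lives in RMCommunicator.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of the proposed fix: an UnknownHostException while building the JHS
// url yields a null history url instead of aborting unregistration (which
// would fail the whole application). Names, port, and path are illustrative.
public class HistoryUrlFallback {
    static String getHistoryUrl(String jhsHost) {
        try {
            InetAddress addr = InetAddress.getByName(jhsHost);
            return "http://" + addr.getHostName() + ":19888/jobhistory";
        } catch (UnknownHostException e) {
            // Proposed path: log the problem and continue with a null URL,
            // so finishApplicationMaster is still sent to the RM.
            return null;
        }
    }

    public static void main(String[] args) {
        // An unresolvable host (.invalid is reserved as non-resolvable)
        // should produce null, not a propagated exception.
        System.out.println(getHistoryUrl("no-such-host.invalid"));
    }
}
```

With this fallback, the only user-visible cost is the missing redirect link on the RM web page, as the description above notes.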



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5576) MR AM unregistration should be failed due to UnknownHostException on getting history url

2013-10-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792645#comment-13792645
 ] 

Karthik Kambatla commented on MAPREDUCE-5576:
-

IIUC, failing the unregistration fails the application. I am not sure that 
is a good thing to do. It means we require the JHS to be running to be able to 
run jobs - it even implies that JHS HA is required in addition to RM HA for 
interruption-less submission of MR jobs.

A better approach might be to explicitly log or show that the JHS is 
down/inaccessible/misconfigured and that setting the tracking URL has failed.

If we decide to still go through with this change, we should probably fail the 
application early - before running the job and not after.

> MR AM unregistration should be failed due to UnknownHostException on getting 
> history url
> 
>
> Key: MAPREDUCE-5576
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5576
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Before RMCommunicator sends the request to the RM to finish the application, it 
> will try to get the JHS url, which may throw UnknownHostException. The 
> current code path skips sending the request to the RM when the exception is 
> raised, which does not sound like reasonable behavior, because the RM's 
> unregistering of an AM is not affected by the tracking URL. The URL can be empty 
> or null. AFAIK, the impact of a null URL is that the URL to redirect users from 
> the RM web page to the JHS will be unavailable, and the job report will not show 
> the URL either. However, that is much better than failing an application because 
> of an UnknownHostException here. Anyway, users can go to the JHS directly to 
> find the application history info.
> Therefore, the reasonable code path here is to catch 
> UnknownHostException and set historyUrl = null



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4579) TestTaskAttempt fails jdk7

2013-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792534#comment-13792534
 ] 

Hudson commented on MAPREDUCE-4579:
---

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/])
svn merge -c 1377943 FIXES: MAPREDUCE-4579. Split TestTaskAttempt into two so 
as to pass tests on jdk7. Contributed by Thomas Graves (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531047)
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java


> TestTaskAttempt fails jdk7
> --
>
> Key: MAPREDUCE-4579
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4579
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>  Labels: java7
> Fix For: 3.0.0, 2.0.2-alpha, 0.23.10
>
> Attachments: MAPREDUCE-4579.patch
>
>
> ---
> Test set: org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
> ---
> Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 7.205 sec 
> <<< FAILURE!
> testAttemptContainerRequest(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt)
>   Time elapsed: 0.032 sec  <<< ERROR!
> java.io.EOFException
> at java.io.DataInputStream.readByte(DataInputStream.java:267)
> at 
> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
> at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
> at org.apache.hadoop.io.Text.readFields(Text.java:280)
> at org.apache.hadoop.security.token.Token.readFields(Token.java:165)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-4571) TestHsWebServicesJobs fails on jdk7

2013-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792531#comment-13792531
 ] 

Hudson commented on MAPREDUCE-4571:
---

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/])
svn merge -c 1457061 FIXES: MAPREDUCE-4571. TestHsWebServicesJobs fails on 
jdk7. Contributed by Thomas Graves (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531024)
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/MockHistoryJobs.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java


> TestHsWebServicesJobs fails on jdk7
> ---
>
> Key: MAPREDUCE-4571
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4571
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: webapps
>Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>  Labels: java7
> Fix For: 2.1.0-beta, 0.23.10
>
> Attachments: MAPREDUCE-4571.patch
>
>
> TestHsWebServicesJobs fails on jdk7. 
> Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.561 sec 
> <<< FAILURE!
> testJobIdSlash(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs)
>   Time elapsed: 0.334 sec  <<< FAILURE!
> java.lang.AssertionError: mapsTotal incorrect expected:<0> but was:<1>



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5414) TestTaskAttempt fails jdk7 with NullPointerException

2013-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792532#comment-13792532
 ] 

Hudson commented on MAPREDUCE-5414:
---

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/])
svn merge -c 1520964 FIXES: MAPREDUCE-5414. TestTaskAttempt fails in JDK7 with 
NPE. Contributed by Nemon Lou (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531068)
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttempt.java


> TestTaskAttempt fails jdk7 with NullPointerException
> 
>
> Key: MAPREDUCE-5414
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5414
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.5-alpha
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>  Labels: java7
> Fix For: 0.23.10, 2.1.1-beta
>
> Attachments: MAPREDUCE-5414.patch, MAPREDUCE-5414.patch
>
>
> Test case org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt fails 
> once in a while when I run all of the tests together.
> {code:xml} 
> Running org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
> Tests run: 9, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 7.893 sec <<< 
> FAILURE!
> Results :
> Tests in error:
>   
> testLaunchFailedWhileKilling(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt)
>   
> testContainerCleanedWhileRunning(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt)
>   
> testContainerCleanedWhileCommitting(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt)
>   
> testDoubleTooManyFetchFailure(org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt)
> Tests run: 9, Failures: 0, Errors: 4, Skipped: 0
> {code}
> But if I run a single test case, taking testContainerCleanedWhileRunning for 
> example, it fails every time.
> {code:xml} 
> <testcase classname="org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt" 
> name="testContainerCleanedWhileRunning">
> <error type="java.lang.NullPointerException">java.lang.NullPointerException
> at org.apache.hadoop.security.token.Token.write(Token.java:216)
> at 
> org.apache.hadoop.mapred.ShuffleHandler.serializeServiceData(ShuffleHandler.java:205)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createCommonContainerLaunchContext(TaskAttemptImpl.java:695)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.createContainerLaunchContext(TaskAttemptImpl.java:751)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1309)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$ContainerAssignedTransition.transition(TaskAttemptImpl.java:1282)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1009)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testContainerCleanedWhileRunning(TestTaskAttempt.java:410)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
> 

[jira] [Commented] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7

2013-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792529#comment-13792529
 ] 

Hudson commented on MAPREDUCE-4716:
---

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/])
svn merge -c 1457065 FIXES: MAPREDUCE-4716. 
TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7. 
Contributed by Thomas Graves (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531015)
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobsQuery.java


> TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
> 
>
> Key: MAPREDUCE-4716
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>  Labels: java7
> Fix For: 2.1.0-beta, 0.23.10
>
> Attachments: MAPREDUCE-4716.patch
>
>
> Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid  fails.
> It looks like the string changed from "const class" to "constant" in jdk7.
> Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec 
> <<< FAILURE!
> testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery)
>   Time elapsed: 0.371 sec  <<< FAILURE!
> java.lang.AssertionError: exception message doesn't match, got: No enum 
> constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState 
> expected: No enum const class 
> org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77)
> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
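A JDK-tolerant way to assert on such messages is to match the stable parts of the string rather than its exact JDK-specific prefix. This is an illustrative pattern, not the actual MAPREDUCE-4716 patch.

```java
// Illustrative pattern for JDK-tolerant assertions on enum error messages:
// JDK 6 produces "No enum const class X.Y" while JDK 7 produces
// "No enum constant X.Y", so match the shared prefix and the enum reference
// instead of the full text. Not the actual MAPREDUCE-4716 patch.
public class EnumMessageMatch {
    static boolean matchesInvalidEnum(String message, String enumRef) {
        return message != null
            && message.startsWith("No enum const") // common to both JDK variants
            && message.endsWith(enumRef);
    }

    public static void main(String[] args) {
        String jdk6 = "No enum const class JobState.InvalidState";
        String jdk7 = "No enum constant JobState.InvalidState";
        System.out.println(matchesInvalidEnum(jdk6, "JobState.InvalidState")); // true
        System.out.println(matchesInvalidEnum(jdk7, "JobState.InvalidState")); // true
    }
}
```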



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5425) Junit in TestJobHistoryServer failing in jdk 7

2013-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792530#comment-13792530
 ] 

Hudson commented on MAPREDUCE-5425:
---

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #757 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/757/])
svn merge -c 1511464 FIXES: MAPREDUCE-5425. Junit in TestJobHistoryServer 
failing in jdk 7. Contributed by Robert Parker (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531029)
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryServer.java


> Junit in TestJobHistoryServer failing in jdk 7
> --
>
> Key: MAPREDUCE-5425
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5425
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.0.4-alpha
>Reporter: Ashwin Shankar
>Assignee: Robert Parker
> Fix For: 3.0.0, 0.23.10, 2.1.1-beta
>
> Attachments: MAPREDUCE-5425-2.patch, MAPREDUCE-5425-3.patch, 
> MAPREDUCE-5425.patch
>
>
> We get the following exception when we run the unit tests of 
> TestJobHistoryServer with jdk 7:
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:10033] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:423)
>   at org.apache.hadoop.ipc.Server$Listener.(Server.java:535)
>   at org.apache.hadoop.ipc.Server.(Server.java:2202)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:901)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:505)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:746)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer.serviceInit(HSAdminServer.java:100)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> This is happening because testMainMethod starts the history server and doesn't 
> stop it. This worked in jdk 6 because tests executed sequentially and this 
> test was the last one and didn't affect other tests, but in jdk 7 it fails.
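The failure mode can be reproduced without any Hadoop code: a test that binds a fixed port and never releases it makes any later bind on that port fail with "Address already in use", and releasing the socket (the missing cleanup step) makes the rebind succeed. Ports and names below are illustrative.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Minimal demonstration of the TestJobHistoryServer bug pattern: a leaked
// listening socket blocks later binds until it is closed. The fix in a test
// is to stop the server in a finally block (or an @After method).
public class PortLeakDemo {
    public static void main(String[] args) throws IOException {
        ServerSocket first = new ServerSocket(0);   // stands in for the JHS admin port
        int port = first.getLocalPort();

        boolean rebindFailed = false;
        try {
            new ServerSocket(port).close();         // a later test tries the same port
        } catch (IOException e) {
            rebindFailed = true;                    // "Address already in use"
        }
        System.out.println(rebindFailed);           // true: the port was never freed

        first.close();                              // the missing cleanup step
        new ServerSocket(port).close();             // now the rebind succeeds
        System.out.println("rebind ok after close");
    }
}
```

Under jdk 6's fixed test ordering the leak was invisible because the offending test happened to run last; jdk 7's arbitrary method ordering exposes it.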



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2013-10-11 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
---

Attachment: MAPREDUCE-4490.patch

The attached patch works well in my local environment and resolves the current 
issue. Any feedback is welcome!

> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> -
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller, tasktracker
>Affects Versions: 0.20.205.0, 1.0.3
>Reporter: George Datskos
>Assignee: sam liu
> Attachments: MAPREDUCE-4490.patch
>
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
> 1) with more map tasks in a job than there are map slots in the cluster will 
> result in immediate task failures for the second task in each JVM (and then 
> the JVM exits). We have investigated this bug and the root cause is as 
> follows. When using LinuxTaskController, the userlog directory for a task 
> attempt (../userlogs/job/task-attempt) is created only on the first 
> invocation (when the JVM is launched) because userlogs directories are 
> created by the task-controller binary which only runs *once* per JVM. 
> Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
> leading to immediate task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
> logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM 
> as that of the first task 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> ENOENT: No such file or directory
> at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
> at 
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
> at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
> at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
> at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
> smoothly. Then Task27 starts. The directory 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
>  is never created so when mapred.Child tries to write the log.index file for 
> Task27, it fails with ENOENT because the 
> attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, 
> the second task in each JVM is guaranteed to fail (and then the JVM exits) 
> every time when using LinuxTaskController. Note that this problem does not 
> occur when using the DefaultTaskController because the userlogs directories 
> are created for each task (not just for each JVM as with LinuxTaskController).
> For each task, the TaskRunner calls the TaskController's createLogDir method 
> before attempting to write out an index file.
> * DefaultTaskController#createLogDir: creates log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** task-controller binary creates log directory [create_attempt_directories] 
> (but only for the first task)
> Possible Solution: add a new command to task-controller *initialize task* to 
> create attempt directories.  Call that command, with ShellCommandExecutor, in 
> the LinuxTaskController#createLogDir method
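The directory-creation step that the proposed "initialize task" command would perform per attempt can be sketched as follows. In real Hadoop this would shell out to the setuid task-controller binary via ShellCommandExecutor so the directory gets the right owner; the sketch below uses plain java.io so it is self-contained, and all names are illustrative.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the proposed fix for MAPREDUCE-4490: have
// LinuxTaskController#createLogDir invoke a new task-controller
// "initialize task" command for EVERY attempt, so the per-attempt userlog
// directory exists before log.index is written. Plain java.io stands in for
// the setuid binary + ShellCommandExecutor; all names are illustrative.
public class CreateLogDirSketch {
    // Stand-in for the new "initialize task" command, which in the real
    // proposal would run as the task user via the task-controller binary.
    static File initializeTaskLogDir(File userlogRoot, String jobId, String attemptId)
            throws IOException {
        File attemptDir = new File(new File(userlogRoot, jobId), attemptId);
        if (!attemptDir.isDirectory() && !attemptDir.mkdirs()) {
            throw new IOException("Cannot create " + attemptDir);
        }
        return attemptDir;
    }

    public static void main(String[] args) throws IOException {
        File root = new File(System.getProperty("java.io.tmpdir"), "userlogs-demo");
        // Second task in a reused JVM: without this per-attempt call the
        // directory is missing and writing log.index fails with ENOENT.
        File dir = initializeTaskLogDir(root, "job_201207241401_0013",
                "attempt_201207241401_0013_m_000027_0");
        System.out.println(dir.isDirectory());
    }
}
```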



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2013-10-11 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu reassigned MAPREDUCE-4490:
--

Assignee: sam liu

> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> -
>
> Key: MAPREDUCE-4490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller, tasktracker
>Affects Versions: 0.20.205.0, 1.0.3
>Reporter: George Datskos
>Assignee: sam liu
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
> 1) with more map tasks in a job than there are map slots in the cluster will 
> result in immediate task failures for the second task in each JVM (and then 
> the JVM exits). We have investigated this bug and the root cause is as 
> follows. When using LinuxTaskController, the userlog directory for a task 
> attempt (../userlogs/job/task-attempt) is created only on the first 
> invocation (when the JVM is launched) because userlogs directories are 
> created by the task-controller binary which only runs *once* per JVM. 
> Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
> leading to immediate task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
> logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM 
> as that of the first task 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> ENOENT: No such file or directory
> at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
> at 
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
> at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
> at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
> at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
> smoothly. Then Task27 starts. The directory 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
>  is never created so when mapred.Child tries to write the log.index file for 
> Task27, it fails with ENOENT because the 
> attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, 
> the second task in each JVM is guaranteed to fail (and then the JVM exits) 
> every time when using LinuxTaskController. Note that this problem does not 
> occur when using the DefaultTaskController because the userlogs directories 
> are created for each task (not just for each JVM as with LinuxTaskController).
> For each task, the TaskRunner calls the TaskController's createLogDir method 
> before attempting to write out an index file.
> * DefaultTaskController#createLogDir: creates log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** task-controller binary creates log directory [create_attempt_directories] 
> (but only for the first task)
> Possible Solution: add a new command to task-controller *initialize task* to 
> create attempt directories.  Call that command, with ShellCommandExecutor, in 
> the LinuxTaskController#createLogDir method



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5578) Miscellaneous Fair Scheduler speedups

2013-10-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5578:
-

 Summary: Miscellaneous Fair Scheduler speedups
 Key: MAPREDUCE-5578
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5578
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza


I ran the Fair Scheduler's core scheduling loop through a profiler and 
identified a bunch of minimally invasive changes that can shave off a few 
milliseconds.

The main one is demoting a couple of INFO log messages to DEBUG, which brought 
my benchmark down from 16000 ms to 6000 ms.

A few others (which had far less of an impact) were:
* Most of the time in comparisons was being spent in Math.signum.  I switched 
this to direct ifs and elses, which halved the percentage of time spent in 
comparisons.
* I removed some unnecessary instantiations of Resource objects.
* I made it so that queues' usage is not recomputed from the applications 
each time getResourceUsage is called.
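The Math.signum change is the easiest one to show. This is a sketch of the pattern only, not the actual FairScheduler diff; the method names are illustrative.

```java
// Sketch of the comparator change described above: replacing Math.signum over
// a double difference with direct comparisons. Pattern only; names are
// illustrative, not the actual FairScheduler code.
public class ShareComparator {
    // Before: routes every comparison through a floating-point library call,
    // and the subtraction can also lose precision on large values.
    static int compareWithSignum(double a, double b) {
        return (int) Math.signum(a - b);
    }

    // After: plain branches, cheaper in a hot scheduling loop and with the
    // same ordering semantics for ordinary (non-NaN) inputs.
    static int compareDirect(double a, double b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareDirect(1.5, 2.5)); // -1
        System.out.println(compareDirect(2.5, 1.5)); // 1
        System.out.println(compareDirect(2.5, 2.5)); // 0
    }
}
```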




--
This message was sent by Atlassian JIRA
(v6.1#6144)