[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511046#comment-13511046
 ] 

Hadoop QA commented on MAPREDUCE-4827:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12556183/betterhash2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3100//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3100//console

This message is automatically generated.

> Increase hash quality of HashPartitioner
> 
>
> Key: MAPREDUCE-4827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Radim Kolar
> Attachments: betterhash1.txt, betterhash2.txt
>
>
> HashPartitioner uses Object.hashCode() to split keys into partitions. 
> This results in bad distributions because hashCode() quality is 
> poor. 
> These hashCode() functions are sometimes written by hand (very poor quality) 
> and sometimes generated by commons-lang code (poor quality). Applying 
> some transformation on top of hashCode() provides better distribution.
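
To illustrate the idea in the description above, here is a minimal sketch of partitioning on a mixed hash rather than the raw hashCode(). It is not the content of betterhash1.txt or betterhash2.txt: the class name, the method names, and the choice of the MurmurHash3 32-bit finalizer as the mixing step are all assumptions made for illustration.

```java
import java.util.Arrays;

// Sketch only: the mixing function is an assumed example, not the attached patch.
final class MixedHashPartitionSketch {

    /** Baseline: partition on the raw hashCode(), masking off the sign bit. */
    static int rawPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Spreads the bits of h using the MurmurHash3 32-bit finalizer. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Same as rawPartition, but applies mix() on top of hashCode() first. */
    static int mixedPartition(Object key, int numPartitions) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        int[] raw = new int[numPartitions];
        int[] mixed = new int[numPartitions];
        // Integer keys that are all multiples of 8: every raw hash lands in partition 0.
        for (int i = 0; i < 8000; i++) {
            Integer key = i * 8;
            raw[rawPartition(key, numPartitions)]++;
            mixed[mixedPartition(key, numPartitions)]++;
        }
        System.out.println("raw:   " + Arrays.toString(raw));
        System.out.println("mixed: " + Arrays.toString(mixed));
    }
}
```

The keys in main() are deliberately chosen so that the low bits of hashCode() carry no information, which is the kind of input where a modulo on the raw value degenerates.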

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4812) Create reduce input merger plugin in ReduceTask.java and pass it to Shuffle

2012-12-05 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511045#comment-13511045
 ] 

Mariappan Asokan commented on MAPREDUCE-4812:
-

Hi Arun,
  I have some ideas to fix the problem in MAPREDUCE-4842.  I posted my comments 
there.  Please take a look.

Thanks.

-- Asokan

> Create reduce input merger plugin in ReduceTask.java and pass it to Shuffle
> ---
>
> Key: MAPREDUCE-4812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4812
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
> Fix For: 2.0.3-alpha
>
> Attachments: COMBO-mapreduce-4809-4812.patch, 
> COMBO-mapreduce-4809-4812.patch, mapreduce-4812.patch, mapreduce-4812.patch, 
> mapreduce-4812.patch, mapreduce-4812.patch, mapreduce-4812.patch
>
>
> This is part of MAPREDUCE-2454.  This further breaks down MAPREDUCE-4808



[jira] [Commented] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511041#comment-13511041
 ] 

Hadoop QA commented on MAPREDUCE-4839:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12556180/textpartitioner2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The applied patch generated 2014 javac 
compiler warnings (more than the trunk's current 2013 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3099//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3099//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3099//console

This message is automatically generated.

> TextPartioner for hashing Text with good hashing function to get better 
> distribution
> 
>
> Key: MAPREDUCE-4839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4839
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Radim Kolar
> Attachments: textpartitioner1.txt, textpartitioner2.txt
>
>
> partitioner for Text keys using util.Hash framework for hashing function



[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511039#comment-13511039
 ] 

Mariappan Asokan commented on MAPREDUCE-4842:
-

Hi Jason, Arun, and Alejandro,
  I came up with a simpler solution to this nasty problem.  Instead of a 
single list {{inputs}} in {{MergeThread}}, we can keep a FIFO list of these 
lists.  This ensures that more than one merge can be pending.  The 
{{run()}} method in {{MergeThread}} will keep pulling map output lists 
off the FIFO list and merging them (a typical producer-consumer scenario).

I will outline the changes below:

In {{MergeThread}}:

* A {{LinkedList}}-of-lists member ({{pendingToBeMerged}}) is added and the 
member {{inputs}} is removed.

* The {{isInProgress()}} method is removed.

* The {{startMerge()}} method will no longer be {{synchronized}}.  It will add 
the passed list to the tail of {{pendingToBeMerged}} and call 
{{notifyAll()}} on the monitor of {{pendingToBeMerged}}.

* The {{run()}} method will sit in a tight loop.  As long as there is an 
item (a list of map outputs) to be consumed, it will consume (merge) the item 
and remove it from {{pendingToBeMerged}}.  If {{pendingToBeMerged}} has no more 
items, it will call {{notifyAll()}} on the object's monitor after setting 
{{inProgress}} to {{false}}.

In {{MergeManager}}:

* All calls to {{isInProgress()}} are removed.

* Unnecessary {{synchronized}} blocks on merge thread objects are removed, 
since the enclosing methods are themselves {{synchronized}}.
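
The producer-consumer scheme outlined above can be sketched roughly as follows. This is a simplified illustration, not the patch mentioned in this comment: only the names {{pendingToBeMerged}}, {{startMerge()}} and {{run()}} come from the description, while the class name, {{shutdown()}}, {{mergedSoFar()}}, {{demo()}} and the trivial stand-in for the merge itself are assumptions. It also blocks in {{wait()}} while the queue is empty instead of tracking an {{inProgress}} flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Sketch only: a FIFO of pending input lists consumed by a single merge thread.
final class PendingMergeQueueSketch<T> extends Thread {
    // FIFO of input lists waiting to be merged; guarded by its own monitor.
    private final LinkedList<List<T>> pendingToBeMerged = new LinkedList<>();
    // Records each "merged" list; stands in for the real merge work.
    private final List<List<T>> merged = new ArrayList<>();
    private boolean closed = false; // guarded by pendingToBeMerged

    /** Producer side: append a copy of the list to the tail and wake the consumer. */
    void startMerge(List<T> inputs) {
        synchronized (pendingToBeMerged) {
            pendingToBeMerged.addLast(new ArrayList<>(inputs));
            pendingToBeMerged.notifyAll();
        }
    }

    /** Hypothetical helper: let the thread exit once the queue has drained. */
    void shutdown() {
        synchronized (pendingToBeMerged) {
            closed = true;
            pendingToBeMerged.notifyAll();
        }
    }

    @Override
    public void run() {
        while (true) {
            List<T> next;
            synchronized (pendingToBeMerged) {
                while (pendingToBeMerged.isEmpty() && !closed) {
                    try {
                        pendingToBeMerged.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (pendingToBeMerged.isEmpty()) {
                    return; // closed and fully drained
                }
                next = pendingToBeMerged.removeFirst();
            }
            merge(next); // outside the lock, so producers never wait on a merge
        }
    }

    private void merge(List<T> inputs) {
        synchronized (merged) {
            merged.add(inputs);
        }
    }

    List<List<T>> mergedSoFar() {
        synchronized (merged) {
            return new ArrayList<>(merged);
        }
    }

    /** Queues two merges back to back, then waits for both to be consumed. */
    static List<List<String>> demo() {
        PendingMergeQueueSketch<String> t = new PendingMergeQueueSketch<>();
        t.start();
        t.startMerge(Arrays.asList("map_0", "map_1"));
        t.startMerge(Arrays.asList("map_2")); // a second merge can be pending
        t.shutdown();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.mergedSoFar();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because there is a single consumer and the queue is FIFO, the lists are merged in the order in which {{startMerge()}} was called, and a second call never has to wait for the first merge to finish.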

I created a patch with the above changes and tested it on my laptop.  The 
mapreduce tests seem to run without any problem.  However, I do not claim that 
it is completely tested.  It has to go through the rigorous testing that Jason 
did.

If you are interested in taking a look at the patch, I will post it to this 
Jira.  I welcome your questions and suggestions on the idea of the patch.

-- Asokan


> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Commented] (MAPREDUCE-4847) Command Parsing in Hadoop Streaming

2012-12-05 Thread Peng Lei (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511031#comment-13511031
 ] 

Peng Lei commented on MAPREDUCE-4847:
-

Thank you for your comment!

I have put the command in a script file as a workaround, and it works. But in 
this case the command is not complex enough to justify a dedicated script file, 
and generating a script on the fly is a bit tricky (at least for the maintainer).

It seems Hadoop can't run on Windows without Cygwin. Another solution may be to 
add a new option that instructs streaming to use an alternative command invoker, 
such as:

  -command_invoker "sh -c"

This could solve the issue without breaking existing hadoop-streaming 
applications.
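
As a rough sketch of what such an invoker could do: split the invoker string into words and pass the whole user command as one extra argument, so the shell does the parsing. Note that -command_invoker is only the option proposed in this comment, not an existing streaming flag, and the class and method names below are hypothetical. The example in main() assumes a POSIX sh and awk are available on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: hands the command line to an external invoker instead of parsing it.
final class CommandInvokerSketch {

    /** Builds argv as: the invoker's words, then the entire command as ONE argument. */
    static List<String> buildArgv(String invoker, String command) {
        List<String> argv = new ArrayList<>(Arrays.asList(invoker.trim().split("\\s+")));
        argv.add(command);
        return argv;
    }

    /** Runs the command through the invoker and returns its combined output. */
    static String run(String invoker, String command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildArgv(invoker, command))
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Quotes and the pipe are interpreted by sh, not by Java.
        System.out.print(run("sh -c", "printf 'a b\\nc d\\n' | awk '{print $2}'"));
    }
}
```

The design point is that the inline script stays a single argv element, so quoting inside it (as in 'perl -ne ...') is left to the shell.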

-Peng


> Command Parsing in Hadoop Streaming
> ---
>
> Key: MAPREDUCE-4847
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4847
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: Peng Lei
>  Labels: features
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Hadoop streaming parses the mapper and reducer commands by itself, which is 
> not a good choice: when I write a complex mapper/reducer script inline, such 
> as 'perl -ne ...', it doesn't work.
> An alternative is to send the command to the shell by simply creating a new 
> process (sh -c "command_and_args"); this not only simplifies the streaming 
> code but also improves its capability!



[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe

2012-12-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511029#comment-13511029
 ] 

Karthik Kambatla commented on MAPREDUCE-4843:
-

My bad - read the branch name wrong. I applied the patch locally, and verified 
that the tests that directly use {{DefaultTaskController}} pass - 
TestTaskTrackerLocalization, TestJvmManager, TestTaskEnvironment.

+1

> When using DefaultTaskController, JobLocalizer not thread safe
> --
>
> Key: MAPREDUCE-4843
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 1.1.1
>Reporter: zhaoyunjiong
>Priority: Critical
> Attachments: MAPREDUCE-4843-branch-1.1.patch
>
>
> In our cluster, jobs sometimes fail with the exception below:
> 2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201212031626_1115_r_23_0:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the 
> configured local directories
>   at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
>   at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>   at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
>   at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
>   at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)
> The root cause is that JobLocalizer is not thread safe.
> In the DefaultTaskController.initializeJob method:
>  JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);
> but JobLocalizer simply keeps a reference to the conf.
> When two TaskLauncher threads (mapLauncher and reduceLauncher) call 
> initializeJob at the same time, there are two JobLocalizer instances but only 
> one conf instance.
> So sometimes ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) overwrites the 
> previous job's conf.
> The previous job's job.xml then ends up stored in another user's dir.
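
The aliasing problem described above can be reproduced deterministically in a small sketch. It uses java.util.Properties as a stand-in for JobConf and a made-up key value for JOB_LOCAL_CTXT; the Localizer class, the copy flag and the paths are assumptions for illustration, and copying the configuration per localizer is one possible remedy, not necessarily what the attached patch does.

```java
import java.util.Properties;

// Sketch only: two "localizers" sharing one configuration object versus private copies.
final class SharedConfSketch {
    // Hypothetical key; the real value of JOB_LOCAL_CTXT is not given in the report.
    static final String JOB_LOCAL_CTXT = "job.local.ctxt";

    static final class Localizer {
        private final Properties conf;

        Localizer(Properties conf, boolean copy) {
            // copy == false keeps a reference to the caller's object (the reported behavior);
            // copy == true gives this localizer its own private snapshot.
            this.conf = copy ? (Properties) conf.clone() : conf;
        }

        void setLocalDirs(String dirs) {
            conf.setProperty(JOB_LOCAL_CTXT, dirs);
        }

        String localDirs() {
            return conf.getProperty(JOB_LOCAL_CTXT);
        }
    }

    /** Replays the problematic interleaving: B writes before A reads its own setting back. */
    static String interleave(boolean copy) {
        Properties ttConf = new Properties();
        Localizer a = new Localizer(ttConf, copy);
        Localizer b = new Localizer(ttConf, copy);
        a.setLocalDirs("/local/userA");
        b.setLocalDirs("/local/userB");
        return a.localDirs();
    }

    public static void main(String[] args) {
        System.out.println("shared conf: A sees " + interleave(false));
        System.out.println("copied conf: A sees " + interleave(true));
    }
}
```

With the shared object, localizer A reads back the directories written for B, which matches the symptom of job.xml landing in another user's directory; with a private copy each localizer sees only its own value.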



[jira] [Commented] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner

2012-12-05 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511028#comment-13511028
 ] 

Harsh J commented on MAPREDUCE-4594:


I notice no objects (such as an attempt context object) being passed into the 
setup and cleanup methods you wish to introduce here. Without that how is this 
helpful?

In my mind I was viewing your proposal as a step over writing "extends 
Configurable" for new API partitioner implementations, when one needs at least 
the Configuration object instance to pull values out from.

Plus, the ordering of these calls matters, so tests are absolutely necessary if 
we do not want to regress by accident in the future.
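
To make the two points above concrete, here is a minimal sketch of lifecycle hooks that receive a configuration and of a check on call ordering. It is a hypothetical shape, not the API in partitioner1.txt: LifecyclePartitioner, runTask and demo are invented names, and a plain Map stands in for the Configuration or attempt context object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: setup/cleanup hooks around partitioning, with the call order recorded.
final class PartitionerLifecycleSketch {

    /** Hypothetical partitioner with lifecycle hooks; setup receives the configuration. */
    abstract static class LifecyclePartitioner<K> {
        void setup(Map<String, String> conf) { }

        abstract int getPartition(K key, int numPartitions);

        void cleanup() { }
    }

    /** Hypothetical driver: setup first, cleanup last, even if partitioning throws. */
    static <K> List<Integer> runTask(LifecyclePartitioner<K> p, Map<String, String> conf,
                                     List<K> keys, int numPartitions) {
        List<Integer> out = new ArrayList<>();
        p.setup(conf);
        try {
            for (K key : keys) {
                out.add(p.getPartition(key, numPartitions));
            }
        } finally {
            p.cleanup();
        }
        return out;
    }

    /** Records the order in which the hooks are invoked for two keys. */
    static List<String> demo() {
        final List<String> calls = new ArrayList<>();
        LifecyclePartitioner<String> p = new LifecyclePartitioner<String>() {
            @Override
            void setup(Map<String, String> conf) {
                calls.add("setup");
            }

            @Override
            int getPartition(String key, int numPartitions) {
                calls.add("getPartition");
                return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }

            @Override
            void cleanup() {
                calls.add("cleanup");
            }
        };
        runTask(p, new HashMap<String, String>(), Arrays.asList("a", "b"), 4);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A unit test along the lines of demo() is the kind of ordering check the comment asks for: it fails if setup is ever called after the first getPartition, or if cleanup is skipped.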

> Add init/shutdown methods to mapreduce Partitioner
> --
>
> Key: MAPREDUCE-4594
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4594
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: trunk
>Reporter: Radim Kolar
> Attachments: partitioner1.txt
>
>
> The Partitioner supports only the Configurable API, which can be used for 
> basic init in setConf(). The problem is that there is no shutdown function.
> I propose using the standard setup() and cleanup() functions, as in mapper / 
> reducer.
> My use case is that I need to start and stop a Spring context and a datagrid 
> client.



[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe

2012-12-05 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511019#comment-13511019
 ] 

zhaoyunjiong commented on MAPREDUCE-4843:
-

No need for trunk. In hadoop 2.0, the problem doesn't exist.
It's very difficult to test a thread-safety problem: even when the code is not 
thread safe, the test will pass in most cases.

> When using DefaultTaskController, JobLocalizer not thread safe
> --
>
> Key: MAPREDUCE-4843
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 1.1.1
>Reporter: zhaoyunjiong
>Priority: Critical
> Attachments: MAPREDUCE-4843-branch-1.1.patch
>
>



[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe

2012-12-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511003#comment-13511003
 ] 

Karthik Kambatla commented on MAPREDUCE-4843:
-

[~zhaoyunjiong] The patch looks good. Can you post a patch against trunk so 
that QA can apply it? Also, I was wondering if it would be possible to add a 
test.

> When using DefaultTaskController, JobLocalizer not thread safe
> --
>
> Key: MAPREDUCE-4843
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 1.1.1
>Reporter: zhaoyunjiong
>Priority: Critical
> Attachments: MAPREDUCE-4843-branch-1.1.patch
>
>



[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-12-05 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4827:
---

Attachment: betterhash2.txt

Changed it for the old mapred API as well.

> Increase hash quality of HashPartitioner
> 
>
> Key: MAPREDUCE-4827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Radim Kolar
> Attachments: betterhash1.txt, betterhash2.txt
>
>



[jira] [Updated] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution

2012-12-05 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4839:
---

Attachment: textpartitioner2.txt

> TextPartioner for hashing Text with good hashing function to get better 
> distribution
> 
>
> Key: MAPREDUCE-4839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4839
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Radim Kolar
> Attachments: textpartitioner1.txt, textpartitioner2.txt
>
>
> partitioner for Text keys using util.Hash framework for hashing function



[jira] [Commented] (MAPREDUCE-4594) Add init/shutdown methods to mapreduce Partitioner

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510922#comment-13510922
 ] 

Hadoop QA commented on MAPREDUCE-4594:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12556006/partitioner1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3098//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3098//console

This message is automatically generated.

> Add init/shutdown methods to mapreduce Partitioner
> --
>
> Key: MAPREDUCE-4594
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4594
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: trunk
>Reporter: Radim Kolar
> Attachments: partitioner1.txt
>
>



[jira] [Commented] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510911#comment-13510911
 ] 

Hadoop QA commented on MAPREDUCE-4827:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555191/betterhash1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3097//console

This message is automatically generated.

> Increase hash quality of HashPartitioner
> 
>
> Key: MAPREDUCE-4827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Radim Kolar
> Attachments: betterhash1.txt
>
>



[jira] [Commented] (MAPREDUCE-4839) TextPartioner for hashing Text with good hashing function to get better distribution

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510901#comment-13510901
 ] 

Hadoop QA commented on MAPREDUCE-4839:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12555646/textpartitioner1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3096//console

This message is automatically generated.

> TextPartioner for hashing Text with good hashing function to get better 
> distribution
> 
>
> Key: MAPREDUCE-4839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4839
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Radim Kolar
> Attachments: textpartitioner1.txt
>
>
> partitioner for Text keys using util.Hash framework for hashing function



[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510899#comment-13510899
 ] 

Hadoop QA commented on MAPREDUCE-4843:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12556081/MAPREDUCE-4843-branch-1.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3095//console

This message is automatically generated.

> When using DefaultTaskController, JobLocalizer not thread safe
> --
>
> Key: MAPREDUCE-4843
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 1.1.1
>Reporter: zhaoyunjiong
>Priority: Critical
> Attachments: MAPREDUCE-4843-branch-1.1.patch
>
>



[jira] [Commented] (MAPREDUCE-4845) ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2

2012-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510896#comment-13510896
 ] 

Hadoop QA commented on MAPREDUCE-4845:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12556024/MAPREDUCE-4845-branch-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3094//console

This message is automatically generated.

> ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2 
> --
>
> Key: MAPREDUCE-4845
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4845
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 1.1.1, 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-4845-branch-1.patch, MAPREDUCE-4845.patch
>
>
> For backwards compatibility, these methods should exist in both MR1 and MR2.
> Confusingly, these methods return the max memory and used memory of the 
> jobtracker, not the entire cluster.
> I'd propose to add them to MR2 and return -1, and deprecate them in both MR1 
> and MR2.  Alternatively, I could add plumbing to get the resource manager 
> memory stats.
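
The first option proposed above (add the methods, return -1, deprecate them) might look roughly like the following. This is a sketch only: ClusterStatusSketch is a hypothetical stand-in for the MR2 ClusterStatus class, and the long return type is an assumption.

```java
// Hypothetical stand-in for the MR2 ClusterStatus class; not the real Hadoop source.
class ClusterStatusSketch {
    /** Placeholder returned because MR2 has no jobtracker memory to report. */
    static final long UNAVAILABLE = -1L;

    /** @deprecated kept only for source compatibility with MR1; always returns -1. */
    @Deprecated
    public long getMaxMemory() {
        return UNAVAILABLE;
    }

    /** @deprecated kept only for source compatibility with MR1; always returns -1. */
    @Deprecated
    public long getUsedMemory() {
        return UNAVAILABLE;
    }
}
```

Callers compiled against MR1 keep compiling, and the deprecation warning signals that the values should not be relied on.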



[jira] [Resolved] (MAPREDUCE-4699) TestFairScheduler & TestCapacityScheduler fails due to JobHistory exception

2012-12-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved MAPREDUCE-4699.
---

   Resolution: Fixed
Fix Version/s: 1.1.2
 Hadoop Flags: Reviewed

Committed. Thanks Gopal!

> TestFairScheduler & TestCapacityScheduler fails due to JobHistory exception
> ---
>
> Key: MAPREDUCE-4699
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4699
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: 1.1.2
>
> Attachments: mapreduce-4699.patch, MAPREDUCE4699.txt
>
>
> TestFairScheduler fails due to exception from mapred.JobHistory
> {code}
> null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1975)
>   at 
> org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
>   at 
> org.apache.hadoop.mapred.TestFairScheduler.testFifoPool(TestFairScheduler.java:2617)
> {code}
> TestCapacityScheduler fails due to
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1976)
> at 
> org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
> at 
> org.apache.hadoop.mapred.TestCapacityScheduler$FakeTaskTrackerManager.setPriority(TestCapacityScheduler.java:653)
> at 
> org.apache.hadoop.mapred.TestCapacityScheduler.testHighPriorityJobInitialization(TestCapacityScheduler.java:2666)
> {code}



[jira] [Updated] (MAPREDUCE-4697) TestMapredHeartbeat fails assertion on HeartbeatInterval

2012-12-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-4697:
--

   Resolution: Fixed
Fix Version/s: 1.1.2
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Gopal!

> TestMapredHeartbeat fails assertion on HeartbeatInterval
> 
>
> Key: MAPREDUCE-4697
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4697
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: 1.1.2
>
> Attachments: mapreduce-4697.patch
>
>
> TestMapredHeartbeat fails test on heart beat interval
> {code}
> FAILED
> expected:<300> but was:<500>
> junit.framework.AssertionFailedError: expected:<300> but was:<500>
> at 
> org.apache.hadoop.mapred.TestMapredHeartbeat.testJobDirCleanup(TestMapredHeartbeat.java:68)
> {code}



[jira] [Updated] (MAPREDUCE-4696) TestMRServerPorts throws NullReferenceException

2012-12-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-4696:
--

   Resolution: Fixed
Fix Version/s: 1.1.2
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Gopal!

> TestMRServerPorts throws NullReferenceException
> ---
>
> Key: MAPREDUCE-4696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4696
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: 1.1.2
>
> Attachments: mapreduce-4696-2.patch, mapreduce-4696.patch
>
>
> TestMRServerPorts throws 
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.TestMRServerPorts.canStartJobTracker(TestMRServerPorts.java:99)
> at 
> org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts(TestMRServerPorts.java:152)
> {code}



[jira] [Updated] (MAPREDUCE-4699) TestFairScheduler & TestCapacityScheduler fails due to JobHistory exception

2012-12-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-4699:
--

Attachment: MAPREDUCE4699.txt

The current patch looks good for the CapacityScheduler test. Updating the patch 
with similar changes for TestFairScheduler - and committing.

> TestFairScheduler & TestCapacityScheduler fails due to JobHistory exception
> ---
>
> Key: MAPREDUCE-4699
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4699
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: mapreduce-4699.patch, MAPREDUCE4699.txt
>
>
> TestFairScheduler fails due to exception from mapred.JobHistory
> {code}
> null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1975)
>   at 
> org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
>   at 
> org.apache.hadoop.mapred.TestFairScheduler.testFifoPool(TestFairScheduler.java:2617)
> {code}
> TestCapacityScheduler fails due to
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1976)
> at 
> org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
> at 
> org.apache.hadoop.mapred.TestCapacityScheduler$FakeTaskTrackerManager.setPriority(TestCapacityScheduler.java:653)
> at 
> org.apache.hadoop.mapred.TestCapacityScheduler.testHighPriorityJobInitialization(TestCapacityScheduler.java:2666)
> {code}



[jira] [Commented] (MAPREDUCE-4696) TestMRServerPorts throws NullReferenceException

2012-12-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510832#comment-13510832
 ] 

Siddharth Seth commented on MAPREDUCE-4696:
---

+1. Simple enough patch. Will commit this shortly.

> TestMRServerPorts throws NullReferenceException
> ---
>
> Key: MAPREDUCE-4696
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4696
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: mapreduce-4696-2.patch, mapreduce-4696.patch
>
>
> TestMRServerPorts throws 
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.TestMRServerPorts.canStartJobTracker(TestMRServerPorts.java:99)
> at 
> org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts(TestMRServerPorts.java:152)
> {code}



[jira] [Commented] (MAPREDUCE-4697) TestMapredHeartbeat fails assertion on HeartbeatInterval

2012-12-05 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510833#comment-13510833
 ] 

Siddharth Seth commented on MAPREDUCE-4697:
---

+1. Will commit shortly.

> TestMapredHeartbeat fails assertion on HeartbeatInterval
> 
>
> Key: MAPREDUCE-4697
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4697
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.1.1
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: mapreduce-4697.patch
>
>
> TestMapredHeartbeat fails test on heart beat interval
> {code}
> FAILED
> expected:<300> but was:<500>
> junit.framework.AssertionFailedError: expected:<300> but was:<500>
> at 
> org.apache.hadoop.mapred.TestMapredHeartbeat.testJobDirCleanup(TestMapredHeartbeat.java:68)
> {code}



[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-4842:
--

Attachment: MAPREDUCE-4842.patch

Thanks for the reviews, Alejandro and Arun.  I updated the patch to address 
Alejandro's comment and also added a comment clarifying why the merge callback 
occurs outside of the lock and after inProgress is cleared per a side 
discussion with Arun.

> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510664#comment-13510664
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4842:
---

One minor nit: the scope of the exceptionReporter instance variable has been 
changed from private to protected for testing purposes. It should be 
package-private instead. Preferably, we should add a package-private getter 
method instead (it could be annotated with Guava's @VisibleForTesting 
annotation). Other than that, looks good to me.
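
A sketch of the suggested shape, under stated assumptions: MergeManagerSketch and the ExceptionReporter interface below are hypothetical stand-ins, and the local VisibleForTesting annotation only exists so the example compiles without Guava (real code would import com.google.common.annotations.VisibleForTesting instead).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;

// Local stand-in so the sketch compiles without Guava on the classpath.
@Target({ElementType.METHOD, ElementType.FIELD})
@interface VisibleForTesting {}

// Hypothetical reporter type, for illustration only.
interface ExceptionReporter {
    void reportException(Throwable t);
}

// Hypothetical owner class; not the real Hadoop source.
class MergeManagerSketch {
    // The field itself stays private.
    private final ExceptionReporter exceptionReporter;

    MergeManagerSketch(ExceptionReporter exceptionReporter) {
        this.exceptionReporter = exceptionReporter;
    }

    // No access modifier: visible only to classes in the same package,
    // which is where the unit test would live.
    @VisibleForTesting
    ExceptionReporter getExceptionReporter() {
        return exceptionReporter;
    }
}
```

Compared with a protected field, this keeps subclasses in other packages from touching the reporter and makes the test-only intent explicit.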

> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Updated] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted

2012-12-05 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4850:
-

Attachment: MAPREDUCE-4850.patch

A patch that deletes the staging directory after the system directory.

Manual testing showed that with this patch I couldn't get a recovery failure in 
the scenario in the description. It would be nice to add a unit test, but I'm 
still trying to figure out how to write one for this.
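
To illustrate why the deletion order matters, here is a small self-contained sketch using local temp directories. It is not the patch and does not use any Hadoop API: the file names, the recoverable() rule (recovery is attempted whenever job-info exists, and then needs job.xml), and the class name are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CleanupOrderSketch {
    // Recovery is attempted only while job-info exists, and it then needs job.xml.
    static boolean recoverable(Path jobInfo, Path jobXml) {
        return !Files.exists(jobInfo) || Files.exists(jobXml);
    }

    // Deletes both files in the requested order and reports whether a restart
    // between the two deletions would still find a consistent state.
    static boolean consistentBetweenDeletes(boolean stagingFirst) {
        try {
            Path systemDir = Files.createTempDirectory("system");
            Path stagingDir = Files.createTempDirectory("staging");
            Path jobInfo = Files.createFile(systemDir.resolve("job-info"));
            Path jobXml = Files.createFile(stagingDir.resolve("job.xml"));

            Files.delete(stagingFirst ? jobXml : jobInfo);
            boolean consistent = recoverable(jobInfo, jobXml);
            Files.delete(stagingFirst ? jobInfo : jobXml);

            // Remove the now-empty temp directories.
            Files.delete(stagingDir);
            Files.delete(systemDir);
            return consistent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(consistentBetweenDeletes(true));  // false
        System.out.println(consistentBetweenDeletes(false)); // true
    }
}
```

Deleting the staging files first leaves a window in which job-info still asks for a recovery that cannot succeed; deleting job-info first closes that window.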


> Job recovery may fail if staging directory has been deleted
> ---
>
> Key: MAPREDUCE-4850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 1.1.1
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-4850.patch
>
>
> The job staging directory is deleted in the job cleanup task, which happens 
> before the job-info file is deleted from the system directory (by the 
> JobInProgress garbageCollect() method). If the JT shuts down between these 
> two operations, then when the JT restarts and tries to recover the job, it 
> fails since the job.xml and splits are no longer available.



[jira] [Created] (MAPREDUCE-4850) Job recovery may fail if staging directory has been deleted

2012-12-05 Thread Tom White (JIRA)
Tom White created MAPREDUCE-4850:


 Summary: Job recovery may fail if staging directory has been 
deleted
 Key: MAPREDUCE-4850
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4850
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.1.1
Reporter: Tom White
Assignee: Tom White


The job staging directory is deleted in the job cleanup task, which happens 
before the job-info file is deleted from the system directory (by the 
JobInProgress garbageCollect() method). If the JT shuts down between these two 
operations, then when the JT restarts and tries to recover the job, it fails 
since the job.xml and splits are no longer available.



[jira] [Assigned] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-4842:


Assignee: Jason Lowe  (was: Arun C Murthy)

> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4842:
-

Attachment: MAPREDUCE-4842.patch

Jason, nice unit test! Thanks!

I've modified it a little to have 2 barriers (mergeStart and mergeComplete) 
rather than reusing the same barrier 4 times (which confused me a lot when I 
was reviewing it).

Other than that, it looks great. +1

Also, if you don't mind, I'll assign the jira to you - since you've done all 
the heavy lifting and deserve way more credit than I do. Thanks again!
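
For readers unfamiliar with the pattern, the two-barrier idea can be sketched with java.util.concurrent.CyclicBarrier. This is an illustration only, not the unit test in the patch: TwoBarrierSketch, the event strings, and the 5 second timeouts are assumptions; only the barrier names mergeStart and mergeComplete come from the comment above.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;

class TwoBarrierSketch {
    // Runs one simulated merge and returns the events in the order they happened.
    static String run() {
        // One barrier per phase, each shared by the test thread and the merge thread.
        final CyclicBarrier mergeStart = new CyclicBarrier(2);
        final CyclicBarrier mergeComplete = new CyclicBarrier(2);
        final StringBuffer events = new StringBuffer();

        Thread merger = new Thread(() -> {
            try {
                mergeStart.await(5, TimeUnit.SECONDS);    // merge has started
                events.append("merging;");
                mergeComplete.await(5, TimeUnit.SECONDS); // wait until the test lets the merge finish
            } catch (Exception e) {
                events.append("error;");
            }
        });
        merger.start();

        try {
            mergeStart.await(5, TimeUnit.SECONDS);
            // A real test would assert on the in-progress state here.
            mergeComplete.await(5, TimeUnit.SECONDS);
            merger.join();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        events.append("done;");
        return events.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // merging;done;
    }
}
```

Naming each barrier after the phase it guards makes it clear at every await() call which point in the merge the test is synchronizing on.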

> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Arun C Murthy
>Priority: Blocker
> Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch, 
> MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Updated] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4842:
-

Status: Open  (was: Patch Available)

> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.5, 2.0.2-alpha
>Reporter: Jason Lowe
>Assignee: Arun C Murthy
>Priority: Blocker
> Attachments: MAPREDUCE-4842.patch, MAPREDUCE-4842.patch
>
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Updated] (MAPREDUCE-4732) testcase testJobRetire fails using IBM JAVA

2012-12-05 Thread Amir Sanjar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Sanjar updated MAPREDUCE-4732:
---

Summary: testcase testJobRetire fails using IBM JAVA   (was: testcase 
testJobRetire fails using IBM JAVA 7)

> testcase testJobRetire fails using IBM JAVA 
> 
>
> Key: MAPREDUCE-4732
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4732
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.3
> Environment: RHEL 6.2 with IBM JAVA 7 on a x86_64 system
>Reporter: Amir Sanjar
>
> Testcase: testJobRetire took 53.352 sec
> Testcase: testJobRetireWithUnreportedTasks took 41.173 sec
>   FAILED
> Job did not retire
> junit.framework.AssertionFailedError: Job did not retire
>   at 
> org.apache.hadoop.mapred.TestJobRetire.waitTillRetire(TestJobRetire.java:130)
>   at 
> org.apache.hadoop.mapred.TestJobRetire.testJobRetireWithUnreportedTasks(TestJobRetire.java:229)
> Testcase: testJobRemoval took 1.073 sec



[jira] [Commented] (MAPREDUCE-4732) testcase testJobRetire fails using IBM JAVA

2012-12-05 Thread Amir Sanjar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510486#comment-13510486
 ] 

Amir Sanjar commented on MAPREDUCE-4732:


Was able to reproduce on IBM JAVA 6. Updating the abstract.

> testcase testJobRetire fails using IBM JAVA 
> 
>
> Key: MAPREDUCE-4732
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4732
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.3
> Environment: RHEL 6.2 with IBM JAVA 7 on a x86_64 system
>Reporter: Amir Sanjar
>
> Testcase: testJobRetire took 53.352 sec
> Testcase: testJobRetireWithUnreportedTasks took 41.173 sec
>   FAILED
> Job did not retire
> junit.framework.AssertionFailedError: Job did not retire
>   at 
> org.apache.hadoop.mapred.TestJobRetire.waitTillRetire(TestJobRetire.java:130)
>   at 
> org.apache.hadoop.mapred.TestJobRetire.testJobRetireWithUnreportedTasks(TestJobRetire.java:229)
> Testcase: testJobRemoval took 1.073 sec



[jira] [Created] (MAPREDUCE-4849) TaskSelector not used in FairScheduler

2012-12-05 Thread Vincent Behar (JIRA)
Vincent Behar created MAPREDUCE-4849:


 Summary: TaskSelector not used in FairScheduler
 Key: MAPREDUCE-4849
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4849
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 1.1.1, 1.0.4
Reporter: Vincent Behar


The documentation (http://hadoop.apache.org/docs/r1.0.4/fair_scheduler.html) 
describes the mapred.fairscheduler.taskselector parameter as an "extension 
point", but while the FairScheduler does instantiate the custom TaskSelector 
provided this way, it does not call any of its methods (obtainNewMapTask, 
obtainNewReduceTask, neededSpeculativeMaps or neededSpeculativeReduces).

We should either update the FairScheduler to use the TaskSelector when 
scheduling a task, or completely remove the TaskSelector and update the 
documentation.



[jira] [Updated] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe

2012-12-05 Thread zhaoyunjiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong updated MAPREDUCE-4843:


Status: Patch Available  (was: Open)

Testing patch

> When using DefaultTaskController, JobLocalizer not thread safe
> --
>
> Key: MAPREDUCE-4843
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 1.1.1
>Reporter: zhaoyunjiong
>Priority: Critical
> Attachments: MAPREDUCE-4843-branch-1.1.patch
>
>
> In our cluster, jobs sometimes fail with the following exception:
> 2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201212031626_1115_r_23_0:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the 
> configured local directories
>   at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
>   at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>   at 
> org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
>   at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
>   at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)
> The root cause is that JobLocalizer is not thread safe.
> In the DefaultTaskController.initializeJob method:
>  JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);
> but JobLocalizer simply keeps a reference to the conf.
> When two TaskLauncher threads (mapLauncher and reduceLauncher) try to 
> initializeJob at the same time, there are two JobLocalizer instances but only 
> one conf instance.
> So sometimes ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) will overwrite the 
> previous job's conf.
> This causes the previous job's job.xml to be stored in another user's dir.



[jira] [Updated] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe

2012-12-05 Thread zhaoyunjiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong updated MAPREDUCE-4843:


Attachment: MAPREDUCE-4843-branch-1.1.patch

Update patch.

> When using DefaultTaskController, JobLocalizer not thread safe
> --
>
> Key: MAPREDUCE-4843
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 1.1.1
>Reporter: zhaoyunjiong
>Priority: Critical
> Attachments: MAPREDUCE-4843-branch-1.1.patch
>
>
> In our cluster, jobs sometimes fail with the following exception:
> 2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> initializing attempt_201212031626_1115_r_23_0:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
> taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the 
> configured local directories
>   at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
>   at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>   at 
> org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
>   at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
>   at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)
> The root cause is that JobLocalizer is not thread safe.
> In the DefaultTaskController.initializeJob method:
>  JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);
> but JobLocalizer simply keeps a reference to the conf.
> When two TaskLauncher threads (mapLauncher and reduceLauncher) try to 
> initializeJob at the same time, there are two JobLocalizer instances but only 
> one conf instance.
> So sometimes ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) will overwrite the 
> previous job's conf.
> This causes the previous job's job.xml to be stored in another user's dir.
