[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1374:
--

Status: Patch Available  (was: Open)

> Reduce memory footprint of FileSplit
> 
>
> Key: MAPREDUCE-1374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, 
> MAPREDUCE-1374.3.patch
>
>
> We can have many FileSplit objects in memory, depending on the number of 
> mappers.
> It will save tons of memory on the JobTracker and JobClient if we intern those 
> host-name Strings.
> {code}
> FileInputFormat.java:
>   for (NodeInfo host: hostList) {
> // Strip out the port number from the host name
> -retVal[index++] = host.node.getName().split(":")[0];
> +retVal[index++] = host.node.getName().split(":")[0].intern();
> if (index == replicationFactor) {
>   done = true;
>   break;
> }
>   }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from 
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally 
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}
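As a sketch of the idea (not the patch itself; the host names below are made up), {{String.intern()}} returns the canonical pooled instance for a given character sequence, so many splits that reference the same host end up sharing one String instead of holding one copy each:

```java
// Illustrative only: two splits naming the same host produce two distinct
// String objects, but interning collapses them to one pooled instance.
public class InternDemo {
    public static void main(String[] args) {
        // Strip the port, as in the snippet above (hypothetical host names).
        String a = "node42.example.com:50010".split(":")[0];
        String b = "node42.example.com:50060".split(":")[0];

        System.out.println(a.equals(b)); // true: same characters
        System.out.println(a == b);      // false: two separate objects

        // After interning, both refer to the same canonical instance,
        // so N splits on the same host cost one String, not N.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

The trade-off is that interned strings live in the JVM's string pool, so interning pays off only for values with many duplicates, such as a small set of host names repeated across thousands of splits.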

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1374:
--

Status: Open  (was: Patch Available)




[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800129#action_12800129
 ] 

Amar Kamat commented on MAPREDUCE-1316:
---

bq. I was asking for details on the tests run to verify the fix...
Iyappan successfully reproduced this bug by killing jobs with tasks that had 
yet to report their first status (by suspending the tasktrackers). He also 
reproduced it for completed jobs with speculation turned ON, and for jobs 
killed while the setup task was still running. He used jmap and the jobtracker 
logs to verify the memory leak.

> JobTracker holds stale references to retired jobs via unreported tasks 
> ---
>
> Key: MAPREDUCE-1316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, 
> mapreduce-1316-v1.14-branch20-yahoo.patch, 
> mapreduce-1316-v1.14.1-branch20-yahoo.patch, 
> mapreduce-1316-v1.15-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mapping from _taskToTIPMap_ if 
> the job finishes and retires. _Unreported tasks_ refers to tasks that were 
> scheduled but the tasktracker did not report back with the task status. In 
> such cases a stale reference is held to TaskInProgress (and thus 
> JobInProgress) long after the job is gone leading to memory leak.




[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800124#action_12800124
 ] 

Arun C Murthy commented on MAPREDUCE-1316:
--

Ok, the logging changes make sense.

Are you ok with the changes to JobInProgress.getTasks(TaskType)?

bq. MAPREDUCE-1316 was raised because there was a mismatch between task-attempt 
addition and task-attempt removal in the JobTracker. [...]

*smile*

I was asking for details on the tests run to verify the fix... 




[jira] Commented: (MAPREDUCE-1333) Parallel running tasks on one single node may slow down the performance

2010-01-13 Thread Zhaoning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800121#action_12800121
 ] 

Zhaoning Zhang commented on MAPREDUCE-1333:
---

The nodes are virtual machines in a XenServer cluster; some are allocated 1 
CPU and some 2 or 3.
But I think it's an inter-dependence problem: the reduces are always waiting 
for the maps to finish their tasks, so why not let the maps use the full 
resources separately?

> Parallel running tasks on one single node may slow down the performance
> ---
>
> Key: MAPREDUCE-1333
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1333
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.1
>Reporter: Zhaoning Zhang
>
> When I analyzed running-task performance, I found that tasks running in 
> parallel on a single node do not perform better than serialized ones.
> We can set mapred.tasktracker.{map|reduce}.tasks.maximum = 1 individually, 
> but there will still be parallel map AND reduce tasks.
> And I wonder whether this is true in real commercial clusters?




[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800120#action_12800120
 ] 

Amar Kamat commented on MAPREDUCE-1316:
---

Arun, the logging changes will help in debugging memory-leak issues caused by 
stale references to TaskInProgress objects. The log changes ensure that one 
log line indicating task removal is printed once per task. This is in sync 
with the task-addition log line, and hence any mismatch between task-addition 
and task-removal log lines should point to a memory leak. This is not true 
today, as the task-removal log line is printed in removeMarkedTasks() (the 
caller of removeTaskEntry(), the API responsible for removing a task), which 
is not called for every task that gets added to the JobTracker. The log lines 
introduced are not inside a loop and will be printed only once per task 
attempt.

bq. The bug you point to is irrelevant in the current context i.e. 
JobInProgress.getTasks(TaskType) - '==' or equals is the right implementation.
Looks like hadoop.io serializes enums as strings, hence the JVM bug I pointed 
out doesn't hold here.

MAPREDUCE-1316 was raised because there was a mismatch between task-attempt 
addition and task-attempt removal in the JobTracker. The problem is that once 
a job retires, its tasks are removed based on the statuses available. But a 
task status is added for a task-attempt only when the tasktracker reports back 
(once a task is assigned) with the next heartbeat, which leaves a corner case 
in the removal logic: if the tasktracker is assigned a task and the job then 
finishes, the newly scheduled attempt will have been added to the JobTracker 
but will not be removed, as its status is not yet available. This patch 
changes the task-removal logic to iterate over all the scheduled/launched 
attempt-ids instead of statuses, thus taking care of the corner case 
mentioned above.
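The add/remove mismatch described above can be sketched with a toy registry (hypothetical names, not JobTracker code): entries are added per *scheduled* attempt, but the old cleanup walked only attempts that had *reported* a status, so unreported attempts leaked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the leak: schedule() adds a map entry per attempt, but the
// old retirement path removed entries only for attempts with a status.
public class LeakDemo {
    static Map<String, Object> taskToTipMap = new HashMap<>();
    static List<String> scheduledAttempts = new ArrayList<>();
    static List<String> reportedAttempts = new ArrayList<>();

    static void schedule(String attemptId, Object tip) {
        taskToTipMap.put(attemptId, tip);
        scheduledAttempts.add(attemptId);
    }

    // Old logic: remove by reported statuses; misses unreported attempts.
    static void retireByStatus() {
        for (String attemptId : reportedAttempts) {
            taskToTipMap.remove(attemptId);
        }
    }

    // Fixed logic: remove by every scheduled/launched attempt id.
    static void retireByAttemptIds() {
        for (String attemptId : scheduledAttempts) {
            taskToTipMap.remove(attemptId);
        }
    }

    public static void main(String[] args) {
        Object tip = new Object();
        schedule("attempt_1", tip);
        schedule("attempt_2", tip);
        reportedAttempts.add("attempt_1"); // attempt_2 never reports a status
        retireByStatus();
        System.out.println(taskToTipMap.size()); // 1: attempt_2 leaked
        retireByAttemptIds();
        System.out.println(taskToTipMap.size()); // 0: leak gone
    }
}
```

The leftover map entry is exactly the stale TaskInProgress reference the issue describes: it pins the object (and whatever it references) long after the job is gone.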




[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800115#action_12800115
 ] 

Hadoop QA commented on MAPREDUCE-1374:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12430165/MAPREDUCE-1374.3.patch
  against trunk revision 898943.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/384/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/384/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/384/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/384/console

This message is automatically generated.




[jira] Updated: (MAPREDUCE-815) Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro Serialization

2010-01-13 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-815:


Attachment: MAPREDUCE-815.patch

Attaching a patch that provides AvroInputFormat/AvroOutputFormat.

AvroInputFormat allows you to set its input schema in the job configuration 
and provides static methods for doing so. Depending on the input serialization 
metadata, it can choose to deserialize to generic-, reflect-, or 
specific-based classes.

This patch includes unit tests for both of these classes.

I have also extended the jobdata API to allow you to set output serialization 
metadata (vs. simple class-name-only metadata) in the same fashion as 
MAPREDUCE-1126 allowed you to set intermediate serialization metadata. This 
deprecates the old methods like {{JobConf.setOutputKeyClass()}}. Note that now 
the PipesMapRunner/PipesReducer, MapFileOutputFormat, and 
SequenceFileOutputFormat rely on these deprecated APIs. MAPREDUCE-1360 will 
require a Hadoop-core-project JIRA that allows SequenceFile to handle 
non-class-based serialization; that will update at least the SequenceFile IF/OF 
APIs. Handling Pipes is a separate issue.

This cannot be submitted to the patch queue until a small change is made to the 
Hadoop-core API (issue is linked), and Hadoop is upgraded across the board to 
Avro 1.3. I'll mark this patch-available when that happens.

> Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro 
> Serialization
> --
>
> Key: MAPREDUCE-815
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-815
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Ravi Gummadi
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-815.patch
>
>
> MapReduce needs AvroInputFormat similar to other InputFormats like 
> TextInputFormat to be able to use avro serialization in hadoop. Similarly 
> AvroOutputFormat is needed.




[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-13 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

I discovered a bug.

I expect to have a fixed version of this patch in place by about 10AM PST 1/14 .

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen handles only a couple of these combinations.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].




[jira] Updated: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1316:
-

Attachment: mapreduce-1316-v1.15-branch20-yahoo.patch

Amar's patch with the fix to get JobInProgress.getTasks to use '==', no other 
changes.




[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1374:
--

Status: Open  (was: Patch Available)




[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1374:
--

Status: Patch Available  (was: Open)




[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800041#action_12800041
 ] 

Arun C Murthy commented on MAPREDUCE-1316:
--

Minor comments/nits:

{code}
-taskidToTIPMap.remove(taskid);
-
-LOG.debug("Removing task '" + taskid + "'");
+if (taskidToTIPMap.remove(taskid) != null) {   
+  LOG.info("Removing task '" + taskid + "'");
+}
{code}

This adds a lot more logging? Is it necessary or useful?

{code}
-LOG.info("Removed completed task '" + taskid + "' from '" + 
- taskTracker + "'");
+if (LOG.isDebugEnabled()) {
+  LOG.debug("Removed marked completed task '" + taskid + "' from '" + 
+taskTracker + "'");
+}
{code}

This removes some logs... you don't think they would be useful?

{code}
+LOG.info("Job " + jobId + " added successfully for user '" 
+ + job.getJobConf().getUser() + "' to queue '" 
+ + job.getJobConf().getQueueName() + "'");
{code}

Is this log necessary? I don't see how this is relevant to this patch.



Finally - can you please share some details on how this patch has helped to fix 
the observed bugs? Thanks!




[jira] Commented: (MAPREDUCE-1327) Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE

2010-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800023#action_12800023
 ] 

Hadoop QA commented on MAPREDUCE-1327:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12430085/MAPREDUCE-1327.5.patch
  against trunk revision 898486.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/270/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/270/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/270/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/270/console

This message is automatically generated.

> Oracle database import via sqoop fails when a table contains the column types 
> such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
> ---
>
> Key: MAPREDUCE-1327
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1327
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/sqoop
>Affects Versions: 0.22.0
>Reporter: Leonid Furman
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1327.3.patch, MAPREDUCE-1327.4.patch, 
> MAPREDUCE-1327.5.patch, MAPREDUCE-1327.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When Oracle table contains the columns "TIMESTAMP(6) WITH LOCAL TIME ZONE" 
> and "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those 
> columns to valid Java data types, resulting in the following exception:
> ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
> at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
> at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
> at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> I have modified the code for Hadoop and Sqoop so this bug is fixed on my 
> machine. Please let me know if you would like me to generate the patch and 
> upload it to this ticket.




[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800012#action_12800012
 ] 

Arun C Murthy commented on MAPREDUCE-1342:
--

+1

Amareshwari, I'd appreciate if you could provide a patch for the Apache 0.20 
branch too... I'll commit all the way back to 0.20.2. Thanks!

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.
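The inverted ordering described above (one path locks potentiallyFaultyTrackers then taskTrackers, the other the reverse) can deadlock because each thread can end up holding the lock the other needs. The usual fix is a single global acquisition order shared by every path; the sketch below is illustrative only, not JobTracker code, with placeholder bodies:

```java
// Both paths take taskTrackers before potentiallyFaultyTrackers, so no
// circular wait can form regardless of thread interleaving.
public class LockOrderDemo {
    static final Object potentiallyFaultyTrackers = new Object();
    static final Object taskTrackers = new Object();

    static int incrementFaults() {
        synchronized (taskTrackers) {
            synchronized (potentiallyFaultyTrackers) {
                return 1; // placeholder for the real fault bookkeeping
            }
        }
    }

    static boolean isBlacklisted() {
        synchronized (taskTrackers) {
            synchronized (potentiallyFaultyTrackers) {
                return true; // placeholder for the real lookup
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementFaults()); // 1
        System.out.println(isBlacklisted());   // true
    }
}
```

With a consistent order, a thread that holds the second lock always already holds the first, so the "each holds what the other wants" cycle cannot arise.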




[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1276#action_1276
 ] 

Arun C Murthy commented on MAPREDUCE-1316:
--

bq. Note that the implementation of JobInProgress.getTasks(TaskType) uses 
string comparison for enums instead of '==' or equals because of the jvm bug 
raised here. I think it's safer to compare enum names.

The bug you point to is irrelevant in the current context, i.e. 
JobInProgress.getTasks(TaskType); '==' or equals is the right implementation.
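Arun's point can be demonstrated in isolation: enum constants are singletons within a JVM, so identity comparison and equals() always agree, and comparing by name adds cost without extra safety. A small standalone illustration (not Hadoop code):

```java
// Enum constants are singletons per JVM, so '==' and equals() always agree;
// comparing via name() is never safer, only slower.
public class EnumCompare {
    enum TaskType { MAP, REDUCE }

    // Identity comparison: valid for enums because constants are unique.
    static boolean sameByIdentity(TaskType a, TaskType b) {
        return a == b;
    }

    // equals() on enums is final and defined as identity, so it matches '=='.
    static boolean sameByEquals(TaskType a, TaskType b) {
        return a.equals(b);
    }
}
```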

> JobTracker holds stale references to retired jobs via unreported tasks 
> ---
>
> Key: MAPREDUCE-1316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, 
> mapreduce-1316-v1.14-branch20-yahoo.patch, 
> mapreduce-1316-v1.14.1-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mappings from _taskToTIPMap_ if 
> the job finishes and retires. _Unreported tasks_ are tasks that were 
> scheduled but for which the tasktracker never reported back a task status. In 
> such cases a stale reference is held to the TaskInProgress (and thus the 
> JobInProgress) long after the job is gone, leading to a memory leak.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799985#action_12799985
 ] 

Hadoop QA commented on MAPREDUCE-1221:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12428563/MAPREDUCE-1221-v1.patch
  against trunk revision 898486.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---
>
> Key: MAPREDUCE-1221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1327) Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE

2010-01-13 Thread Leonid Furman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799956#action_12799956
 ] 

Leonid Furman commented on MAPREDUCE-1327:
--

Thank you, Aaron!

> Oracle database import via sqoop fails when a table contains the column types 
> such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
> ---
>
> Key: MAPREDUCE-1327
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1327
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/sqoop
>Affects Versions: 0.22.0
>Reporter: Leonid Furman
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1327.3.patch, MAPREDUCE-1327.4.patch, 
> MAPREDUCE-1327.5.patch, MAPREDUCE-1327.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When an Oracle table contains the columns "TIMESTAMP(6) WITH LOCAL TIME ZONE" 
> and "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those 
> columns to valid Java data types, resulting in the following exception:
> ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
> at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
> at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
> at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> I have modified the code for Hadoop and Sqoop so this bug is fixed on my 
> machine. Please let me know if you would like me to generate the patch and 
> upload it to this ticket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799953#action_12799953
 ] 

Todd Lipcon commented on MAPREDUCE-1342:


Ran jcarder on 
https://issues.apache.org/jira/secure/attachment/12430093/patch-1342-2.txt 
(md5sum 0de59f7b4deb8c4d5e3ea991ba838617) and looks good.

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-752) DistributedCache.addArchiveToClassPath doesn't work

2010-01-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799945#action_12799945
 ] 

Allen Wittenauer commented on MAPREDUCE-752:


Is this going to get a fix for 0.20.2?  

> DistributedCache.addArchiveToClassPath doesn't work
> ---
>
> Key: MAPREDUCE-752
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-752
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0
>Reporter: Vladimir Klimontovich
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-752-ver2.patch, MAPREDUCE-752-ver3.patch, 
> MAPREDUCE-752.patch, MAPREDUCE-752.zip
>
>
> addArchiveToClassPath is a method of the DistributedCache class. It should be 
> called before running a task. It accepts the path to a jar file on a DFS. 
> The method should put the jar file in the distributed cache and then add it 
> to the classpath of each map/reduce process via the job tracker. 
> This method doesn't work:
> in TaskRunner there is an algorithm that looks for a correspondence between 
> DFS paths and local paths in the distributed cache.
> It compares
> if (archives[i].getPath().equals(
> archiveClasspaths[j].toString())){
> instead of
> if (archives[i].toString().equals(
> archiveClasspaths[j].toString())) 
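The mismatch is easy to reproduce with java.net.URI (Hadoop's Path wraps a URI): getPath() keeps only the path component, so comparing it against a full URI string never matches. The paths below are made up for illustration:

```java
import java.net.URI;

// getPath() drops the scheme and authority, so comparing it against the
// full URI string -- as the buggy branch effectively did -- always fails.
public class PathCompareDemo {
    // Returns only the path component of a URI-style location.
    static String pathOnly(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String jar = "hdfs://namenode:9000/cache/lib.jar";
        System.out.println(pathOnly(jar));             // only "/cache/lib.jar"
        System.out.println(pathOnly(jar).equals(jar)); // false
    }
}
```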

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1218) Collecting cpu and memory usage for TaskTrackers

2010-01-13 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated MAPREDUCE-1218:


  Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]  (was: [Incompatible change])
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Scott!

> Collecting cpu and memory usage for TaskTrackers
> 
>
> Key: MAPREDUCE-1218
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1218
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Affects Versions: 0.22.0
> Environment: linux
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1218-rename.sh, MAPREDUCE-1218-v2.patch, 
> MAPREDUCE-1218-v3.patch, MAPREDUCE-1218-v4.patch, MAPREDUCE-1218-v5.patch, 
> MAPREDUCE-1218-v6.1.patch, MAPREDUCE-1218-v6.2.patch, 
> MAPREDUCE-1218-v6.patch, MAPREDUCE-1218.patch
>
>
> The information can be used for resource aware scheduling.
> Note that this is related to MAPREDUCE-220. There the per task resource 
> information is collected.
> This one collects the per machine information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-1316:
--

Attachment: mapreduce-1316-v1.14.1-branch20-yahoo.patch

All ant tests (except TestReduceFetch, TestJobHistory, TestStreamingExitStatus 
and TestJobTrackerRestartWithCS) passed for the patch attached 
[here|https://issues.apache.org/jira/secure/attachment/12430141/mapreduce-1316-v1.14-branch20-yahoo.patch].
 Failed tests (except TestJobTrackerRestartWithCS) passed upon re-run. 
TestJobTrackerRestartWithCS times out without the patch too. Attaching a new 
patch incorporating Hemanth's offline comment w.r.t. comments in 
TestFairScheduler.

> JobTracker holds stale references to retired jobs via unreported tasks 
> ---
>
> Key: MAPREDUCE-1316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, 
> mapreduce-1316-v1.14-branch20-yahoo.patch, 
> mapreduce-1316-v1.14.1-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mappings from _taskToTIPMap_ if 
> the job finishes and retires. _Unreported tasks_ are tasks that were 
> scheduled but for which the tasktracker never reported back a task status. In 
> such cases a stale reference is held to the TaskInProgress (and thus the 
> JobInProgress) long after the job is gone, leading to a memory leak.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799930#action_12799930
 ] 

dhruba borthakur commented on MAPREDUCE-1374:
-

+1 for this patch.

> is this same behavior observed elsewhere in HDFS that duplicate

The HDFS server stores path names as strings and UTF byte arrays, but there 
should not be too many duplicate entries in these, should there?

> Reduce memory footprint of FileSplit
> 
>
> Key: MAPREDUCE-1374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, 
> MAPREDUCE-1374.3.patch
>
>
> We can have many FileInput objects in memory, depending on the number of 
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those 
> Strings for host names.
> {code}
> FileInputFormat.java:
>   for (NodeInfo host: hostList) {
> // Strip out the port number from the host name
> -retVal[index++] = host.node.getName().split(":")[0];
> +retVal[index++] = host.node.getName().split(":")[0].intern();
> if (index == replicationFactor) {
>   done = true;
>   break;
> }
>   }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from 
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally 
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}
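What intern() buys can be shown in isolation: parsing the same host name out of many splits produces distinct but equal String objects, and intern() collapses them to one canonical copy. A standalone sketch, not the patch itself:

```java
// Splitting "host:port" creates a fresh String each time; intern() returns
// one shared canonical instance for all equal strings, so thousands of
// splits referencing the same host share a single object.
public class InternDemo {
    // Extracts the host part of a "host:port" name, as FileInputFormat does.
    static String hostOf(String nameWithPort) {
        return nameWithPort.split(":")[0];
    }

    public static void main(String[] args) {
        String a = hostOf("host1:50010");
        String b = hostOf("host1:50010");
        System.out.println(a == b);                   // false: distinct objects
        System.out.println(a.intern() == b.intern()); // true: one shared copy
    }
}
```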

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1327) Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE

2010-01-13 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799929#action_12799929
 ] 

Aaron Kimball commented on MAPREDUCE-1327:
--

Leonid,

After this patch gets committed to trunk I'll put it in the queue to review for 
inclusion in CDH. It will almost assuredly be included in CDH3.

> Oracle database import via sqoop fails when a table contains the column types 
> such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
> ---
>
> Key: MAPREDUCE-1327
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1327
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/sqoop
>Affects Versions: 0.22.0
>Reporter: Leonid Furman
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1327.3.patch, MAPREDUCE-1327.4.patch, 
> MAPREDUCE-1327.5.patch, MAPREDUCE-1327.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When an Oracle table contains the columns "TIMESTAMP(6) WITH LOCAL TIME ZONE" 
> and "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those 
> columns to valid Java data types, resulting in the following exception:
> ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
> at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
> at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
> at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> I have modified the code for Hadoop and Sqoop so this bug is fixed on my 
> machine. Please let me know if you would like me to generate the patch and 
> upload it to this ticket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1327) Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE

2010-01-13 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1327:
-

Status: Open  (was: Patch Available)

> Oracle database import via sqoop fails when a table contains the column types 
> such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
> ---
>
> Key: MAPREDUCE-1327
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1327
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/sqoop
>Affects Versions: 0.22.0
>Reporter: Leonid Furman
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1327.3.patch, MAPREDUCE-1327.4.patch, 
> MAPREDUCE-1327.5.patch, MAPREDUCE-1327.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When an Oracle table contains the columns "TIMESTAMP(6) WITH LOCAL TIME ZONE" 
> and "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those 
> columns to valid Java data types, resulting in the following exception:
> ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
> at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
> at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
> at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> I have modified the code for Hadoop and Sqoop so this bug is fixed on my 
> machine. Please let me know if you would like me to generate the patch and 
> upload it to this ticket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1327) Oracle database import via sqoop fails when a table contains the column types such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE

2010-01-13 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1327:
-

Status: Patch Available  (was: Open)

Cycling patch to retrigger Hudson.

> Oracle database import via sqoop fails when a table contains the column types 
> such as TIMESTAMP(6) WITH LOCAL TIME ZONE and TIMESTAMP(6) WITH TIME ZONE
> ---
>
> Key: MAPREDUCE-1327
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1327
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/sqoop
>Affects Versions: 0.22.0
>Reporter: Leonid Furman
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1327.3.patch, MAPREDUCE-1327.4.patch, 
> MAPREDUCE-1327.5.patch, MAPREDUCE-1327.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When an Oracle table contains the columns "TIMESTAMP(6) WITH LOCAL TIME ZONE" 
> and "TIMESTAMP(6) WITH TIME ZONE", Sqoop fails to map values for those 
> columns to valid Java data types, resulting in the following exception:
> ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateFields(ClassWriter.java:253)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generateClassForColumns(ClassWriter.java:701)
> at 
> org.apache.hadoop.sqoop.orm.ClassWriter.generate(ClassWriter.java:597)
> at org.apache.hadoop.sqoop.Sqoop.generateORM(Sqoop.java:75)
> at org.apache.hadoop.sqoop.Sqoop.importTable(Sqoop.java:87)
> at org.apache.hadoop.sqoop.Sqoop.run(Sqoop.java:175)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.sqoop.Sqoop.main(Sqoop.java:201)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> I have modified the code for Hadoop and Sqoop so this bug is fixed on my 
> machine. Please let me know if you would like me to generate the patch and 
> upload it to this ticket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1333) Parallel running tasks on one single node may slow down the performance

2010-01-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799895#action_12799895
 ] 

Zheng Shao commented on MAPREDUCE-1333:
---

How many CPU cores does each node have?

> Parallel running tasks on one single node may slow down the performance
> ---
>
> Key: MAPREDUCE-1333
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1333
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, task, tasktracker
>Affects Versions: 0.20.1
>Reporter: Zhaoning Zhang
>
> When analyzing running-task performance, I found that running tasks in 
> parallel on a single node does not perform better than running them 
> serially.
> We can set mapred.tasktracker.{map|reduce}.tasks.maximum = 1 individually, 
> but map AND reduce tasks will still run in parallel.
> I wonder whether this holds true in real commercial clusters?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1374:
--

Attachment: MAPREDUCE-1374.3.patch

Added comment before "Path getPath()" to address Todd's comment.


> Reduce memory footprint of FileSplit
> 
>
> Key: MAPREDUCE-1374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, 
> MAPREDUCE-1374.3.patch
>
>
> We can have many FileInput objects in memory, depending on the number of 
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those 
> Strings for host names.
> {code}
> FileInputFormat.java:
>   for (NodeInfo host: hostList) {
> // Strip out the port number from the host name
> -retVal[index++] = host.node.getName().split(":")[0];
> +retVal[index++] = host.node.getName().split(":")[0].intern();
> if (index == replicationFactor) {
>   done = true;
>   break;
> }
>   }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from 
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally 
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799890#action_12799890
 ] 

Zheng Shao commented on MAPREDUCE-1374:
---

Thanks Todd.
Yes, I see the merit of adding a weak reference map in the Path class. That 
will still consume several times more memory than a String, but will help 
eliminate potential duplicate Path objects.


> Reduce memory footprint of FileSplit
> 
>
> Key: MAPREDUCE-1374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch
>
>
> We can have many FileInput objects in memory, depending on the number of 
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those 
> Strings for host names.
> {code}
> FileInputFormat.java:
>   for (NodeInfo host: hostList) {
> // Strip out the port number from the host name
> -retVal[index++] = host.node.getName().split(":")[0];
> +retVal[index++] = host.node.getName().split(":")[0].intern();
> if (index == replicationFactor) {
>   done = true;
>   break;
> }
>   }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from 
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally 
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799876#action_12799876
 ] 

Todd Lipcon commented on MAPREDUCE-1374:


Looks good to me. It may be worth adding a short comment to the effect that 
storing the Path as a String is safe because "new Path(p.toString()).equals(p)" 
is an invariant of the Path class. (This is tested in the TestPath unit test.)

A question that shouldn't block this patch: is this same behavior observed 
elsewhere in HDFS that duplicate Paths use up a lot of unnecessary memory? 
Would it be worth adding a static weak reference map in the Path class and 
adding a path.intern() call which achieves the same reference sharing with easy 
use by anyone who wants it?
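The suggested static weak reference map might look roughly like the following sketch. This is hypothetical, not an actual Hadoop API; it is shown with a generic key type for self-containment:

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Hypothetical sketch of a Path.intern()-style helper: a weak canonical map
// so equal instances share one object, while entries remain collectable
// once nothing else references the canonical value.
public class WeakInterner<T> {
    private final WeakHashMap<T, WeakReference<T>> pool = new WeakHashMap<>();

    // Returns the canonical instance equal to 'value', registering it if new.
    public synchronized T intern(T value) {
        WeakReference<T> ref = pool.get(value);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            pool.put(value, new WeakReference<>(value));
            canonical = value;
        }
        return canonical;
    }
}
```

A production version would need to weigh the synchronization cost against the memory saved, which is presumably part of why this was left as a follow-up question.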

> Reduce memory footprint of FileSplit
> 
>
> Key: MAPREDUCE-1374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch
>
>
> We can have many FileInput objects in memory, depending on the number of 
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those 
> Strings for host names.
> {code}
> FileInputFormat.java:
>   for (NodeInfo host: hostList) {
> // Strip out the port number from the host name
> -retVal[index++] = host.node.getName().split(":")[0];
> +retVal[index++] = host.node.getName().split(":")[0].intern();
> if (index == replicationFactor) {
>   done = true;
>   break;
> }
>   }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from 
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally 
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-01-13 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1221:
--

Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---
>
> Key: MAPREDUCE-1221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-01-13 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1221:
--

Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---
>
> Key: MAPREDUCE-1221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.22.0
>Reporter: dhruba borthakur
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.




[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799790#action_12799790
 ] 

Hemanth Yamijala commented on MAPREDUCE-1316:
-

bq. Prior to this patch, they are accessed in an unsynchronized way. Making 
getTasks(TaskType) synchronized might get rid of the findbugs warnings but will 
add some more risk to this patch. Maybe we can follow this in another jira, 
thoughts?

Amar and I discussed this and to me, it makes sense. Basically this patch does 
not make the situation any worse than it already is. The bug identified is not 
in the scope of this jira. So, I'd suggest we file another JIRA to track that 
and live with this in the interim.

> JobTracker holds stale references to retired jobs via unreported tasks 
> ---
>
> Key: MAPREDUCE-1316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, 
> mapreduce-1316-v1.14-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mapping from _taskToTIPMap_ if 
> the job finishes and retires. _Unreported tasks_ refers to tasks that were 
> scheduled but the tasktracker did not report back with the task status. In 
> such cases a stale reference is held to TaskInProgress (and thus 
> JobInProgress) long after the job is gone leading to memory leak.




[jira] Updated: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-1316:
--

Attachment: mapreduce-1316-v1.14-branch20-yahoo.patch

Attaching a patch for Yahoo!'s internal 20 branch (not to be committed). This patch 
incorporates review comments from Hemanth and Arun. Changes are as follows:
- The junit mock test is changed to test all the task types.
- Also added a _getTasks(TaskType)_ api in JobInProgress. Note that findbugs 
complained about the _getTasks(TaskType)_ change with an IS2_INCONSISTENT_SYNC 
warning. This occurs because, except in _getTasks(TaskType)_, all the task 
arrays (i.e. maps, reduces, setup, cleanup) are accessed inside the 
JobInProgress lock. Prior to this patch, they were already accessed in an 
unsynchronized way. Making _getTasks(TaskType)_ synchronized might get rid of 
the findbugs warning but would add more risk to this patch. Maybe we can follow 
this up in another jira, thoughts?
- Note that the implementation of _JobInProgress.getTasks(TaskType)_ uses 
string comparison for enums instead of '==' or equals() because of the jvm bug 
raised [here|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6277781]. I 
think it's safer to compare enum names.

I ran all the tests touched by this patch and they passed. I am now running the 
remaining tests and will upload the results.
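The name-based enum comparison mentioned above can be illustrated with a small sketch (the class and enum here are stand-ins, not the actual JobInProgress code):

```java
public class EnumNameCompareDemo {
    public enum TaskType { MAP, REDUCE, JOB_SETUP, JOB_CLEANUP }

    // Compare enum constants by name rather than by reference ('==' or
    // equals()), the defensive style described in the comment above.
    public static boolean sameType(TaskType a, TaskType b) {
        return a.name().equals(b.name());
    }

    public static void main(String[] args) {
        System.out.println(sameType(TaskType.MAP, TaskType.MAP));    // prints "true"
        System.out.println(sameType(TaskType.MAP, TaskType.REDUCE)); // prints "false"
    }
}
```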

> JobTracker holds stale references to retired jobs via unreported tasks 
> ---
>
> Key: MAPREDUCE-1316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, 
> mapreduce-1316-v1.14-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mapping from _taskToTIPMap_ if 
> the job finishes and retires. _Unreported tasks_ refers to tasks that were 
> scheduled but the tasktracker did not report back with the task status. In 
> such cases a stale reference is held to TaskInProgress (and thus 
> JobInProgress) long after the job is gone leading to memory leak.




[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799720#action_12799720
 ] 

Hadoop QA commented on MAPREDUCE-1342:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430117/patch-1342-3.txt
  against trunk revision 898486.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/382/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/382/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/382/console

This message is automatically generated.

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.
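The inverted lock ordering described above can be sketched with two plain monitor locks (a hypothetical illustration, not the JobTracker code); the fix is to make every code path acquire both locks in the same global order:

```java
public class LockOrderDemo {
    private final Object taskTrackers = new Object();
    private final Object potentiallyFaultyTrackers = new Object();

    // Deadlock-prone shape (do NOT do this): one path locks
    // potentiallyFaultyTrackers then taskTrackers, while another path locks
    // taskTrackers then potentiallyFaultyTrackers. Under contention each
    // thread can hold one lock and wait forever for the other.

    // Fix sketch: pick one global order (here taskTrackers first) and use it
    // on every path that needs both locks.
    public int readWithBothLocks(int[] sharedCounter) {
        synchronized (taskTrackers) {
            synchronized (potentiallyFaultyTrackers) {
                return sharedCounter[0];
            }
        }
    }
}
```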




[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

2010-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799687#action_12799687
 ] 

Hadoop QA commented on MAPREDUCE-1374:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12430108/MAPREDUCE-1374.2.patch
  against trunk revision 898486.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/console

This message is automatically generated.

> Reduce memory footprint of FileSplit
> 
>
> Key: MAPREDUCE-1374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.21.0, 0.22.0
>
> Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch
>
>
> We can have many FileInput objects in the memory, depending on the number of 
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those 
> Strings for host names.
> {code}
> FileInputFormat.java:
>   for (NodeInfo host: hostList) {
> // Strip out the port number from the host name
> -retVal[index++] = host.node.getName().split(":")[0];
> +retVal[index++] = host.node.getName().split(":")[0].intern();
> if (index == replicationFactor) {
>   done = true;
>   break;
> }
>   }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from 
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally 
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}
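The effect of the one-line intern() change quoted above can be seen with a small sketch (the helper below is hypothetical, mirroring the patched line):

```java
public class InternDemo {
    // Strip the port and intern the host name, so splits that reference the
    // same host share a single canonical String instance.
    public static String hostOf(String nameWithPort) {
        return nameWithPort.split(":")[0].intern();
    }

    public static void main(String[] args) {
        String a = hostOf(new String("node1.example.com:50010"));
        String b = hostOf(new String("node1.example.com:50060"));
        // Interned results are reference-equal: only one copy of the host
        // name is retained no matter how many splits name this host.
        System.out.println(a == b); // prints "true"
    }
}
```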




[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799686#action_12799686
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1342:


test-patch passed for Y! dist patch

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.




[jira] Commented: (MAPREDUCE-1372) ConcurrentModificationException in JobInProgress

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799685#action_12799685
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1372:


bq. Node resolution through heartbeat(JT.addNewTracker) also need to wait for 
the thread, right?

Discussed this with Arun. We can keep node resolution through the heartbeat 
inline (as it is now) and make node resolution for JobInProgress go through the 
thread.

> ConcurrentModificationException in JobInProgress
> 
>
> Key: MAPREDUCE-1372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1372
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1372-0.patch
>
>
> We have seen the following  ConcurrentModificationException in one of our 
> clusters
> {noformat}
> java.io.IOException: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> at 
> org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2018)
> at 
> org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1077)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:796)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:589)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:677)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:348)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTask(CapacityTaskScheduler.java:1397)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1349)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2976)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> {noformat}
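The failure mode in the stack trace above is easy to reproduce in isolation: structurally modifying a HashMap while iterating over its key set makes the fail-fast iterator throw. This is a minimal sketch, not the JobInProgress code:

```java
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {
    // Adding a new key to a HashMap while iterating its key set trips the
    // iterator's fail-fast modCount check on the next iteration step.
    public static boolean triggersCme() {
        Map<String, Integer> runningMapCache = new HashMap<>();
        runningMapCache.put("host1", 1);
        runningMapCache.put("host2", 2);
        try {
            for (String host : runningMapCache.keySet()) {
                runningMapCache.put("host3", 3); // new key: structural change
            }
            return false;
        } catch (java.util.ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(triggersCme()); // prints "true"
    }
}
```

Typical fixes are to hold a common lock around both the iteration and the mutation, or to iterate over a copy of the key set.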




[jira] Commented: (MAPREDUCE-1372) ConcurrentModificationException in JobInProgress

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799680#action_12799680
 ] 

Arun C Murthy commented on MAPREDUCE-1372:
--

bq. Actually, come to think of it, we can probably make 
JobTracker.resolveAndAddToTopology a synchronized method once we introduce the 
new thread I alluded to... ditto with JobTracker.addHostToNodeMapping.

I thought it useful to point out that one of the two callers of 
JobTracker.resolveAndAddToTopology is already a synchronized JobTracker method, 
hence my previous comment: 
JobTracker.resolveAndAddToTopology
  -> JobTracker.addNewTracker
    -> JobTracker.processHeartbeat
  -> JobInProgress.createCache

> ConcurrentModificationException in JobInProgress
> 
>
> Key: MAPREDUCE-1372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1372
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1372-0.patch
>
>
> We have seen the following  ConcurrentModificationException in one of our 
> clusters
> {noformat}
> java.io.IOException: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> at 
> org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2018)
> at 
> org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1077)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:796)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:589)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:677)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:348)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTask(CapacityTaskScheduler.java:1397)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1349)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2976)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> {noformat}




[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799670#action_12799670
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1342:


Also verified that the callers from jsps do not have any issues.

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.




[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
---

Attachment: patch-1342-3.txt

Patch for trunk

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.




[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
---

Attachment: patch-1342-3-ydist.txt

Added comments about locking order assumptions to methods 
JobTracker.addNewTracker and JobTracker.removeTracker.

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.




[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
---

Status: Patch Available  (was: Open)

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-3-ydist.txt, patch-1342-3.txt, 
> patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.




[jira] Updated: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1342:
---

Status: Open  (was: Patch Available)

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers, 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted() which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't produced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.




[jira] Commented: (MAPREDUCE-1372) ConcurrentModificationException in JobInProgress

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799661#action_12799661
 ] 

Arun C Murthy commented on MAPREDUCE-1372:
--

Actually, come to think of it, we can probably make 
JobTracker.resolveAndAddToTopology a synchronized method once we introduce the 
new thread I alluded to... ditto with JobTracker.addHostToNodeMapping. 

> ConcurrentModificationException in JobInProgress
> 
>
> Key: MAPREDUCE-1372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1372
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1372-0.patch
>
>
> We have seen the following  ConcurrentModificationException in one of our 
> clusters
> {noformat}
> java.io.IOException: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> at 
> org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2018)
> at 
> org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1077)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:796)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:589)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:677)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:348)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTask(CapacityTaskScheduler.java:1397)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1349)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2976)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> {noformat}




[jira] Commented: (MAPREDUCE-1372) ConcurrentModificationException in JobInProgress

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799659#action_12799659
 ] 

Arun C Murthy commented on MAPREDUCE-1372:
--

Good point! I'm thinking we can use hostnameToNodeMap as the gating lock in 
JobTracker.addHostToNodeMapping, thoughts?

> ConcurrentModificationException in JobInProgress
> 
>
> Key: MAPREDUCE-1372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1372
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1372-0.patch
>
>
> We have seen the following  ConcurrentModificationException in one of our 
> clusters
> {noformat}
> java.io.IOException: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> at 
> org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2018)
> at 
> org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1077)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:796)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:589)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:677)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:348)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTask(CapacityTaskScheduler.java:1397)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1349)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2976)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> {noformat}
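The HashMap$HashIterator.nextEntry frame in the trace is the classic symptom of a HashMap being structurally modified while another code path iterates it. A minimal, hypothetical Java sketch of how the fail-fast iterator produces this exception (single-threaded here for determinism; in the JobTracker it is a cross-thread race, and the map/keys below are invented for illustration):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {
    public static void main(String[] args) {
        Map<String, Integer> runningMapCache = new HashMap<>();
        runningMapCache.put("rack1", 1);
        runningMapCache.put("rack2", 2);

        boolean caught = false;
        try {
            // Structurally modifying the map while iterating its key set
            // bumps modCount, so the next call to the iterator's next()
            // throws ConcurrentModificationException.
            for (String key : runningMapCache.keySet()) {
                runningMapCache.put("rack3", 3); // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            caught = true;
        }
        System.out.println("caught=" + caught);
    }
}
```

The iterator is fail-fast on a best-effort basis only, so in a real multi-threaded race the exception may or may not surface, which is why such bugs appear intermittently in production clusters.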

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799658#action_12799658
 ] 

Arun C Murthy commented on MAPREDUCE-1342:
--

Minor nit: please add comments about the locking assumptions/ordering to a few 
more methods that need them, e.g. JobTracker.removeTracker. Other than that 
this looks ready... thanks!
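One lightweight way to make such locking comments enforceable, rather than purely advisory, is to pair the comment with a Thread.holdsLock check. This is a hypothetical sketch, not part of the patch; the field and method names merely echo the issue:

```java
public class LockDocDemo {
    private final Object taskTrackers = new Object();

    /**
     * Removes a tracker from the host mapping.
     * Locking: caller MUST hold the taskTrackers lock. Lock order is
     * taskTrackers -> potentiallyFaultyTrackers, never the reverse.
     */
    void removeTracker(String name) {
        // Turns the locking comment into a runtime check.
        if (!Thread.holdsLock(taskTrackers)) {
            throw new IllegalStateException("caller must hold taskTrackers lock");
        }
        // ... actual removal elided ...
    }

    public static void main(String[] args) {
        LockDocDemo jt = new LockDocDemo();
        boolean rejected = false;
        try {
            jt.removeTracker("tt1"); // wrong: lock not held
        } catch (IllegalStateException e) {
            rejected = true;
        }
        synchronized (jt.taskTrackers) {
            jt.removeTracker("tt1"); // correct: lock held
        }
        System.out.println("rejected=" + rejected);
    }
}
```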

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers.
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted(), which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't reproduced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.
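The standard cure for the inversion described above is to fix one global acquisition order and make every path follow it. A hypothetical two-lock sketch (field names mirror the report, not the actual JobTracker code) in which both paths take taskTrackers before potentiallyFaultyTrackers, so the cycle cannot form:

```java
public class LockOrderDemo {
    private final Object taskTrackers = new Object();
    private final Object potentiallyFaultyTrackers = new Object();
    private int blacklisted = 0;

    // Both methods acquire taskTrackers first, then
    // potentiallyFaultyTrackers: a consistent order means no thread can
    // hold one lock while waiting on a thread that holds the other.
    void incrementFaults() {
        synchronized (taskTrackers) {
            synchronized (potentiallyFaultyTrackers) {
                blacklisted++;
            }
        }
    }

    int blacklistedTaskTrackers() {
        synchronized (taskTrackers) {
            synchronized (potentiallyFaultyTrackers) {
                return blacklisted;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderDemo jt = new LockOrderDemo();
        Thread t = new Thread(() -> {
            for (int i = 0; i < 1000; i++) jt.incrementFaults();
        });
        t.start();
        // Concurrent readers and the writer contend on the same two locks
        // in the same order, so this completes without deadlock.
        for (int i = 0; i < 1000; i++) jt.blacklistedTaskTrackers();
        t.join();
        System.out.println("blacklisted=" + jt.blacklistedTaskTrackers());
    }
}
```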

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1316) JobTracker holds stale references to retired jobs via unreported tasks

2010-01-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799646#action_12799646
 ] 

Hemanth Yamijala commented on MAPREDUCE-1316:
-

I am OK with this suggestion. One advantage I see over my proposal is that it 
localizes support for new task types to a single API, getTasks(TaskType), as 
opposed to two (a new getter plus my proposed getAllTIPsByType). That seems 
like the right thing to do from that perspective.
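A hypothetical sketch of the single-accessor shape being discussed, with one map keyed by task type replacing a getter per kind (the enum constants and field names here are invented for illustration, not the actual JobInProgress API):

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class TasksByTypeDemo {
    enum TaskType { MAP, REDUCE, JOB_SETUP, JOB_CLEANUP }

    // A single structure keyed by type: adding a new TaskType extends the
    // enum and this map, but the getTasks(TaskType) signature never changes.
    private final Map<TaskType, List<String>> tasks = new EnumMap<>(TaskType.class);

    List<String> getTasks(TaskType type) {
        return tasks.computeIfAbsent(type, t -> new ArrayList<>());
    }

    public static void main(String[] args) {
        TasksByTypeDemo job = new TasksByTypeDemo();
        job.getTasks(TaskType.MAP).add("task_m_000000");
        job.getTasks(TaskType.MAP).add("task_m_000001");
        job.getTasks(TaskType.REDUCE).add("task_r_000000");
        System.out.println("maps=" + job.getTasks(TaskType.MAP).size()
                + " reduces=" + job.getTasks(TaskType.REDUCE).size());
    }
}
```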

> JobTracker holds stale references to retired jobs via unreported tasks 
> ---
>
> Key: MAPREDUCE-1316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1316
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: mapreduce-1316-v1.11.patch, 
> mapreduce-1316-v1.13-branch20-yahoo.patch, mapreduce-1316-v1.7.patch
>
>
> JobTracker fails to remove _unreported_ tasks' mappings from _taskToTIPMap_ 
> when the job finishes and retires. _Unreported tasks_ are tasks that were 
> scheduled but for which the tasktracker never reported back a task status. In 
> such cases a stale reference to the TaskInProgress (and thus the 
> JobInProgress) is held long after the job is gone, leading to a memory leak.
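One way to guarantee the mapping is dropped regardless of whether a task was ever reported is to sweep the map by owning job at retirement time. A simplified, hypothetical sketch (plain Strings stand in for TaskAttemptID and TaskInProgress):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class RetireSweepDemo {
    // attempt id -> owning job id, standing in for taskToTIPMap's
    // TaskAttemptID -> TaskInProgress references.
    private final Map<String, String> taskToTipMap = new HashMap<>();

    void retireJob(String jobId) {
        // Sweep every entry owned by the retiring job, including tasks the
        // tasktracker never reported back on, so no stale reference survives.
        for (Iterator<Map.Entry<String, String>> it =
                taskToTipMap.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue().equals(jobId)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        RetireSweepDemo jt = new RetireSweepDemo();
        jt.taskToTipMap.put("attempt_1_m_0", "job_1");
        jt.taskToTipMap.put("attempt_1_m_1", "job_1"); // never reported back
        jt.taskToTipMap.put("attempt_2_m_0", "job_2");
        jt.retireJob("job_1");
        System.out.println("remaining=" + jt.taskToTipMap.size());
    }
}
```

Using the entry iterator's own remove() avoids the ConcurrentModificationException that calling taskToTipMap.remove() mid-iteration would trigger.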

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1372) ConcurrentModificationException in JobInProgress

2010-01-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799645#action_12799645
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1372:


Also, the following check-then-act code in addHostToNodeMapping (which is 
reached under two different locks) needs to be atomic.
{code}
if ((node = clusterMap.getNode(networkLoc+"/"+host)) == null) {
  node = new NodeBase(host, networkLoc);
  clusterMap.add(node);
  .
  hostnameToNodeMap.put(host, node);
  nodesAtMaxLevel.add(getParentNode(node, getNumTaskCacheLevels() - 1));
}
{code}
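Wrapping the whole check-then-act in one gating lock (hostnameToNodeMap, as suggested earlier in the thread) would make the block atomic. A simplified, hypothetical sketch; the real method also updates clusterMap and nodesAtMaxLevel, elided here:

```java
import java.util.HashMap;
import java.util.Map;

public class GatingLockDemo {
    private final Map<String, String> hostnameToNodeMap = new HashMap<>();

    void addHostToNodeMapping(String host, String networkLoc) {
        // One lock guards both the lookup and the insert, so two callers can
        // no longer both observe "absent" and race to add the same node.
        synchronized (hostnameToNodeMap) {
            if (!hostnameToNodeMap.containsKey(host)) {
                hostnameToNodeMap.put(host, networkLoc + "/" + host);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GatingLockDemo jt = new GatingLockDemo();
        Runnable add = () -> {
            for (int i = 0; i < 100; i++) jt.addHostToNodeMapping("h" + i, "/rack1");
        };
        Thread a = new Thread(add), b = new Thread(add);
        a.start(); b.start();
        a.join(); b.join();
        // Without the gating lock, duplicate adds (or a corrupted HashMap)
        // would be possible under this concurrent load.
        System.out.println("nodes=" + jt.hostnameToNodeMap.size());
    }
}
```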

> ConcurrentModificationException in JobInProgress
> 
>
> Key: MAPREDUCE-1372
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1372
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1372-0.patch
>
>
> We have seen the following ConcurrentModificationException in one of our 
> clusters:
> {noformat}
> java.io.IOException: java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> at 
> org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2018)
> at 
> org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1077)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:796)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:589)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:677)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:348)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTask(CapacityTaskScheduler.java:1397)
> at 
> org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1349)
> at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2976)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1342) Potential JT deadlock in faulty TT tracking

2010-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799643#action_12799643
 ] 

Hadoop QA commented on MAPREDUCE-1342:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430093/patch-1342-2.txt
  against trunk revision 898486.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/381/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/381/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/381/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/381/console

This message is automatically generated.

> Potential JT deadlock in faulty TT tracking
> ---
>
> Key: MAPREDUCE-1342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Todd Lipcon
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: cycle0.png, mapreduce-1342-1.patch, 
> mapreduce-1342-2.patch, patch-1342-1.txt, patch-1342-2-ydist.txt, 
> patch-1342-2.txt, patch-1342-ydist.txt, patch-1342.txt
>
>
> JT$FaultyTrackersInfo.incrementFaults first locks potentiallyFaultyTrackers 
> and then calls blackListTracker, which calls removeHostCapacity, which locks 
> JT.taskTrackers.
> On the other hand, JT.blacklistedTaskTrackers() locks taskTrackers, then 
> calls faultyTrackers.isBlacklisted(), which goes on to lock 
> potentiallyFaultyTrackers.
> I haven't reproduced such a deadlock, but the lock ordering here is inverted 
> and therefore could deadlock.
> Not sure if this goes back to 0.21 or just in trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.