[jira] Updated: (MAPREDUCE-1521) Protection against incorrectly configured reduces

2010-07-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1521:
-

Attachment: MAPREDUCE-1521-0.20-yahoo.patch

minor change to test case.

> Protection against incorrectly configured reduces
> -
>
> Key: MAPREDUCE-1521
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1521
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Arun C Murthy
>Assignee: Mahadev konar
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1521-0.20-yahoo.patch, 
> MAPREDUCE-1521-0.20-yahoo.patch, MAPREDUCE-1521-0.20-yahoo.patch, 
> MAPREDUCE-1521-0.20-yahoo.patch
>
>
> We've seen a fair number of instances where naive users process huge 
> data-sets (>10TB) with a badly mis-configured number of reduces, e.g. a 
> single reduce. This is a significant problem on large clusters, since each 
> attempt of the reduce takes a long time to shuffle and then runs into 
> problems such as local disk-space exhaustion, and the job goes through 4 
> such attempts before failing.
> Proposal: Come up with heuristics/configs to fail such jobs early. 
> Thoughts?
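One possible shape for such a heuristic is sketched below; the bytes-per-reduce rule, the class name, and the thresholds are illustrative assumptions, not the attached patch:

```java
// Hypothetical sketch of an early-failure heuristic for mis-configured
// reduce counts. Names and thresholds are illustrative only.
public class ReduceSanityCheck {
    // Reject the job if each reduce would have to shuffle more than
    // maxBytesPerReduce bytes of estimated map output.
    public static boolean isReduceCountSane(long estimatedMapOutputBytes,
                                            int numReduces,
                                            long maxBytesPerReduce) {
        if (numReduces <= 0) {
            // Zero reduces is only sensible for map-only jobs, which
            // would not reach this check in the first place.
            return false;
        }
        return estimatedMapOutputBytes / numReduces <= maxBytesPerReduce;
    }

    public static void main(String[] args) {
        long tenTb = 10L * 1024 * 1024 * 1024 * 1024;
        long tenGb = 10L * 1024 * 1024 * 1024;
        // 10TB of map output into a single reduce: reject.
        System.out.println(isReduceCountSane(tenTb, 1, tenGb));
        // The same output spread over 2000 reduces: accept.
        System.out.println(isReduceCountSane(tenTb, 2000, tenGb));
    }
}
```

A jobtracker-side check of this kind could reject the job at initialization time instead of after four failed reduce attempts.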

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1521) Protection against incorrectly configured reduces

2010-07-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1521:
-

Attachment: MAPREDUCE-1521-0.20-yahoo.patch

this patch adds a test case.

> Protection against incorrectly configured reduces
> -
>
> Key: MAPREDUCE-1521
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1521
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Arun C Murthy
>Assignee: Mahadev konar
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1521-0.20-yahoo.patch, 
> MAPREDUCE-1521-0.20-yahoo.patch, MAPREDUCE-1521-0.20-yahoo.patch
>
>
> We've seen a fair number of instances where naive users process huge 
> data-sets (>10TB) with a badly mis-configured number of reduces, e.g. a 
> single reduce. This is a significant problem on large clusters, since each 
> attempt of the reduce takes a long time to shuffle and then runs into 
> problems such as local disk-space exhaustion, and the job goes through 4 
> such attempts before failing.
> Proposal: Come up with heuristics/configs to fail such jobs early. 
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1528) TokenStorage should not be static

2010-07-09 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved MAPREDUCE-1528.


 Assignee: Jitendra Nath Pandey  (was: Arun C Murthy)
Fix Version/s: 0.22.0
   Resolution: Fixed

I just committed this. Thanks, Jitendra and Arun!

> TokenStorage should not be static
> -
>
> Key: MAPREDUCE-1528
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1528
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Jitendra Nath Pandey
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1528.1.patch, MAPREDUCE-1528.2.patch, 
> MAPREDUCE-1528.3.patch, MAPREDUCE-1528.4.patch, 
> MAPREDUCE-1528_yhadoop20.patch, MAPREDUCE-1528_yhadoop20.patch, 
> MAPREDUCE-1528_yhadoop20.patch, MAPREDUCE-1528_yhadoop20.patch, 
> MAPREDUCE-1528_yhadoop20.patch, MAPREDUCE-1528_yhadoop20.patch, 
> MAPREDUCE-1528_yhadoop20.patch
>
>
> Currently, TokenStorage is a singleton. This doesn't work for some use cases, 
> such as Oozie. I think that each Job should have a TokenStorage that is 
> associated with it.
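The proposed direction, a per-job token store rather than a static singleton, might be sketched as follows; the class and method names here are hypothetical and are not Hadoop's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: each job carries its own credentials object instead
// of sharing one global (static) token store. Hypothetical names only.
public class PerJobTokenStorage {
    private final Map<String, byte[]> tokens = new HashMap<>();

    public void addToken(String alias, byte[] token) {
        tokens.put(alias, token);
    }

    public byte[] getToken(String alias) {
        return tokens.get(alias);
    }

    public int numberOfTokens() {
        return tokens.size();
    }

    public static void main(String[] args) {
        // Two "jobs" (e.g. submitted by Oozie on behalf of different users)
        // no longer share state, which the singleton design forced.
        PerJobTokenStorage jobA = new PerJobTokenStorage();
        PerJobTokenStorage jobB = new PerJobTokenStorage();
        jobA.addToken("hdfs", new byte[] {1, 2, 3});
        System.out.println(jobA.numberOfTokens());
        System.out.println(jobB.numberOfTokens());
    }
}
```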

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job

2010-07-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886853#action_12886853
 ] 

Ted Yu commented on MAPREDUCE-1928:
---

Case 2 is related to MAPREDUCE-1849
One possibility is to combine the two MapReduces into one during the 
optimization step.

> Dynamic information fed into Hadoop for controlling execution of a submitted 
> job
> 
>
> Key: MAPREDUCE-1928
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, jobtracker, tasktracker
>Affects Versions: 0.20.3
>Reporter: Raman Grover
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> Currently the job submission protocol requires the job provider to put every 
> bit of information inside an instance of JobConf. The submitted information 
> includes the input data (hdfs path) , suspected resource requirement, number 
> of reducers etc.  This information is read by JobTracker as part of job 
> initialization. Once initialized, job is moved into a running state. From 
> this point, there is no mechanism for any additional information to be fed 
> into Hadoop infrastructure for controlling the job execution. 
>The execution pattern for the job looks very much 
> static from this point. Using the size of input data and a few settings 
> inside JobConf, number of mappers is computed. Hadoop attempts at reading the 
> whole of data in parallel by launching parallel map tasks. Once map phase is 
> over, a known number of reduce tasks (supplied as part of  JobConf) are 
> started. 
> Parameters that control the job execution were set in JobConf prior to 
> reading the input data. As the map phase progresses, useful information based 
> upon the content of the input data surfaces and can be used in controlling 
> the further execution of the job. Let us walk through some of the examples 
> where additional information can be fed to Hadoop subsequent to job 
> submission for optimal execution of the job. 
> I) "Process a part of the input, and based upon the results decide if 
> reading more input is required" 
> In a huge data set, the user is interested in finding 'k' records that 
> satisfy a predicate, essentially sampling the data. In the current 
> implementation, as the data is huge, a large number of mappers would be 
> launched, consuming a significant fraction of the available map slots in the 
> cluster. Each map task would attempt to emit at most 'k' records. With N 
> mappers, we get N*k records, out of which one can pick any k to form the 
> final result. 
>This is not optimal as:
>1) A larger number of map slots get occupied initially, affecting other 
> jobs in the queue. 
>2) If the selectivity of the input data is very low, we essentially did 
> not need to scan the whole of the data to form our result; we could have 
> finished by reading a fraction of the input, monitoring the cardinality of 
> the map output and determining if more input needs to be processed.
>
>Optimal way: If reading the whole of the input requires N mappers, launch 
> only 'M' initially. Allow them to complete. Based upon the statistics 
> collected, decide the additional number of mappers to launch next, and so on 
> until the whole of the input has been processed or enough records have been 
> collected to form the result, whichever comes first. 
>  
>  
> II)  "Here is some data, the remaining is yet to arrive, but you may start 
> with it, and receive more input later"
>  Consider a chain of 2 M-R jobs such that the latter reads the output of 
> the former. The second MR job cannot be started until the first has finished 
> completely. This is essentially because Hadoop needs to be told the complete 
> information about the input before beginning the job. 
> The first M-R may have produced enough data (while not finished yet) that 
> can be processed by another MR job, and hence the other MR need not wait to 
> grab the whole of the input before beginning. Input splits could be supplied 
> later, but of course before the copy/shuffle phase.
>  
> III)  "Input data has undergone one round of processing by the map phase; we 
> have some stats and can now say more about the resources required further" 
>Mappers can produce useful stats about their output, like the 
> cardinality, or produce a histogram describing the distribution of the 
> output. These stats are available to the job provider (Hive/Pig/End User) 
> who can now determine with better accuracy the resources (memory 
> requirements) required in the reduce phase, and even the number of 
> reducers, or may even alter the reduction logic by altering the reducer 
> class.

[jira] Updated: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.

2010-07-09 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1794:
-

Attachment: MAPREDUCE-1794.patch

Re-generated the patch for trunk based on the folder structure mentioned by 
cos.

> Test the job status of lost task trackers before and after the timeout.
> ---
>
> Key: MAPREDUCE-1794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1794-ydist-security.patch, 1794_lost_tasktracker.patch, 
> MAPREDUCE-1794.patch, MAPREDUCE-1794.patch
>
>
> This test covers the following scenarios.
> 1. Verify whether the job succeeds when the task tracker is lost and comes 
> back alive before the timeout.
> 2. Verify the job status (whether it succeeded or not) and whether the 
> killed attempts of a task match, when the task trackers are lost and time 
> out for all four attempts of a task. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1713) Utilities for system tests specific.

2010-07-09 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1713:
-

Attachment: MAPREDUCE-1713.patch

Patch for trunk generated on July 10th 2010.

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, 
> MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, 
> utilsforsystemtest_1713.patch
>
>
> 1.  A method for restarting the daemon with a new configuration.
>   public static  void restartCluster(Hashtable props, String 
> confFile) throws Exception;
> 2.  A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3.  A method that waits for the daemon to stop.
>   public  void waitForClusterToStop() throws Exception;
> 4.  A method that waits for the daemon to start.
>   public  void waitForClusterToStart() throws Exception;
> 5.  A method for checking whether a job has started.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6.  A method for checking whether a task has started.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
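Wait-style utilities like waitForClusterToStop/waitForClusterToStart are typically built on a generic poll-until-timeout helper. The following is a minimal self-contained sketch of that pattern, not the Herriot implementation:

```java
import java.util.function.BooleanSupplier;

// Generic polling helper: re-check a condition at a fixed interval until it
// holds or the timeout elapses. Illustrative sketch only.
public class WaitUtil {
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(intervalMs);
        }
        // One final check at the deadline so a slow last poll is not missed.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50ms, well inside the 1s timeout.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50,
                             1000, 10);
        System.out.println(ok);
    }
}
```

A concrete waitForClusterToStop would then just call waitFor with "daemon no longer responds to ping" as the condition.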

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1528) TokenStorage should not be static

2010-07-09 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886828#action_12886828
 ] 

Jitendra Nath Pandey commented on MAPREDUCE-1528:
-

Tests, javadoc, findbugs, and javac warning checks were run manually. 

> TokenStorage should not be static
> -
>
> Key: MAPREDUCE-1528
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1528
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Arun C Murthy
> Attachments: MAPREDUCE-1528.1.patch, MAPREDUCE-1528.2.patch, 
> MAPREDUCE-1528.3.patch, MAPREDUCE-1528.4.patch, 
> MAPREDUCE-1528_yhadoop20.patch, MAPREDUCE-1528_yhadoop20.patch, 
> MAPREDUCE-1528_yhadoop20.patch, MAPREDUCE-1528_yhadoop20.patch, 
> MAPREDUCE-1528_yhadoop20.patch, MAPREDUCE-1528_yhadoop20.patch, 
> MAPREDUCE-1528_yhadoop20.patch
>
>
> Currently, TokenStorage is a singleton. This doesn't work for some use cases, 
> such as Oozie. I think that each Job should have a TokenStorage that is 
> associated with it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1521) Protection against incorrectly configured reduces

2010-07-09 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1521:
-

Attachment: MAPREDUCE-1521-0.20-yahoo.patch

This patch adds some diagnostic information for users on why the job failed. 
I am still adding a JUnit test for this patch. Again, this patch is for the 
Yahoo! 0.20 branch; I will upload a patch for trunk soon.


> Protection against incorrectly configured reduces
> -
>
> Key: MAPREDUCE-1521
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1521
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Arun C Murthy
>Assignee: Mahadev konar
>Priority: Critical
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1521-0.20-yahoo.patch, 
> MAPREDUCE-1521-0.20-yahoo.patch
>
>
> We've seen a fair number of instances where naive users process huge 
> data-sets (>10TB) with a badly mis-configured number of reduces, e.g. a 
> single reduce. This is a significant problem on large clusters, since each 
> attempt of the reduce takes a long time to shuffle and then runs into 
> problems such as local disk-space exhaustion, and the job goes through 4 
> such attempts before failing.
> Proposal: Come up with heuristics/configs to fail such jobs early. 
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1861) Raid should rearrange the replicas while raiding

2010-07-09 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886816#action_12886816
 ] 

Scott Chen commented on MAPREDUCE-1861:
---

Thanks for the review, Rodrigo.
I am resubmitting this to Hudson because there has been no response for a while.

> Raid should rearrange the replicas while raiding
> 
>
> Key: MAPREDUCE-1861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1861.txt
>
>
> Raided files introduce extra dependencies between the blocks on the same 
> stripe. Therefore we need a new way to place the blocks.
> It is desirable that a raided file satisfies the following two conditions:
> a. Replicas on the same stripe should be on different machines (or racks)
> b. Replicas of the same block should be on different racks
> MAPREDUCE-1831 will try to delete the replicas on the same stripe and the 
> same machine (a).
> But at the same time, it will try to maintain the number of distinct racks 
> of one block (b).
> We cannot satisfy (a) and (b) at the same time with the current logic in 
> BlockPlacementPolicyDefault.chooseTarget().
> One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
> However, this placement is in general good for all files, including the 
> unraided ones.
> It is not clear to us that we can make this good for both raided and 
> unraided files.
> So we propose the following: when raiding the file, we create one more 
> off-rack replica (so the replication=4 now).
> Then we delete two replicas using the policy in MAPREDUCE-1831 
> (replication=2 now).
> This way we can rearrange the replicas to satisfy (a) and (b) at the same 
> time.
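The add-then-delete sequence can be illustrated with a toy simulation; the hosts, racks, and deletion choices below are invented for the example, and the real deletion policy lives in MAPREDUCE-1831:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration: a block starts with 3 replicas, gains one extra
// off-rack replica (replication=4), then two replicas are dropped
// (replication=2) so that the survivors sit on distinct racks.
public class RaidRearrange {
    public record Replica(String host, String rack) {}

    public static Set<String> racksOf(List<Replica> rs) {
        Set<String> racks = new HashSet<>();
        for (Replica r : rs) racks.add(r.rack());
        return racks;
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>(List.of(
            new Replica("h1", "rackA"),
            new Replica("h2", "rackA"),
            new Replica("h3", "rackB")));
        // Step 1: add one more off-rack replica (replication=4).
        replicas.add(new Replica("h4", "rackC"));
        // Step 2: delete two replicas, preferring ones that share a rack
        // or conflict with the stripe, in the spirit of MAPREDUCE-1831.
        replicas.removeIf(r -> r.host().equals("h2"));
        replicas.remove(new Replica("h3", "rackB")); // records compare by value
        System.out.println(replicas.size());          // surviving replicas
        System.out.println(racksOf(replicas).size()); // distinct racks
    }
}
```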

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1861) Raid should rearrange the replicas while raiding

2010-07-09 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1861:
--

Status: Patch Available  (was: Open)

> Raid should rearrange the replicas while raiding
> 
>
> Key: MAPREDUCE-1861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1861.txt
>
>
> Raided files introduce extra dependencies between the blocks on the same 
> stripe. Therefore we need a new way to place the blocks.
> It is desirable that a raided file satisfies the following two conditions:
> a. Replicas on the same stripe should be on different machines (or racks)
> b. Replicas of the same block should be on different racks
> MAPREDUCE-1831 will try to delete the replicas on the same stripe and the 
> same machine (a).
> But at the same time, it will try to maintain the number of distinct racks 
> of one block (b).
> We cannot satisfy (a) and (b) at the same time with the current logic in 
> BlockPlacementPolicyDefault.chooseTarget().
> One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
> However, this placement is in general good for all files, including the 
> unraided ones.
> It is not clear to us that we can make this good for both raided and 
> unraided files.
> So we propose the following: when raiding the file, we create one more 
> off-rack replica (so the replication=4 now).
> Then we delete two replicas using the policy in MAPREDUCE-1831 
> (replication=2 now).
> This way we can rearrange the replicas to satisfy (a) and (b) at the same 
> time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1861) Raid should rearrange the replicas while raiding

2010-07-09 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1861:
--

Status: Open  (was: Patch Available)

> Raid should rearrange the replicas while raiding
> 
>
> Key: MAPREDUCE-1861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1861.txt
>
>
> Raided files introduce extra dependencies between the blocks on the same 
> stripe. Therefore we need a new way to place the blocks.
> It is desirable that a raided file satisfies the following two conditions:
> a. Replicas on the same stripe should be on different machines (or racks)
> b. Replicas of the same block should be on different racks
> MAPREDUCE-1831 will try to delete the replicas on the same stripe and the 
> same machine (a).
> But at the same time, it will try to maintain the number of distinct racks 
> of one block (b).
> We cannot satisfy (a) and (b) at the same time with the current logic in 
> BlockPlacementPolicyDefault.chooseTarget().
> One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
> However, this placement is in general good for all files, including the 
> unraided ones.
> It is not clear to us that we can make this good for both raided and 
> unraided files.
> So we propose the following: when raiding the file, we create one more 
> off-rack replica (so the replication=4 now).
> Then we delete two replicas using the policy in MAPREDUCE-1831 
> (replication=2 now).
> This way we can rearrange the replicas to satisfy (a) and (b) at the same 
> time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1731) Process tree clean up suspended task tests.

2010-07-09 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1731:
-

Attachment: MAPREDUCE-1731.patch

Patch for trunk.

> Process tree clean up suspended task tests.
> ---
>
> Key: MAPREDUCE-1731
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1731
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1731-ydist-security.patch, 1731-ydist-security.patch, 
> 1731-ydist-security.patch, MAPREDUCE-1731.patch, suspendtask_1731.patch, 
> suspendtask_1731.patch
>
>
> 1. Verify the process tree cleanup of a suspended task; the task should be 
> terminated after the timeout.
> 2. Verify the process tree cleanup of a suspended task when the task is 
> resumed before the task timeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1693) Process tree clean up of either a failed task or killed task tests.

2010-07-09 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1693:
-

Attachment: MAPREDUCE-1693.patch

Patch for trunk.

> Process tree clean up of either a failed task or killed task tests.
> ---
>
> Key: MAPREDUCE-1693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1693
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1693-ydist_security.patch, 1693-ydist_security.patch, 
> 1693-ydist_security.patch, MAPREDUCE-1693.patch, taskchildskilling_1693.diff, 
> taskchildskilling_1693.diff, taskchildskilling_1693.patch, 
> taskchildskilling_1693.patch, taskchildskilling_1693.patch, 
> taskchildskilling_1693.patch, taskchildskilling_1693.patch, 
> taskchildskilling_1693.patch
>
>
> The following scenarios are covered in the test.
> 1. Run a job which spawns subshells in the tasks. Kill one of the tasks. All 
> the child processes of the killed task must be killed.
> 2. Run a job which spawns subshells in the tasks. Fail one of the tasks. All 
> the child processes of the failed task must be killed along with the task 
> after its failure.
> 3. Check process tree cleanup on a particular task-tracker when we use 
> -kill-task and -fail-task with both map and reduce.
> 4. Submit a job which would spawn child processes where each of the child 
> processes exceeds the memory limits. Let the job complete. Check that all 
> the child processes are killed; the overall job should fail.
> 5. Submit a job which would spawn child processes where each of the child 
> processes exceeds the memory limits. Kill/fail the job while in progress. 
> Check that all the child processes are killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.

2010-07-09 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1710:
-

Attachment: MAPREDUCE-1710.patch

Patch for trunk.

> Process tree clean up of exceeding memory limit tasks.
> --
>
> Key: MAPREDUCE-1710
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, 
> 1710-ydist_security.patch, MAPREDUCE-1710.patch, memorylimittask_1710.patch, 
> memorylimittask_1710.patch, memorylimittask_1710.patch, 
> memorylimittask_1710.patch, memorylimittask_1710.patch
>
>
> 1. Submit a job which would spawn child processes where each of the child 
> processes exceeds the memory limits. Let the job complete. Check that all 
> the child processes are killed; the overall job should fail.
> 2. Submit a job which would spawn child processes where each of the child 
> processes exceeds the memory limits. Kill/fail the job while in progress. 
> Check that all the child processes are killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: java.lang.OutOfMemoryError: Java heap space

2010-07-09 Thread anshul goel
unsubscribe



Re: java.lang.OutOfMemoryError: Java heap space

2010-07-09 Thread anshul goel
unsubscribe



java.lang.OutOfMemoryError: Java heap space

2010-07-09 Thread Shuja Rehman
Hi All

I am facing a hard problem. I am running a map-reduce job using streaming,
but it fails with the following error.

Caught: java.lang.OutOfMemoryError: Java heap space
at Nodemapper5.parseXML(Nodemapper5.groovy:25)
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
failed with code 1
at 
org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at 
org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)



I have increased the heap size in hadoop-env.sh to 2000M. I also pass it to
the job manually with the following line:

-D mapred.child.java.opts=-Xmx2000M \

but it still gives the error. The same job runs fine if I run it from the
shell with a 1024M heap size, like:

cat file.xml | /root/Nodemapper5.groovy


Any clue?

Thanks in advance.
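One thing worth checking (an assumption about this setup, not a confirmed diagnosis): with streaming, the Groovy script runs as a separate process launched by the task's child JVM, so mapred.child.java.opts sizes the Java child but may not govern the Groovy interpreter's own heap. A tiny self-contained diagnostic can confirm what heap a given JVM actually received:

```java
// Minimal diagnostic: print the maximum heap this JVM was actually given.
// Run it inside the environment in question (or add the equivalent line to
// the script) to confirm whether an -Xmx setting is taking effect.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxMb);
        // If this reports far less than the 2000M you configured, the option
        // is not reaching the process that is running out of memory.
    }
}
```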


-- 
Regards
Shuja-ur-Rehman Baig
_
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445


[jira] Updated: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.

2010-07-09 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1730:
--

Attachment: MAPREDUCE-1730.patch

New patch making sure it compiles with trunk.

> Automate test scenario for successful/killed jobs' memory is properly removed 
> from jobtracker after these jobs retire.
> --
>
> Key: MAPREDUCE-1730
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, 
> MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, 
> TestRetiredJobs-ydist-security-patch.txt, 
> TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch
>
>
> Automate, using the Herriot framework, the test scenario that verifies that 
> successful/killed jobs' memory is properly removed from the jobtracker after 
> these jobs retire.
> This should test that when successful and failed jobs are retired, their 
> JobInProgress objects are removed properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1741) Automate the test scenario of job related files are moved from history directory to done directory

2010-07-09 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1741:
--

Attachment: MAPREDUCE-1741.patch

New patch making sure it compiles with trunk.

> Automate the test scenario of  job related files are moved from history 
> directory to done directory
> ---
>
> Key: MAPREDUCE-1741
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1741
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1741.patch, MAPREDUCE-1741.patch, 
> MAPREDUCE-1741.patch, TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation.patch, TestJobHistoryLocation.patch, 
> TestJobHistoryLocation.patch
>
>
> Job related files are moved from history directory to done directory, when
> 1) Job succeeds
> 2) Job is killed
> 3) When 100 files are put in the done directory
> 4) When multiple jobs are completed at the same time, some successful, some 
> failed.
> Also, two files, conf.xml and job files should be present in the done 
> directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1929) Allow artifacts to be published to the staging Apache Nexus Maven Repository

2010-07-09 Thread Tom White (JIRA)
Allow artifacts to be published to the staging Apache Nexus Maven Repository


 Key: MAPREDUCE-1929
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1929
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Giridharan Kesavan
Priority: Blocker
 Fix For: 0.21.0


MapReduce companion issue to HADOOP-6847.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-09 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886672#action_12886672
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1920:


I was thinking whether we should disable the retired-jobs cache if we enable the 
completed-job store by default, to avoid storing the data twice. But now I feel 
we can enable both, because the retired-jobs cache is served from memory, 
whereas the completed-job store is served from the file system; clients are 
served from the retired-jobs cache first and, if the job is not found in the 
cache, from the completed-job store.

Attached patch looks fine to me.
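The two-tier lookup described above can be sketched as follows. This is a toy model with hypothetical names, not actual JobTracker code; in particular the file-system-backed completed-job store is simulated here with a second in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of the two-tier job lookup: the in-memory
 * retired-jobs cache is consulted first, and only on a miss does the
 * lookup fall back to the (here simulated) completed-job store.
 */
public class RetiredJobLookup {
    // Tier 1: bounded in-memory cache of recently retired jobs.
    private final Map<String, String> retiredJobsCache = new HashMap<>();
    // Tier 2: stand-in for the file-system-backed completed-job store.
    private final Map<String, String> completedJobStore = new HashMap<>();

    /** A retiring job lands in both tiers. */
    public void retire(String jobId, String status) {
        retiredJobsCache.put(jobId, status);
        completedJobStore.put(jobId, status);
    }

    /** The memory cache is bounded, so entries eventually fall out of it;
     *  the persistent store keeps them. */
    public void evictFromCache(String jobId) {
        retiredJobsCache.remove(jobId);
    }

    /** Fast path from memory, slow path from the store. */
    public String getStatus(String jobId) {
        String status = retiredJobsCache.get(jobId);
        return (status != null) ? status : completedJobStore.get(jobId);
    }
}
```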

> Job.getCounters() returns null when using a cluster
> ---
>
> Key: MAPREDUCE-1920
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Aaron Kimball
>Assignee: Tom White
>Priority: Critical
> Attachments: MAPREDUCE-1920.patch
>
>
> Calling Job.getCounters() after the job has completed (successfully) returns 
> null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-09 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886668#action_12886668
 ] 

Tom White commented on MAPREDUCE-1920:
--

Actually, I would do it the other way round. Users expect to be able to get 
counters from jobs they have just run, as witnessed by Aaron's experience that 
led to this bug (also 
http://lucene.472066.n3.nabble.com/Hadoop-0-21-job-getCounters-returns-null-td947190.html).
 I would rather have the default configuration work as expected, and advanced 
users can turn off the job store if they don't want to use it. Does that sound 
reasonable?
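For reference, enabling the completed-job store on the 0.20/0.21 line involves configuration roughly like the snippet below. The property names here are as remembered from that era and should be verified against mapred-default.xml for the version in use; this is a hedged sketch, not authoritative configuration.

```xml
<!-- mapred-site.xml: keep completed-job information available to clients
     after the job retires from the JobTracker's memory. Verify these key
     names against mapred-default.xml for your Hadoop version. -->
<property>
  <name>mapred.job.tracker.persist.jobstatus.active</name>
  <value>true</value>
</property>
<property>
  <name>mapred.job.tracker.persist.jobstatus.hours</name>
  <value>1</value>
</property>
<property>
  <name>mapred.job.tracker.persist.jobstatus.dir</name>
  <value>/jobtracker/jobsInfo</value>
</property>
```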

> Job.getCounters() returns null when using a cluster
> ---
>
> Key: MAPREDUCE-1920
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Aaron Kimball
>Assignee: Tom White
>Priority: Critical
> Attachments: MAPREDUCE-1920.patch
>
>
> Calling Job.getCounters() after the job has completed (successfully) returns 
> null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-09 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886661#action_12886661
 ] 

Hong Tang commented on MAPREDUCE-1925:
--

However, this should not be a problem, because otherwise bis.reset() should 
throw an IOException.

Also, the following line would be better moved after bis.reset():

{code}
Hadoop20JHParser parser = new Hadoop20JHParser(bis);
{code}

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-09 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886659#action_12886659
 ] 

Hong Tang commented on MAPREDUCE-1925:
--

PossiblyDecompressedInputStream.close() seems to be wrong. It should close 
coreInputStream before returning the decompressor.

The following is also problematic:
{code}

BufferedInputStream bis = new BufferedInputStream(inputLogStream);
bis.mark(1);
Hadoop20JHParser parser = new Hadoop20JHParser(bis);
{code}

The default buffer size is 8K, so marking a position with a read-ahead limit of 
1 character could fail if the underlying stream does too much read-ahead.
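The mark/reset hazard described here can be demonstrated in isolation. The sketch below is not Rumen code and the helper name is made up; it shows that a mark is only guaranteed to survive while no more than its read-ahead limit of bytes has been read since mark() was called.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

// Demo of BufferedInputStream's mark/reset contract: reading past the
// read-ahead limit given to mark() may invalidate the mark, after which
// reset() throws IOException ("Resetting to invalid mark").
public class MarkResetDemo {

    /** Marks the stream with the given limit, reads nRead further bytes,
     *  then tries reset(); returns true iff the mark was still valid and
     *  the stream really rewound to the marked position. */
    static boolean markSurvives(int readLimit, int nRead) throws IOException {
        byte[] data = new byte[3 * 8192];          // larger than the 8K buffer
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        BufferedInputStream bis =
            new BufferedInputStream(new ByteArrayInputStream(data));
        bis.mark(readLimit);
        int first = bis.read();                    // byte at the marked position
        byte[] sink = new byte[nRead];
        int off = 0, r;
        while (off < sink.length
               && (r = bis.read(sink, off, sink.length - off)) > 0) {
            off += r;
        }
        try {
            bis.reset();                           // throws if the mark was dropped
            return bis.read() == first;
        } catch (IOException invalidMark) {
            return false;
        }
    }
}
```

With a generous read-ahead limit the mark is guaranteed to survive; with mark(1) and reads past the internal buffer, the standard JDK implementation drops it.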

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-09 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

Attachment: MAPREDUCE-1871.patch
1871-ydist-security-patch.txt

Putting StatisticsCollectionHandler.java under the common directory 
(./src/test/system/java/shared/org/apache/hadoop/mapred/) per Cos's comments.

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper task values. The node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across one hour window  Fields related to hour should be 
> updated after every hour
> 6) Check the fields across one day window  fields related to hour should be 
> updated after every day
> 7) Restart a TT and bring it back. The UI should retain the field values.
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1889) [herriot] Ability to restart a single node for pushconfig

2010-07-09 Thread Balaji Rajagopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Rajagopalan updated MAPREDUCE-1889:
--

Attachment: restartDaemon_y20.patch

Absolutely valid comments, and I have implemented them. The role is now a 
required field. I have also updated MAPREDUCE-1854 and provided a new patch 
that uses the new methods. Once I get +1 for y20 I will generate a patch for 
trunk.

> [herriot] Ability to restart a single node for pushconfig
> -
>
> Key: MAPREDUCE-1889
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1889
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: restartDaemon.txt, restartDaemon_1.txt, 
> restartDaemon_y20.patch
>
>
> Right now the pushconfig is supported only at a cluster level, this jira will 
> introduce the functionality to be supported at node level. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: Thread profiler output showing contention.jpg

> Reducing locking contention in TaskTracker.MapOutputServlet's 
> LocalDirAllocator
> ---
>
> Key: MAPREDUCE-1904
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
> Attachments: MAPREDUCE-1904-RC10.patch, profiler output after 
> applying the patch.jpg, TaskTracker- yourkit profiler output .jpg, Thread 
> profiler output showing contention.jpg
>
>
> While profiling tasktracker with Sort benchmark, it was observed that threads 
> block on LocalDirAllocator.getLocalPathToRead() in order to get the index 
> file and temporary map output file.
> As LocalDirAllocator is tied to the ServletContext, only one instance is 
> available per tasktracker HTTP server. Given the jobid & mapid, 
> LocalDirAllocator retrieves the index file path and temporary map output file 
> path. getLocalPathToRead() is internally synchronized.
> Introducing an LRUCache for this lookup reduces the contention heavily 
> (key = jobid + mapid, value = path to the file). The size of the LRUCache can 
> be varied based on the environment, and I observed a throughput improvement 
> on the order of 4-7% with the introduction of the LRUCache.
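The proposed bounded cache can be sketched with java.util.LinkedHashMap's access-order mode. This is an illustrative sketch under assumed names, not the attached patch: the servlet would consult the cache first and, on a miss, fall back to the synchronized LocalDirAllocator.getLocalPathToRead() call and cache the result.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Bounded LRU map from "jobid + mapid" to the resolved local path.
 * accessOrder = true makes iteration order least-recently-used first,
 * and removeEldestEntry() evicts once the capacity is exceeded.
 */
public class PathLruCache extends LinkedHashMap<String, String> {
    private final int capacity;

    public PathLruCache(int capacity) {
        super(16, 0.75f, true);          // accessOrder = true -> LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;        // drop the least-recently-used entry
    }
}
```

Access through get() counts as a "use", so hot map outputs stay resident while cold entries age out; in the servlet this lookup would replace most calls into the contended allocator.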

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1854) [herriot] Automate health script system test

2010-07-09 Thread Balaji Rajagopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Rajagopalan updated MAPREDUCE-1854:
--

Attachment: health_script_y20_1.patch

I have refactored the test cases into 3, since at times the test case times out 
when run as a whole; also, since the Role was added in the dependent JIRA 
MAPREDUCE-1889, the test cases were refactored accordingly. I will generate a 
patch for trunk once I get +1 for y20.

> [herriot] Automate health script system test
> 
>
> Key: MAPREDUCE-1854
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
> Environment: Herriot framework
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: health_script_5.txt, health_script_7.txt, 
> health_script_trunk.txt, health_script_y20.txt, health_script_y20_1.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> 1. There are three scenarios, first is induce a error from health script, 
> verify that task tracker is blacklisted. 
> 2. Make the health script timeout and verify the task tracker is blacklisted. 
> 3. Make an error in the health script path and make sure the task tracker 
> stays healthy. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: profiler output after applying the patch.jpg

Contention on LocalDirAllocator is now very low, close to 0%.

> Reducing locking contention in TaskTracker.MapOutputServlet's 
> LocalDirAllocator
> ---
>
> Key: MAPREDUCE-1904
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
> Attachments: MAPREDUCE-1904-RC10.patch, profiler output after 
> applying the patch.jpg, TaskTracker- yourkit profiler output .jpg, Thread 
> profiler output showing contention.jpg
>
>
> While profiling tasktracker with Sort benchmark, it was observed that threads 
> block on LocalDirAllocator.getLocalPathToRead() in order to get the index 
> file and temporary map output file.
> As LocalDirAllocator is tied to the ServletContext, only one instance is 
> available per tasktracker HTTP server. Given the jobid & mapid, 
> LocalDirAllocator retrieves the index file path and temporary map output file 
> path. getLocalPathToRead() is internally synchronized.
> Introducing an LRUCache for this lookup reduces the contention heavily 
> (key = jobid + mapid, value = path to the file). The size of the LRUCache can 
> be varied based on the environment, and I observed a throughput improvement 
> on the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: TaskTracker- yourkit profiler output .jpg

LocalDirAllocator.AllocatorPerContext is heavily contended. 

> Reducing locking contention in TaskTracker.MapOutputServlet's 
> LocalDirAllocator
> ---
>
> Key: MAPREDUCE-1904
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
> Attachments: MAPREDUCE-1904-RC10.patch, TaskTracker- yourkit profiler 
> output .jpg
>
>
> While profiling tasktracker with Sort benchmark, it was observed that threads 
> block on LocalDirAllocator.getLocalPathToRead() in order to get the index 
> file and temporary map output file.
> As LocalDirAllocator is tied to the ServletContext, only one instance is 
> available per tasktracker HTTP server. Given the jobid & mapid, 
> LocalDirAllocator retrieves the index file path and temporary map output file 
> path. getLocalPathToRead() is internally synchronized.
> Introducing an LRUCache for this lookup reduces the contention heavily 
> (key = jobid + mapid, value = path to the file). The size of the LRUCache can 
> be varied based on the environment, and I observed a throughput improvement 
> on the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-09 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886643#action_12886643
 ] 

Amar Kamat commented on MAPREDUCE-1925:
---

Looked at it briefly, and it seems there is some issue with mark and reset on 
compressed files. {{TestRumenJobTraces.testHadoop20JHParser()}} uses 
{{PossiblyDecompressedInputStream}} for reading the compressed 
_v20-single-input-log.gz_ file. I commented out the code dealing with reset and 
the test passed. For now one possible culprit is HADOOP-6835.

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.