[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file

2010-07-14 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888710#action_12888710
 ] 

Srikanth Sundarrajan commented on MAPREDUCE-1941:
-

{quote}
This can be done in Job client itself, no? History url is already available in 
JobStatus. 
{quote} 

While the history file name may be available through JobStatus, the history 
file is owned by the user who runs the JobTracker. Access to the history file, 
however, should be governed by JobACL.VIEW_JOB; hence the request for a 
separate servlet to provide the job history file contents.
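
For illustration only, here is a minimal sketch of what such a servlet might 
look like. The servlet name, the {{jobid}} request parameter, and the 
isViewAllowed/resolveHistoryFile helpers are placeholders standing in for the 
JobACL.VIEW_JOB check and the JobTracker internals; they are assumptions, not 
part of an actual patch.

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet: streams the history file for ?jobid=... after an ACL check.
public class JobHistoryFileServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    String jobId = req.getParameter("jobid");
    if (jobId == null) {
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "jobid is required");
      return;
    }
    // Placeholder for the JobACL.VIEW_JOB check described above.
    if (!isViewAllowed(req.getRemoteUser(), jobId)) {
      resp.sendError(HttpServletResponse.SC_UNAUTHORIZED,
          "not permitted to view " + jobId);
      return;
    }
    resp.setContentType("text/plain");
    InputStream in = new FileInputStream(resolveHistoryFile(jobId));
    try {
      OutputStream out = resp.getOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        out.write(buf, 0, n);  // stream the history file contents to the caller
      }
    } finally {
      in.close();
    }
  }

  // Both helpers are placeholders standing in for JobTracker internals.
  private boolean isViewAllowed(String user, String jobId) { return user != null; }
  private String resolveHistoryFile(String jobId) { return "/tmp/history/" + jobId; }
}
{code}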

> Need a servlet in JobTracker to stream contents of the job history file
> ---
>
> Key: MAPREDUCE-1941
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Affects Versions: 0.22.0
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> There is no convenient mechanism to retrieve the contents of the job history 
> file. Need a way to retrieve the job history file contents from Job Tracker. 
> This can perhaps be implemented as a servlet on the Job tracker.
> * Create a jsp/servlet that accepts job id as a request parameter
> * Stream the contents of the history file corresponding to the job id, if 
> user has permissions to view the job details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1554) If user name contains '_', then searching of jobs based on user name on job history web UI doesn't work

2010-07-14 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1554:


Description: If the user name contains an underscore, then searching for jobs 
by user name on the job history web UI doesn't work. This is because the code 
does {code}split("_"){code} on the history file name everywhere to get the 
user name. The other parts of the history file name also should *not* be 
obtained using split("_").  (was: If user name contains '_', then searching of jobs 
based on user name on job history web UI doesn't work. This is because in code, 
everywhere split("_") is done on history file name to get user name. And other 
parts of history file name also should not be obtained by using split("_").)
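
To make the failure mode concrete, here is a small standalone illustration. The 
file-name layout, the regex, and the assumption that the job name is a single 
trailing field without underscores are all hypothetical, chosen only to show why 
a blind split("_") misplaces the user name; this is not the actual history 
file-name format or the fix.

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a made-up history file name of the form
//   <jobtracker>_<starttime>_<job id>_<user>_<jobname>
// where the job id itself looks like job_<epoch>_<seq>.
public class HistoryFileNameParsing {
  public static void main(String[] args) {
    String fileName =
        "jt.example.com_1279000000000_job_201007140000_0001_ravi_gummadi_wordcount";

    // Naive parsing: with an underscore in the user name ("ravi_gummadi"),
    // a blind split("_") shifts every field after the job id by one position.
    String[] parts = fileName.split("_");
    System.out.println("naive user field: " + parts[5]);  // prints "ravi", not "ravi_gummadi"

    // Safer sketch: anchor on the job-id pattern instead of counting underscores,
    // assuming (for this example only) that the job name is a single trailing
    // field with no underscores in it.
    Matcher m = Pattern.compile("_(job_\\d+_\\d+)_(.+)_([^_]+)$").matcher(fileName);
    if (m.find()) {
      System.out.println("job id: " + m.group(1));  // job_201007140000_0001
      System.out.println("user:   " + m.group(2));  // ravi_gummadi
      System.out.println("job:    " + m.group(3));  // wordcount
    }
  }
}
{code}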

> If user name contains '_', then searching of jobs based on user name on job 
> history web UI doesn't work
> ---
>
> Key: MAPREDUCE-1554
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1554
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ravi Gummadi
>
> If the user name contains an underscore, then searching for jobs based 
> on user name on the job history web UI doesn't work. This is because the code 
> does {code}split("_"){code} on the history file name everywhere to get the user 
> name. The other parts of the history file name also should *not* be obtained by 
> using split("_").

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.

2010-07-14 Thread Iyappan Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888706#action_12888706
 ] 

Iyappan Srinivasan commented on MAPREDUCE-1730:
---

The two errors are unrelated to the patch. 

> Automate test scenario for successful/killed jobs' memory is properly removed 
> from jobtracker after these jobs retire.
> --
>
> Key: MAPREDUCE-1730
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.21.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, 
> MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, 
> TestRetiredJobs-ydist-security-patch.txt, 
> TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch
>
>
> Automate, using the Herriot framework, the test scenario that verifies 
> successful/killed jobs' memory is properly removed from the JobTracker after 
> these jobs retire.
> This should test that when successful and failed jobs are retired, their 
> JobInProgress objects are removed properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.

2010-07-14 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888704#action_12888704
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1812:
--

I could see 6 failures, and they are unrelated to this patch. I don't think 
the patch could cause these failures, because its scope is just adding new 
properties in an XML file.

> New properties for suspend and resume process.
> --
>
> Key: MAPREDUCE-1812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch
>
>
> Adding new properties in system-test-mr.xml file for suspend and resume 
> process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1621:
---

Status: Open  (was: Patch Available)

Many tests failed because of NoClassDefFoundError. Re-submitting to Hudson.

> Streaming's TextOutputReader.getLastOutput throws NPE if it has never read 
> any output
> -
>
> Key: MAPREDUCE-1621
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: patch-1621.txt
>
>
> If TextOutputReader.readKeyValue() has never successfully read a line, then 
> its bytes member will be left null. Thus when logging a task failure, 
> PipeMapRed.getContext() can trigger an NPE when it calls 
> outReader_.getLastOutput().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1621:
---

Status: Patch Available  (was: Open)

> Streaming's TextOutputReader.getLastOutput throws NPE if it has never read 
> any output
> -
>
> Key: MAPREDUCE-1621
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: patch-1621.txt
>
>
> If TextOutputReader.readKeyValue() has never successfully read a line, then 
> its bytes member will be left null. Thus when logging a task failure, 
> PipeMapRed.getContext() can trigger an NPE when it calls 
> outReader_.getLastOutput().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888702#action_12888702
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1911:


Test failures are because of MAPREDUCE-1834 and MAPREDUCE-1925

> Fix errors in -info option in streaming
> ---
>
> Key: MAPREDUCE-1911
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1911-1.txt, patch-1911.txt
>
>
> Here are some of the findings by Karam while verifying -info option in 
> streaming:
> # We need to add "Optional" for the -mapper, -reducer, -combiner and -file options.
> # For the -inputformat and -outputformat options, we should put "Optional" in the 
> prefix for the sake of uniformity.
> # We need to remove the -cluster description.
> # The -help option is not displayed in the usage message.
> # When displaying the message for the -info or -help options, we should not display 
> "Streaming Job Failed!"; also, the exit code should be 0 in the case of the 
> -help/-info option.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888697#action_12888697
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1941:


This can be done in Job client itself, no? History url is already available in 
JobStatus. 

> Need a servlet in JobTracker to stream contents of the job history file
> ---
>
> Key: MAPREDUCE-1941
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: jobtracker
>Affects Versions: 0.22.0
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
>
> There is no convenient mechanism to retrieve the contents of the job history 
> file. Need a way to retrieve the job history file contents from Job Tracker. 
> This can perhaps be implemented as a servlet on the Job tracker.
> * Create a jsp/servlet that accepts job id as a request parameter
> * Stream the contents of the history file corresponding to the job id, if 
> user has permissions to view the job details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-07-14 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888694#action_12888694
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1896:
--

I could see two failures, and they are unrelated to this patch. I don't think 
the patch could cause these failures, because its scope is just adding a new 
property in an XML file.

> [Herriot] New property for multi user list.
> ---
>
> Key: MAPREDUCE-1896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, 
> MAPREDUCE-1896.patch
>
>
> Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888693#action_12888693
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1943:


Limiting task diagnostic info and status is done in MAPREDUCE-1482.

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Attachments: MAPREDUCE-1943-0.20-yahoo.patch
>
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888623#action_12888623
 ] 

Hadoop QA commented on MAPREDUCE-1730:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449081/MAPREDUCE-1730.patch
  against trunk revision 963986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/console

This message is automatically generated.

> Automate test scenario for successful/killed jobs' memory is properly removed 
> from jobtracker after these jobs retire.
> --
>
> Key: MAPREDUCE-1730
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.21.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, 
> MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, 
> TestRetiredJobs-ydist-security-patch.txt, 
> TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch
>
>
> Automate, using the Herriot framework, the test scenario that verifies 
> successful/killed jobs' memory is properly removed from the JobTracker after 
> these jobs retire.
> This should test that when successful and failed jobs are retired, their 
> JobInProgress objects are removed properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

2010-07-14 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated MAPREDUCE-1906:
---

Status: Patch Available  (was: Open)

Re-submitting for Hudson.

> Lower minimum heartbeat interval for tasktracker > Jobtracker
> -
>
> Key: MAPREDUCE-1906
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.2, 0.20.1
>Reporter: Scott Carey
> Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch
>
>
> I get a 0% to 15% performance increase for smaller clusters by making the 
> heartbeat throttle stop penalizing clusters with less than 300 nodes.
> Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
> 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
> clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
> per second?  
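
For reference, a tiny sketch of the arithmetic behind the 3.3 heartbeats/sec 
figure quoted above, assuming only a fixed minimum interval and a 100 
heartbeats/sec cap; this is an illustration, not the actual JobTracker code.

{code}
// Illustration of the throttle arithmetic only; not the JobTracker implementation.
public class HeartbeatMath {
  public static void main(String[] args) {
    double minIntervalSec = 3.0;    // 0.20 default minimum heartbeat interval
    double maxClusterRate = 100.0;  // cap on aggregate heartbeats per second

    for (int nodes : new int[] {10, 100, 300, 1000}) {
      // Each tracker heartbeats at most once per minIntervalSec, so the
      // aggregate rate is bounded by nodes / minIntervalSec and by the cap.
      double rate = Math.min(nodes / minIntervalSec, maxClusterRate);
      System.out.printf("%4d nodes -> %.1f heartbeats/sec%n", nodes, rate);
    }
    // 10 nodes -> 3.3 heartbeats/sec, which is the figure quoted above.
  }
}
{code}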

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

2010-07-14 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated MAPREDUCE-1906:
---

Status: Open  (was: Patch Available)

Re-submitting for Hudson.

> Lower minimum heartbeat interval for tasktracker > Jobtracker
> -
>
> Key: MAPREDUCE-1906
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.2, 0.20.1
>Reporter: Scott Carey
> Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch
>
>
> I get a 0% to 15% performance increase for smaller clusters by making the 
> heartbeat throttle stop penalizing clusters with less than 300 nodes.
> Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
> 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
> clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
> per second?  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Attachment: (was: MAPREDUCE-1521-0.20-yahoo.patch)

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Attachments: MAPREDUCE-1943-0.20-yahoo.patch
>
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Attachment: MAPREDUCE-1943-0.20-yahoo.patch

Attached the wrong file earlier. :)

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Attachments: MAPREDUCE-1943-0.20-yahoo.patch
>
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Attachment: MAPREDUCE-1521-0.20-yahoo.patch

This patch imposes some limits.

The following are the limits it imposes:

1) The number of counters per group is limited to 40. If the counters exceed 
that amount, they are dropped silently.
2) The number of counter groups is restricted to 40. Again, groups beyond the 
limit are dropped silently.
3) The string size of a counter name is restricted to 64 characters.
4) The string size of a group name is restricted to 128 characters.
5) The number of block locations returned by a split is restricted to 100; this 
can be changed with a configuration parameter.
6) The reporter.setStatus() string size is limited to 512 characters.

I haven't added tests yet. Will upload one shortly. Also, this patch is for the 
Yahoo 0.20 branch. I will upload one for trunk shortly.
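
For illustration, a minimal sketch of the kind of silent-drop behaviour 
described in items 1 and 3 above. The class name and constants here are 
assumptions for this example only, not the code in the attached patch, which 
enforces the limits inside the Counters implementation itself.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "drop silently once the limit is reached".
public class BoundedCounterGroup {
  private static final int MAX_COUNTERS_PER_GROUP = 40;   // limit from item 1 above
  private static final int MAX_COUNTER_NAME_LENGTH = 64;  // limit from item 3 above

  private final Map<String, Long> counters = new LinkedHashMap<String, Long>();

  public void increment(String name, long amount) {
    if (name.length() > MAX_COUNTER_NAME_LENGTH) {
      name = name.substring(0, MAX_COUNTER_NAME_LENGTH);  // truncate over-long names
    }
    Long current = counters.get(name);
    if (current == null && counters.size() >= MAX_COUNTERS_PER_GROUP) {
      return;  // new counter beyond the limit: dropped silently
    }
    counters.put(name, (current == null ? 0L : current) + amount);
  }

  public int size() { return counters.size(); }
}
{code}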

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Attachments: MAPREDUCE-1521-0.20-yahoo.patch
>
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics

2010-07-14 Thread Dmytro Molkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888553#action_12888553
 ] 

Dmytro Molkov commented on MAPREDUCE-1848:
--

Patch looks good to me

> Put number of speculative, data local, rack local tasks in JobTracker metrics
> -
>
> Key: MAPREDUCE-1848
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1848-20100614.txt, 
> MAPREDUCE-1848-20100617.txt, MAPREDUCE-1848-20100623.txt
>
>
> It will be nice that we can collect these information in JobTracker metrics

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888557#action_12888557
 ] 

Scott Chen commented on MAPREDUCE-1943:
---

+1 to the idea. We have seen huge split sizes kill the JT. This will help.

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.

2010-07-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888548#action_12888548
 ] 

Eli Collins commented on MAPREDUCE-1942:


+1

>  'compile-fault-inject' should never be called directly.
> 
>
> Key: MAPREDUCE-1942
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
>Priority: Minor
> Attachments: MAPREDUCE-1942.patch
>
>
> Similar to HDFS-1299: prevent calls to helper targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Fix Version/s: (was: 0.22.0)

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)
Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes


 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0


We have come across issues in production clusters wherein users abuse counters, 
statusreport messages and split sizes. One such case was when one of the users 
had 100 million counters. This leads to jobtracker going out of memory and 
being unresponsive. In this jira I am proposing to put sane limits on the 
status report length, the number of counters and the size of block locations 
returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888536#action_12888536
 ] 

Owen O'Malley commented on MAPREDUCE-1938:
--

This patch basically puts the user in charge of their job. They can leave the 
safety switch set in which case they get the current behavior. But if they turn 
off the safety, their classes go ahead of the ones installed on the cluster. 
That means that they can break things, but all they can break is their own 
tasks.

After we do the split of core from library, you still need this switch. There 
will always be the possibility of needing to patch something in the core, 
because even MapTask has bugs. *smile* After splitting them apart, we can put 
the library code at the very end:

safety on:  core, user, library
safety off: user, core, library

This patch is just about providing the safety switch.
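
As a rough sketch of the two orderings described above, assuming a hypothetical 
boolean switch; the class, method and flag here are illustrative only, not the 
attached patch.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: builds the task classpath in the two orders described above.
public class TaskClasspathOrder {
  static List<String> buildClasspath(boolean safetyOn,
                                     List<String> core,
                                     List<String> user,
                                     List<String> library) {
    List<String> cp = new ArrayList<String>();
    if (safetyOn) {
      cp.addAll(core);     // safety on:  core, user, library
      cp.addAll(user);
    } else {
      cp.addAll(user);     // safety off: user, core, library
      cp.addAll(core);
    }
    cp.addAll(library);
    return cp;
  }

  public static void main(String[] args) {
    List<String> core = Arrays.asList("hadoop-core.jar");
    List<String> user = Arrays.asList("job.jar");
    List<String> lib = Arrays.asList("hadoop-streaming.jar");
    System.out.println("safety on : " + buildClasspath(true, core, user, lib));
    System.out.println("safety off: " + buildClasspath(false, core, user, lib));
  }
}
{code}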

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888532#action_12888532
 ] 

Doug Cutting commented on MAPREDUCE-1938:
-

> Did i understand your concern right?

I don't have specific concerns about this patch.  Sorry for any confusion in 
that regard.  I thought it worthwhile to discuss how this change relates to 
other changes that are contemplated.  It seems not inconsistent, provides some 
of the benefits, and is considerably simpler; in short, a good thing.

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1938:
---

Attachment: mr-1938-bp20.1.patch

Addressing Owen's comment on the shell script part of the patch. 

Doug, this patch is a first step towards letting users use their own versions 
of library-provided implementations for things like CombineFileInputFormat. The 
use case is to allow specific implementations of library classes for certain 
classes of jobs.

This doesn't aim to address the kernel/library separation in its entirety. So 
yes, if the user puts a class on the classpath that doesn't work compatibly with 
the kernel, then tasks will fail or produce obscure/inconsistent results, but 
that will affect only that job, and the user would notice it soon (hopefully). 
Did I understand your concern right?

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888511#action_12888511
 ] 

Hadoop QA commented on MAPREDUCE-1812:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449207/MAPREDUCE-1812.patch
  against trunk revision 963986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/console

This message is automatically generated.

> New properties for suspend and resume process.
> --
>
> Key: MAPREDUCE-1812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch
>
>
> Adding new properties in system-test-mr.xml file for suspend and resume 
> process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.

2010-07-14 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated MAPREDUCE-1733:


Status: Patch Available  (was: Open)

> Authentication between pipes processes and java counterparts.
> -
>
> Key: MAPREDUCE-1733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, 
> MR-1733-y20.3.patch, MR-1733.5.patch
>
>
> The connection between a pipe process and its parent java process should be 
> authenticated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job

2010-07-14 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888503#action_12888503
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1928:
--

To add to #1: we may be able to change the split size based on the observed 
selectivity of an ongoing job (i.e., add splits with a larger/smaller size 
depending on stats from the first set of splits). It's possible that Hadoop may 
want to do this as part of the basic framework (by exploiting any mechanisms 
provided here).

This would be a huge win for a framework like Hive. It would drastically reduce 
the amount of wasted work (limit-N queries) and avoid spawning an unnecessarily 
large number of mappers (unknown selectivity), just to name two obvious use 
cases.

Can you supply a more concrete proposal in terms of API changes?

> Dynamic information fed into Hadoop for controlling execution of a submitted 
> job
> 
>
> Key: MAPREDUCE-1928
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, jobtracker, tasktracker
>Affects Versions: 0.20.3
>Reporter: Raman Grover
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> Currently the job submission protocol requires the job provider to put every 
> bit of information inside an instance of JobConf. The submitted information 
> includes the input data (hdfs path) , suspected resource requirement, number 
> of reducers etc.  This information is read by JobTracker as part of job 
> initialization. Once initialized, job is moved into a running state. From 
> this point, there is no mechanism for any additional information to be fed 
> into Hadoop infrastructure for controlling the job execution. 
>The execution pattern for the job looks very much 
> static from this point. Using the size of input data and a few settings 
> inside JobConf, number of mappers is computed. Hadoop attempts at reading the 
> whole of data in parallel by launching parallel map tasks. Once map phase is 
> over, a known number of reduce tasks (supplied as part of  JobConf) are 
> started. 
> Parameters that control the job execution were set in JobConf prior to 
> reading the input data. As the map phase progresses, useful information based 
> upon the content of the input data surfaces and can be used in controlling 
> the further execution of the job. Let us walk through some of the examples 
> where additional information can be fed to Hadoop subsequent to job 
> submission for optimal execution of the job. 
> I) "Process a part of the input , based upon the results decide if reading 
> more input is required " 
> In a huge data set, user is interested in finding 'k' records that 
> satisfy a predicate, essentially sampling the data. In current 
> implementation, as the data is huge, a large no of mappers would be launched 
> consuming a significant fraction of the available map slots in the cluster. 
> Each map task would attempt at emitting a max of  'k' records. With N 
> mappers, we get N*k records out of which one can pick any k to form the final 
> result. 
>This is not optimal as:
>1)  A larger number of map slots get occupied initially, affecting other 
> jobs in the queue. 
> 2) If the selectivity of the input data is very low, we essentially did not 
> need to scan the whole of the data to form our result. 
> We could have finished by reading a fraction of the input data, 
> monitoring the cardinality of the map output and determining if 
> more input needs to be processed.
>
> Optimal way: If reading the whole of the input requires N mappers, launch only 
> 'M' initially. Allow them to complete. Based upon the statistics collected, 
> decide the additional number of mappers to be launched next, and so on, until 
> the whole of the input has been processed or enough records have been 
> collected to form the results, whichever is earlier. 
>  
>  
> II)  "Here is some data, the remaining is yet to arrive, but you may start 
> with it, and receive more input later"
>  Consider a chain of 2 M-R jobs chained together such that the latter 
> reads the output of the former. The second MR job cannot be started until the 
> first has finished completely. This is essentially because Hadoop needs to be 
> told the complete information about the input before beginning the job. 
> The first M-R has produced enough data ( not finished yet) that can be 
> processed by another MR job and hence the other MR need not wait to grab the 
> whole of the input before beginning. Input splits could be supplied later, but 
> of course before the copy/shuffle phase.
>  
> III)  " Input data has undergone one round of processing by map phase, have 

[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888482#action_12888482
 ] 

Doug Cutting commented on MAPREDUCE-1938:
-

Owen, I agree with your analysis.  I'm just trying to put this patch in context 
of these other related discussions.

This patch addresses some issues relevant to separation of kernel & library.  
In common cases one can merely provide an alternate version of the library 
class in one's job.  Fully separating kernel & library with a well-defined, 
minimal kernel API is clearly aesthetically better.  Are there use cases that 
it will enable that this patch will not?  I think mostly it will just make it 
clear which classes are safe to replace with updated versions and which are 
not.  Does that sound right?

The issue of user versions of libraries that the kernel uses (like Avro, log4j, 
HttpClient, etc.) is not entirely addressed by this patch.  If the user's 
version is backwards compatible with the kernel's version then this patch is 
sufficient.  But if the user's version of a library makes incompatible changes 
then we'd need a classloader/OSGI solution.  Even then, I think it only works 
if user and kernel code do not interchange instances of classes defined by 
these libraries.  A minimal kernel API will help reduce that risk.  Does this 
analysis sound right?

I'm trying to understand how far this patch gets us towards those goals: what 
it solves and what it doesn't.

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.

2010-07-14 Thread Konstantin Boudnik (JIRA)
 'compile-fault-inject' should never be called directly.


 Key: MAPREDUCE-1942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Priority: Minor


Similar to HDFS-1299: prevent calls to helper targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.

2010-07-14 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1942:
--

Attachment: MAPREDUCE-1942.patch

The fix.

>  'compile-fault-inject' should never be called directly.
> 
>
> Key: MAPREDUCE-1942
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
>Priority: Minor
> Attachments: MAPREDUCE-1942.patch
>
>
> Similar to HDFS-1299: prevent calls to helper targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888472#action_12888472
 ] 

Konstantin Boudnik commented on MAPREDUCE-1919:
---

I want to disagree with the suggestion of moving this little method to a helper 
class. It doesn't make much sense to create a wrapper around the well-known 
ToolRunner interface; it just creates confusion. Why don't you simply use 
{{int exitCode = ToolRunner.run(job, tool, jobArgs)}}? Why do you need a 
method to wrap a call to another one?

Also, please consider optimizing the imports list; it is overly detailed.
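
For reference, a minimal sketch of the direct ToolRunner call suggested above; 
the Tool implementation and argument names here are placeholders for the actual 
test tool, not code from the patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Minimal placeholder Tool; the real test would pass its actual job tool here.
public class ExampleTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // ... submit the job using getConf() ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Direct use of the well-known interface; no wrapper method needed.
    int exitCode = ToolRunner.run(conf, new ExampleTool(), args);
    System.exit(exitCode);
  }
}
{code}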

> [Herriot] Test for verification of per cache file ref  count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeded or not.
> 2. Run the job with distributed cache files and remove one cache file from 
> the DFS when it is localized. Verify whether the job failed or not.
> 3. Run the job with two distributed cache files, with the size of one file 
> larger than local.cache.size. Verify whether the job succeeded or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888468#action_12888468
 ] 

Konstantin Boudnik commented on MAPREDUCE-1933:
---

bq. prop.put("mapred.local.dir", 
"/grid/0/dev/tmp/mapred/mapred-local,/grid/1/dev/tmp/mapred/mapred-local,/grid/2/dev/tmp/mapred/mapred-local,/grid/3/dev/tmp/mapred/mapred-local");

Absolutely; besides, this particular parameter should already be set by a 
normal MR config.

Also, please don't use string literals for configuration parameters. There was 
a significant effort in 0.21 to refactor all configuration keys into named 
constants. Use them instead.
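
A small before/after sketch of that suggestion. MRConfig.LOCAL_DIR is assumed 
here to be the 0.21 constant that replaces the "mapred.local.dir" literal; 
treat the exact constant class as an assumption for this example.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.MRConfig;

public class ConfigKeyExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Discouraged: scattered string literal for the configuration key.
    conf.set("mapred.local.dir", "/grid/0/dev/tmp/mapred/mapred-local");

    // Preferred in 0.21+: the refactored named constant (assumed to be
    // MRConfig.LOCAL_DIR, i.e. "mapreduce.cluster.local.dir").
    conf.set(MRConfig.LOCAL_DIR, "/grid/0/dev/tmp/mapred/mapred-local");
  }
}
{code}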

> Create automated testcase for tasktracker dealing with corrupted disk.
> --
>
> Key: MAPREDUCE-1933
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: TestCorruptedDiskJob.java
>
>
> After the TaskTracker has already run some tasks successfully, "corrupt" a 
> disk by making the corresponding mapred.local.dir unreadable/unwritable. 
> Make sure that jobs continue to succeed even though some tasks scheduled 
> there fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888445#action_12888445
 ] 

Owen O'Malley commented on MAPREDUCE-1938:
--

Doug,

I agree that the kernel code should be split out from the libraries; however, 
that work is much more involved. I don't see a problem with putting the user's 
code first. It is not a security concern: the user's code is only run as the 
user. Furthermore, it doesn't actually stop them from loading system classes; 
they can exec a new JVM with a class path of their own choosing.

Therefore, by putting the user's classes last, all we've done is make it 
harder for users to implement hot fixes in their own jobs. That doesn't seem 
like a good goal.

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888436#action_12888436
 ] 

Owen O'Malley commented on MAPREDUCE-1938:
--

I think that the default for this should be on.

Rather than add HADOOP_CLIENT_CLASSPATH, let's make a new variable 
HADOOP_USER_CLASSPATH_LAST. If it is defined, we add HADOOP_CLASSPATH to the 
tail like we currently do. Otherwise it is added to the front.

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888433#action_12888433
 ] 

Doug Cutting commented on MAPREDUCE-1938:
-

Two thoughts:
 1. In general, we need to better separate the kernel from the library.  
CombineFileInputFormat is library code and should be easy to update without 
updating the cluster.  Long-term, only kernel code should be hardwired on the 
classpath of tasks, with library and user code both specified per job.  There 
should be no default version of library classes for a task: tasks should always 
specify their required libraries.  Is there a Jira for this?  I know Tom's 
expressed interest in working on this.
 2. We should permit user code to depend on different versions of things than 
the kernel does.  For example, user code might rely on a different version of 
HttpClient or Avro than that used by MapReduce.  This should be possible if 
instances of classes from these are not passed between user and kernel code, 
e.g., as long as Avro and HttpClient classes are not part of the MapReduce 
API.  In this case classloaders (probably via OSGI) could permit this.

> Ability for having user's classes take precedence over the system classes for 
> tasks' classpath
> --
>
> Key: MAPREDUCE-1938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, task, tasktracker
>Reporter: Devaraj Das
> Fix For: 0.22.0
>
> Attachments: mr-1938-bp20.patch
>
>
> It would be nice to have the ability in MapReduce to allow users to specify 
> for their jobs alternate implementations of classes that are already defined 
> in the MapReduce libraries. For example, an alternate implementation for 
> CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888399#action_12888399
 ] 

Hadoop QA commented on MAPREDUCE-1911:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449235/patch-1911-1.txt
  against trunk revision 963986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/console

This message is automatically generated.

> Fix errors in -info option in streaming
> ---
>
> Key: MAPREDUCE-1911
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1911-1.txt, patch-1911.txt
>
>
> Here are some of the findings by Karam while verifying -info option in 
> streaming:
> # We need to add "Optional" for the -mapper, -reducer, -combiner and -file options.
> # For the -inputformat and -outputformat options, we should put "Optional" in the 
> prefix for the sake of uniformity.
> # We need to remove the -cluster description.
> # The -help option is not displayed in the usage message.
> # When displaying the message for the -info or -help options, we should not display 
> "Streaming Job Failed!"; also, the exit code should be 0 in the case of the 
> -help/-info option.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars

2010-07-14 Thread Paul Burkhardt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888358#action_12888358
 ] 

Paul Burkhardt commented on MAPREDUCE-1686:
---

Okay, I'll try and do that.

Paul



> ClassNotFoundException for custom format classes provided in libjars
> 
>
> Key: MAPREDUCE-1686
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.2
>Reporter: Paul Burkhardt
>Priority: Minor
>
> The StreamUtil::goodClassOrNull method assumes user-provided classes have 
> package names and, if not, that they are part of the Hadoop Streaming package. 
> For example, using custom InputFormat or OutputFormat classes without package 
> names will fail with a ClassNotFoundException, which is misleading given that 
> the classes are provided in the libjars option. Admittedly, most Java classes 
> should have a package name, so this should rarely come up.
> Possible resolution options:
> 1) modify the error message to include the actual classname that was 
> attempted in the goodClassOrNull method
> 2) call the Configuration::getClassByName method first and if class not found 
> check for default package name and try the call again
> {code}
> public static Class goodClassOrNull(Configuration conf, String className, 
>     String defaultPackage) {
>   Class clazz = null;
>   try {
>     clazz = conf.getClassByName(className);
>   } catch (ClassNotFoundException cnf) {
>   }
>   if (clazz == null) {
>     if (className.indexOf('.') == -1 && defaultPackage != null) {
>       className = defaultPackage + "." + className;
>       try {
>         clazz = conf.getClassByName(className);
>       } catch (ClassNotFoundException cnf) {
>       }
>     }
>   }
>   return clazz;
> }
> {code}
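For resolution option 1, a minimal sketch of what a variant with a more helpful error could look like; the method name and message wording here are assumptions for illustration, not the actual Streaming code:

{code}
// Hypothetical variant for option 1: same lookup order as above, but the
// failure message names every class name that was actually attempted.
public static Class goodClassOrThrow(Configuration conf, String className,
    String defaultPackage) {
  String attempted = className;
  try {
    return conf.getClassByName(className);
  } catch (ClassNotFoundException cnf) {
    // fall through and try the default package
  }
  if (className.indexOf('.') == -1 && defaultPackage != null) {
    String qualified = defaultPackage + "." + className;
    attempted = attempted + ", " + qualified;
    try {
      return conf.getClassByName(qualified);
    } catch (ClassNotFoundException cnf) {
      // fall through to the error below
    }
  }
  throw new RuntimeException("Class not found; attempted: " + attempted
      + " (is the class on -libjars and fully qualified?)");
}
{code}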

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file

2010-07-14 Thread Srikanth Sundarrajan (JIRA)
Need a servlet in JobTracker to stream contents of the job history file
---

 Key: MAPREDUCE-1941
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Srikanth Sundarrajan
Assignee: Srikanth Sundarrajan


There is no convenient mechanism to retrieve the contents of the job history 
file. Need a way to retrieve the job history file contents from Job Tracker. 

This can perhaps be implemented as a servlet on the Job tracker.

* Create a jsp/servlet that accepts job id as a request parameter
* Stream the contents of the history file corresponding to the job id, if user 
has permissions to view the job details.
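For illustration only, a rough sketch of the servlet shape described above; the request parameter name, the ACL check and the history-file lookup are placeholder assumptions, not the actual JobTracker implementation:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative servlet shape only.
public class JobHistoryServlet extends HttpServlet {
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String jobId = req.getParameter("jobid");            // assumed parameter name
    if (jobId == null) {
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "jobid is required");
      return;
    }
    if (!mayViewJob(req.getRemoteUser(), jobId)) {       // stand-in for JobACL.VIEW_JOB
      resp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
      return;
    }
    resp.setContentType("text/plain");
    InputStream in = openHistoryFile(jobId);             // stand-in for the JT lookup
    OutputStream out = resp.getOutputStream();
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      in.close();
    }
  }

  // Placeholders so the sketch compiles; a real servlet would consult the job's
  // ACLs and the configured job history location.
  private boolean mayViewJob(String user, String jobId) { return false; }
  private InputStream openHistoryFile(String jobId) throws IOException {
    throw new IOException("not implemented in this sketch");
  }
}
{code}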

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job

2010-07-14 Thread Steven Lewis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888332#action_12888332
 ] 

Steven Lewis commented on MAPREDUCE-1928:
-

Another possible use has to do with adjusting parameters to avoid failures. I 
have an issue where a reducer is running out of memory. If I were aware that 
certain keys lead to this failure, I could take steps such as sampling the data 
rather than processing the whole set, so I would add access to data about 
failures.

> Dynamic information fed into Hadoop for controlling execution of a submitted 
> job
> 
>
> Key: MAPREDUCE-1928
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, jobtracker, tasktracker
>Affects Versions: 0.20.3
>Reporter: Raman Grover
>   Original Estimate: 2016h
>  Remaining Estimate: 2016h
>
> Currently the job submission protocol requires the job provider to put every 
> bit of information inside an instance of JobConf. The submitted information 
> includes the input data (hdfs path), suspected resource requirements, the 
> number of reducers, etc. This information is read by the JobTracker as part 
> of job initialization. Once initialized, the job is moved into a running 
> state. From this point, there is no mechanism for any additional information 
> to be fed into the Hadoop infrastructure for controlling the job execution. 
> The execution pattern for the job is essentially static from this point. 
> Using the size of the input data and a few settings inside JobConf, the 
> number of mappers is computed. Hadoop attempts to read the whole of the data 
> in parallel by launching parallel map tasks. Once the map phase is over, a 
> known number of reduce tasks (supplied as part of JobConf) is started. 
> Parameters that control the job execution were set in JobConf prior to 
> reading the input data. As the map phase progresses, useful information based 
> upon the content of the input data surfaces and can be used in controlling 
> the further execution of the job. Let us walk through some of the examples 
> where additional information can be fed to Hadoop subsequent to job 
> submission for optimal execution of the job. 
> I) "Process a part of the input, and based upon the results decide if reading 
> more input is required" 
> In a huge data set, the user is interested in finding 'k' records that 
> satisfy a predicate, essentially sampling the data. In the current 
> implementation, as the data is huge, a large number of mappers would be 
> launched, consuming a significant fraction of the available map slots in the 
> cluster. Each map task would attempt to emit a max of 'k' records. With N 
> mappers, we get N*k records, out of which one can pick any k to form the 
> final result. 
> This is not optimal as:
> 1) A larger number of map slots gets occupied initially, affecting other 
> jobs in the queue. 
> 2) If the selectivity of the input data is very low, we essentially did not 
> need to scan the whole of the data to form our result. We could have finished 
> by reading a fraction of the input data, monitoring the cardinality of the 
> map output and determining if more input needs to be processed. 
>
> Optimal way: If reading the whole of the input requires N mappers, launch 
> only 'M' initially. Allow them to complete. Based upon the statistics 
> collected, decide the additional number of mappers to be launched next, and 
> so on until the whole of the input has been processed or enough records have 
> been collected to form the results, whichever is earlier. 
>  
>  
> II)  "Here is some data, the remaining is yet to arrive, but you may start 
> with it, and receive more input later"
>  Consider a chain of two M-R jobs such that the latter reads the output 
> of the former. The second MR job cannot be started until the first has 
> finished completely. This is essentially because Hadoop needs to be told the 
> complete information about the input before beginning the job. 
> The first M-R may have produced enough data (though it has not finished yet) 
> that can be processed by another MR job, and hence the other MR need not wait 
> to grab the whole of the input before beginning. Input splits could be 
> supplied later, but of course before the copy/shuffle phase.
>  
> III)  "Input data has undergone one round of processing by the map phase, we 
> have some stats, and can now say more about the resources required further" 
>    Mappers can produce useful stats about their output, like the cardinality, 
> or produce a histogram describing the distribution of the output. These stats 
> are available to the job provider (Hive/Pig/End User) who can 
>   now determ
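As a rough illustration of item (I) above (not part of the proposal itself), the "launch M mappers, then decide" pattern can be approximated today by chaining job submissions and reading a counter between rounds. The SampleMapper, the counter names and the pre-partitioned input layout in the sketch below are assumptions.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver-side sketch only: process one batch of input per round and stop as
// soon as enough sample records have been collected.
public class IncrementalSampler {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    long wanted = Long.parseLong(args[0]);            // 'k' records wanted overall
    Path outRoot = new Path(args[args.length - 1]);   // output root directory
    long collected = 0;
    // args[1..n-2] are assumed to be pre-partitioned input directories.
    for (int i = 1; i < args.length - 1 && collected < wanted; i++) {
      Job job = new Job(conf, "sample-round-" + i);
      job.setJarByClass(IncrementalSampler.class);
      // job.setMapperClass(SampleMapper.class);      // assumed mapper, not shown:
      //   emits at most k matching records per task and bumps the counter below
      job.setNumReduceTasks(0);                       // map-only sampling
      FileInputFormat.addInputPath(job, new Path(args[i]));
      FileOutputFormat.setOutputPath(job, new Path(outRoot, "round-" + i));
      job.waitForCompletion(true);
      collected += job.getCounters()
          .findCounter("sampler", "records-emitted").getValue();
    }
    System.out.println("collected " + collected + " sample records");
  }
}
{code}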

[jira] Commented: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888319#action_12888319
 ] 

Hadoop QA commented on MAPREDUCE-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449214/patch-1621.txt
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/console

This message is automatically generated.

> Streaming's TextOutputReader.getLastOutput throws NPE if it has never read 
> any output
> -
>
> Key: MAPREDUCE-1621
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: patch-1621.txt
>
>
> If TextOutputReader.readKeyValue() has never successfully read a line, then 
> its bytes member will be left null. Thus when logging a task failure, 
> PipeMapRed.getContext() can trigger an NPE when it calls 
> outReader_.getLastOutput().
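A sketch of the null guard this description implies, based on the text above rather than the attached patch-1621.txt; the bytes field is the one described in the report:

{code}
// Null-safe getLastOutput(): 'bytes' stays null until readKeyValue() has
// successfully read at least one line.
public String getLastOutput() {
  if (bytes == null) {
    return null;              // nothing was ever read, so there is no last output
  }
  try {
    return new String(bytes, "UTF-8");
  } catch (java.io.UnsupportedEncodingException e) {
    return "<undecodable output>";
  }
}
{code}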

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888307#action_12888307
 ] 

Hadoop QA commented on MAPREDUCE-1896:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448436/MAPREDUCE-1896.patch
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/console

This message is automatically generated.

> [Herriot] New property for multi user list.
> ---
>
> Key: MAPREDUCE-1896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, 
> MAPREDUCE-1896.patch
>
>
> Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1912) [Rumen] Add a driver for Rumen tool

2010-07-14 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888300#action_12888300
 ] 

Ravi Gummadi commented on MAPREDUCE-1912:
-

Some comments:

(1) In build.xml, please change ${common.ivy.lib.dir dir} to 
${common.ivy.lib.dir} directory.

(2) In Folder.java, in the initialize() method, printUsage() should be called at 
the 2 places where IllegalArgumentException is thrown (just before throwing).

(3) In Rumen.java, please change "A Rumen tool fold/scale the trace" to "A 
Rumen tool to fold/scale the trace".

(4) In TraceBuilder.java, please reverse the conditions in the following while 
statement so that validation of index is done before accessing the element at 
that index. {code}while (args[switchTop].startsWith("-") && switchTop < 
args.length){code}

(5) As you observed the bug, please make the necessary code change of moving 
"++switchTop;" out of the if statement in the above while loop, to fix the bug 
of the infinite loop when some option that starts with "-" (and is not the same 
as -demuxer) is given.

(6) In both places in TraceBuilder.java where printUsage() is called, you are 
checking only for the case of zero arguments. We need to make sure that there
are at least 3 arguments in both places.
So change (a) "if (0 == args.length)" to "if (args.length < 3)" and (b) "if 
(switchTop == args.length)" to "if (switchTop+2 >= args.length)".
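Taken together, comments (4) and (5) amount to a loop of roughly the following shape; the helper method and the demuxer handling are simplified assumptions, not the actual TraceBuilder code:

{code}
// Bounds check before the array access, and an unconditional increment so an
// unrecognized "-something" option can no longer spin forever.
static int parseSwitches(String[] args) {
  String demuxerClassName = null;             // stand-in for the demuxer setting
  int switchTop = 0;
  while (switchTop < args.length && args[switchTop].startsWith("-")) {
    if ("-demuxer".equalsIgnoreCase(args[switchTop])
        && switchTop + 1 < args.length) {
      demuxerClassName = args[++switchTop];   // consume the option's value
    }
    ++switchTop;                              // always advance past the switch
  }
  return switchTop;                           // index of the first positional argument
}
{code}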

> [Rumen] Add a driver for Rumen tool 
> 
>
> Key: MAPREDUCE-1912
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1912
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.22.0
>
> Attachments: mapreduce-1912-v1.1.patch
>
>
> Rumen, as a tool, has 2 entry points :
> - Trace builder
> - Folder
> It would be nice to have a single driver program and have 'trace-builder' and 
> 'folder' as its options. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1940) [Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files

2010-07-14 Thread Amar Kamat (JIRA)
[Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and 
output files


 Key: MAPREDUCE-1940
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1940
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Amar Kamat


Currently Folder and TraceBuilder expect the input and output to be the last 
arguments on the command line. It would be better to add special switches for 
the input and output files to avoid confusion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1526.
--

 Hadoop Flags: [Reviewed]
 Assignee: rahul k singh
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

> Cache the job related information while submitting the job , this would avoid 
> many RPC calls to JobTracker.
> ---
>
> Key: MAPREDUCE-1526
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/gridmix
>Reporter: rahul k singh
>Assignee: rahul k singh
> Fix For: 0.22.0
>
> Attachments: 1526-yahadoop-20-101-2.patch, 
> 1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, 
> 1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1376) Support for varied user submission in Gridmix

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1376.
--

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

> Support for varied user submission in Gridmix
> -
>
> Key: MAPREDUCE-1376
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1376
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/gridmix
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Fix For: 0.22.0
>
> Attachments: 1376-2-yhadoop-security.patch, 
> 1376-3-yhadoop20.100.patch, 1376-4-yhadoop20.100.patch, 
> 1376-5-yhadoop20-100.patch, 1376-yhadoop-security.patch, M1376-0.patch, 
> M1376-1.patch, M1376-2.patch, M1376-3.patch, M1376-4.patch
>
>
> Gridmix currently submits all synthetic jobs as the client user. It should be 
> possible to map users in the trace to a set of users appropriate for the 
> target cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1711) Gridmix should provide an option to submit jobs to the same queues as specified in the trace.

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1711.
--

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

> Gridmix should provide an option to submit jobs to the same queues as 
> specified in the trace.
> -
>
> Key: MAPREDUCE-1711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1711
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/gridmix
>Reporter: Hong Tang
>Assignee: rahul k singh
> Fix For: 0.22.0
>
> Attachments: diff-gridmix.patch, diff-rumen.patch, 
> MR-1711-yhadoop-20-1xx-2.patch, MR-1711-yhadoop-20-1xx-3.patch, 
> MR-1711-yhadoop-20-1xx-4.patch, MR-1711-yhadoop-20-1xx-5.patch, 
> MR-1711-yhadoop-20-1xx-6.patch, MR-1711-yhadoop-20-1xx-7.patch, 
> MR-1711-yhadoop-20-1xx.patch, MR-1711-Yhadoop-20-crossPort-1.patch, 
> MR-1711-Yhadoop-20-crossPort-2.patch, MR-1711-Yhadoop-20-crossPort.patch, 
> mr-1711-yhadoop-20.1xx-20100416.patch
>
>
> Gridmix should provide an option to submit jobs to the same queues as 
> specified in the trace.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1594.
--

 Hadoop Flags: [Reviewed]
 Assignee: rahul k singh
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

> Support for Sleep Jobs in gridmix
> -
>
> Key: MAPREDUCE-1594
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Reporter: rahul k singh
>Assignee: rahul k singh
> Fix For: 0.22.0
>
> Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 
> 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 
> 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 
> 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch
>
>
> Support for Sleep jobs in gridmix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1840) [Gridmix] Exploit/Add security features in GridMix

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1840:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I committed this.

Thanks to Amar, Rahul, and Hong

> [Gridmix] Exploit/Add security features in GridMix
> --
>
> Key: MAPREDUCE-1840
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1840
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/gridmix
>Affects Versions: 0.22.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.22.0
>
> Attachments: mapreduce-gridmix-fp-v1.3.3.patch, 
> mapreduce-gridmix-fp-v1.3.9.patch
>
>
> Use security information while replaying jobs in Gridmix. This includes
> - Support for multiple users
> - Submitting jobs as different users
> - Allowing usage of secure cluster (hdfs + mapreduce)
> - Support for multiple queues
> Other features include : 
> - Support for sleep job
> - Support for load job 
> + testcases for verifying all of the above changes

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1925:


Attachment: 1925.v1.patch

Attaching new patch incorporating review comments.

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888257#action_12888257
 ] 

Ravi Gummadi commented on MAPREDUCE-1925:
-

Thanks Hong.
Will upload a new patch which removes that .gz file; the testcase itself will 
contain the expected list of events as an array of Strings.

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888249#action_12888249
 ] 

Hong Tang commented on MAPREDUCE-1925:
--

Git diff --text will add binary diff to the patch.





> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888247#action_12888247
 ] 

Hadoop QA commented on MAPREDUCE-1710:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449101/MAPREDUCE-1710.patch
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/console

This message is automatically generated.

> Process tree clean up of exceeding memory limit tasks.
> --
>
> Key: MAPREDUCE-1710
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, 
> 1710-ydist_security.patch, MAPREDUCE-1710.patch, memorylimittask_1710.patch, 
> memorylimittask_1710.patch, memorylimittask_1710.patch, 
> memorylimittask_1710.patch, memorylimittask_1710.patch
>
>
> 1. Submit a job which would spawn child processes, where each of the child 
> processes exceeds the memory limits. Let the job complete. Check that all the 
> child processes are killed; the overall job should fail.
> 2. Submit a job which would spawn child processes, where each of the child 
> processes exceeds the memory limits. Kill/fail the job while in progress. 
> Check that all the child processes are killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1865:


Status: Patch Available  (was: Open)

> [Rumen] Rumen should also support jobhistory files generated using trunk
> 
>
> Key: MAPREDUCE-1865
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.22.0
>
> Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, 
> mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch
>
>
> Rumen code in trunk parses and processes only jobhistory files from pre-21 
> hadoop mapreduce clusters. It should also support jobhistory files generated 
> using trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888243#action_12888243
 ] 

Hadoop QA commented on MAPREDUCE-1713:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449108/MAPREDUCE-1713.patch
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/console

This message is automatically generated.

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, 
> MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, 
> utilsforsystemtest_1713.patch
>
>
> 1. A method for restarting the daemon with a new configuration.
>   public static void restartCluster(Hashtable props, String 
> confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3. A method for waiting until the daemon stops.
>   public void waitForClusterToStop() throws Exception;
> 4. A method for waiting until the daemon starts.
>   public void waitForClusterToStart() throws Exception;
> 5. A method for checking whether the job has started or not.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6. A method for checking whether the task has started or not.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1878) Add MRUnit documentation

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888238#action_12888238
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1878:


I think the document can be added as package.html in mrunit package instead of 
.txt file, similar to all other packages. 

> Add MRUnit documentation
> 
>
> Key: MAPREDUCE-1878
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1878
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/mrunit
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1878.2.patch, MAPREDUCE-1878.patch
>
>
> A short user guide for MRUnit, written in asciidoc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk

2010-07-14 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-1865:
--

Attachment: mapreduce-1865-v1.7.1.patch

Attaching a slightly modified patch with changes to comments and assert 
messages.

> [Rumen] Rumen should also support jobhistory files generated using trunk
> 
>
> Key: MAPREDUCE-1865
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.22.0
>
> Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, 
> mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch
>
>
> Rumen code in trunk parses and processes only jobhistory files from pre-21 
> hadoop mapreduce clusters. It should also support jobhistory files generated 
> using trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.