[jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model

2009-09-24 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759030#action_12759030
 ] 

Vinod K V commented on MAPREDUCE-1018:
--

Along with everything else, we should document that the job setup and job 
cleanup tasks of all jobs, whether or not their maps and reduces require high 
memory, still run on a single slot.

 Document changes to the memory management and scheduling model
 --

 Key: MAPREDUCE-1018
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Hemanth Yamijala
Priority: Blocker
 Fix For: 0.21.0


 The configuration, monitoring and scheduling of high RAM jobs have changed. 
 These changes must be documented in mapred-defaults.xml and also in the 
 forrest documentation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1009) Forrest documentation needs to be updated to describes features provided for supporting hierarchical queues

2009-09-24 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V reassigned MAPREDUCE-1009:


Assignee: Vinod K V

 Forrest documentation needs to be updated to describes features provided for 
 supporting hierarchical queues
 ---

 Key: MAPREDUCE-1009
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1009
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Hemanth Yamijala
Assignee: Vinod K V
Priority: Blocker
 Fix For: 0.21.0


 Forrest documentation must be updated to describe how to set up and use 
 hierarchical queues in the framework and the capacity scheduler.




[jira] Updated: (MAPREDUCE-1007) MAPREDUCE 777 breaks the UI for hierarchial Queues.

2009-09-24 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated MAPREDUCE-1007:
-

Attachment: MAPREDUCE-1007-1.patch

Uploading patch with test case.

 MAPREDUCE 777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: rahul k singh
Priority: Blocker
 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007.patch


 mapreduce 777 breaks jobtracker UI for hierarchial queues




[jira] Commented: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body

2009-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759057#action_12759057
 ] 

Hadoop QA commented on MAPREDUCE-1000:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420379/mapred-1000-v2.patch
  against trunk revision 818355.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce  new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/128/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/128/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/128/console

This message is automatically generated.

 JobHistory.initDone() should retain the try ... catch in the body
 -

 Key: MAPREDUCE-1000
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: Hong Tang
Assignee: Jothi Padmanabhan
 Fix For: 0.21.0

 Attachments: mapred-1000-v2.patch, mapred-1000.patch







[jira] Commented: (MAPREDUCE-1028) Cleanup tasks are scheduled using high memory configuration, leaving tasks in unassigned state.

2009-09-24 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759058#action_12759058
 ] 

Devaraj Das commented on MAPREDUCE-1028:


After some thought, it seems that decrementing the slot count on a per-task 
used-slot-count basis is harmless. So, for now, let's just ensure that 
all special tasks (job-setup, task-cleanup and job-cleanup) take exactly one 
slot. I couldn't come up with a counter-example where this would lead to 
inconsistencies in the slot counts on the TT, or to fewer or more tasks being 
launched than the slot count and the number of slots required by tasks 
scheduled on that TT allow.
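
The rule proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the JobTracker or TaskTracker code: `SlotAccounting`, `TaskKind`, `slotsRequired` and `fits` are hypothetical names invented for this example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: special tasks always occupy exactly one slot,
// regardless of how many slots the job's regular tasks need.
final class SlotAccounting {
    enum TaskKind { MAP, REDUCE, JOB_SETUP, JOB_CLEANUP, TASK_CLEANUP }

    // Task kinds that should be charged a single slot.
    private static final Set<TaskKind> SPECIAL =
        EnumSet.of(TaskKind.JOB_SETUP, TaskKind.JOB_CLEANUP, TaskKind.TASK_CLEANUP);

    private SlotAccounting() {}

    /** Slots a task occupies, given the per-task slot count of its job. */
    static int slotsRequired(TaskKind kind, int slotsPerTaskOfJob) {
        if (slotsPerTaskOfJob < 1) {
            throw new IllegalArgumentException("slotsPerTaskOfJob must be >= 1");
        }
        return SPECIAL.contains(kind) ? 1 : slotsPerTaskOfJob;
    }

    /** True if a tracker with freeSlots can accept the task without oversubscription. */
    static boolean fits(TaskKind kind, int slotsPerTaskOfJob, int freeSlots) {
        return slotsRequired(kind, slotsPerTaskOfJob) <= freeSlots;
    }
}
```

With this rule, a cleanup task of a 2-slot high RAM job fits on a tracker with one free slot, while a regular task of the same job does not.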

 Cleanup tasks are scheduled using high memory configuration, leaving tasks in 
 unassigned state.
 ---

 Key: MAPREDUCE-1028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: Hemanth Yamijala
Assignee: Ravi Gummadi
Priority: Blocker
 Fix For: 0.21.0


 A cleanup task is launched for a failed task of a job. This task is created 
 based on the TIP of the failed task, and so is marked as requiring as many 
 slots to run as the original task itself. For instance, if a high RAM job 
 requires 2 slots per task, a cleanup task of that job requires 2 slots as 
 well.
 Further, a cleanup task is scheduled to a tasktracker by the jobtracker 
 itself and not the scheduler. While doing so, the JT doesn't check whether 
 the TT has enough free slots to run a high RAM cleanup task; it always 
 assumes 1 slot is enough. Thus, the TT is oversubscribed.
 However, on the TT, before launch, we check that the task can actually run, 
 and wait for that many slots to become available. If the slots don't get 
 freed quickly, we will have tasks stuck in an unassigned state.




[jira] Created: (MAPREDUCE-1033) Resolve location of configuration files after project split

2009-09-24 Thread Vinod K V (JIRA)
Resolve location of configuration files after project split
---

 Key: MAPREDUCE-1033
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1033
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod K V
Priority: Blocker
 Fix For: 0.21.0


At present, all the sub-projects (common, hdfs and mapreduce) have copies of 
all the configuration files. Common configuration files should be left in 
common, mapreduce-specific files should be moved to the mapreduce project, and 
hdfs-specific files should be moved to the hdfs project.




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-09-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759063#action_12759063
 ] 

Steve Loughran commented on MAPREDUCE-679:
--

-1 to anything too fancy in the way of content generation
+1 to adding a couple of XSLs at the root of the webapps, so that XML status 
pages can be presented in a human readable form. 
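
As a rough illustration of the XSL suggestion, the sketch below applies a stylesheet to an XML status document using the JDK's standard `javax.xml.transform` API. The XML layout (`jobs`, `job`, `id`, `state`) and the class name `StatusXslDemo` are assumptions made up for this example; the real schema is whatever the servlet emits.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StatusXslDemo {
    // Hypothetical status document, invented for this sketch.
    static final String XML =
        "<jobs><job id=\"job_1\" state=\"RUNNING\"/>"
      + "<job id=\"job_2\" state=\"FAILED\"/></jobs>";

    // Stylesheet that renders each job as an HTML list item.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" "
      + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/jobs\"><ul><xsl:for-each select=\"job\">"
      + "<li><xsl:value-of select=\"@id\"/>: <xsl:value-of select=\"@state\"/></li>"
      + "</xsl:for-each></ul></xsl:template>"
      + "</xsl:stylesheet>";

    private StatusXslDemo() {}

    /** Applies the stylesheet to the XML and returns the rendered markup. */
    static String render(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(XML, XSL));
    }
}
```

In a browser the same effect could be had by referencing the stylesheet from the XML page, so the machine-readable output stays unchanged.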

 XML-based metrics as JSP servlet for JobTracker
 ---

 Key: MAPREDUCE-679
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0

 Attachments: example-jobtracker-completed-job.xml, 
 example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
 MAPREDUCE-679.3.patch, MAPREDUCE-679.4.patch, MAPREDUCE-679.5.patch, 
 MAPREDUCE-679.6.patch, MAPREDUCE-679.7.patch, MAPREDUCE-679.patch


 In HADOOP-4559, a general REST API for reporting metrics was proposed but 
 work seems to have stalled. In the interim, we have a simple XML translation 
 of the existing JobTracker status page which provides the same metrics 
 (including the tables of running/completed/failed jobs) as the human-readable 
 page. This is a relatively lightweight addition to provide some 
 machine-understandable metrics reporting.




[jira] Created: (MAPREDUCE-1035) Remove streaming forrest documentation from the common project

2009-09-24 Thread Vinod K V (JIRA)
Remove streaming forrest documentation from the common project
--

 Key: MAPREDUCE-1035
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1035
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: documentation
Reporter: Vinod K V
Priority: Blocker
 Fix For: 0.21.0


A quick look reveals that the streaming documentation in common already 
differs from that in the mapreduce project. We should resolve these 
differences and retain this documentation only in mapreduce.




[jira] Updated: (MAPREDUCE-1007) MAPREDUCE 777 breaks the UI for hierarchial Queues.

2009-09-24 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1007:
-

  Component/s: jobtracker
Affects Version/s: 0.21.0
Fix Version/s: 0.21.0
 Assignee: V.V.Chaitanya Krishna

 MAPREDUCE 777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007.patch


 mapreduce 777 breaks jobtracker UI for hierarchial queues




[jira] Updated: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1007:
-

Description: 

MAPREDUCE-777 breaks the jobtracker UI for hierarchical queues. When 
jobtracker.jsp is accessed, it throws the following exception:

{code}
java.lang.NullPointerException
    at org.apache.hadoop.mapred.CapacityTaskScheduler.getJobs(CapacityTaskScheduler.java:1007)
    at org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:3888)
    at org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:3869)
    at org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:3830)
    at org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:36)
{code}
(The issue number and the line number in the code match: 1007. Some fun for a 
Hadoop developer :) )

  was:mapreduce 777 breaks jobtracker UI for hierarchial queues

Summary: MAPREDUCE-777 breaks the UI for hierarchial Queues.   (was: 
MAPREDUCE 777 breaks the UI for hierarchial Queues. )

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007.patch






[jira] Commented: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread V.V.Chaitanya Krishna (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759070#action_12759070
 ] 

V.V.Chaitanya Krishna commented on MAPREDUCE-1007:
--

The UI is not displayed when a hierarchy of queues is built with at least 
one container queue (i.e., at least one non-leaf queue). 
CapacityTaskScheduler.getJobs(queueName) has no null check, so when a 
container queue's name is passed as the input parameter, it fails with an 
NPE.

The above patch is to handle this issue. It also includes test cases written to 
check the behaviour when a job is submitted to a container queue.
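
The kind of guard described above might look like the sketch below. It is not the attached patch: `QueueJobsLookup` and its map of leaf queues are hypothetical stand-ins, and returning an empty collection for a container queue is an assumption about the desired behaviour.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scheduler's queue bookkeeping.
final class QueueJobsLookup {
    // Only leaf queues hold jobs; container (non-leaf) queues have no entry.
    private final Map<String, List<String>> jobsByLeafQueue = new HashMap<>();

    void addLeafQueue(String name, List<String> jobs) {
        jobsByLeafQueue.put(name, jobs);
    }

    /** Returns the jobs of a queue, or an empty collection for container or unknown queues. */
    Collection<String> getJobs(String queueName) {
        // Look the queue up once and reuse the result.
        List<String> jobs = jobsByLeafQueue.get(queueName);
        if (jobs == null) {
            // Without this check, dereferencing the result would throw an NPE.
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(jobs);
    }
}
```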

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007.patch






[jira] Commented: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759079#action_12759079
 ] 

Vinod K V commented on MAPREDUCE-1007:
--

Quickly looked at the patch and tested it on a single node. It works fine. Some 
review comments, mostly minor:
 - In CapacityTaskScheduler.getJobs(), we can cache the return value from 
jobQueuesManager.getJobQueue(queueName) to avoid repetitive lookups.
 - Test-cases testSubmitToQueues(), testGetJobs() and the newly added 
testJobsForContainerQueues() share a lot of common stuff and test a single 
concept - job submission to capacity-scheduler. They can be combined into a 
single testJobSubmission(). Further, instead of creating all the internal 
queue-related data-structures ourselves, we can simply create a configuration 
file and start the scheduler. See 
TestRefreshOfQueues.testSuccessfulCapacityRefresh() for an example. You may 
need to do some refactoring to facilitate this.

One orthogonal point, which this issue need not concern itself with: 
jobtracker.jsp now prints the number of root queues under a header 
'queues'. Just looking at it, I couldn't tell what the number actually 
represents. We can (1) rename the header to root-queues to be clear, (2) print 
the numbers of both root queues and job queues, or (3) do away with these 
numbers altogether and just give a hyperlink to the queues page. Thoughts?

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007.patch






[jira] Created: (MAPREDUCE-1036) An API Specification for Sqoop

2009-09-24 Thread Aaron Kimball (JIRA)
An API Specification for Sqoop
--

 Key: MAPREDUCE-1036
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1036
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball


Over the last several months, Sqoop has evolved to a state that is functional 
and has room for extensions. Developing extensions requires a stable API and 
documentation. I am attaching to this ticket a description of Sqoop's design 
and internal APIs, which include some open questions. I would like to solicit 
input on the design regarding these open questions and standardize the API.




[jira] Updated: (MAPREDUCE-1036) An API Specification for Sqoop

2009-09-24 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1036:
-

Attachment: sqoop-reference.txt

Attaching a draft of the API reference. After the open questions are discussed, 
I will upload a final version of this document formatted as a patch which 
extends the existing user-facing documentation.

 An API Specification for Sqoop
 --

 Key: MAPREDUCE-1036
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1036
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: sqoop-reference.txt


 Over the last several months, Sqoop has evolved to a state that is functional 
 and has room for extensions. Developing extensions requires a stable API and 
 documentation. I am attaching to this ticket a description of Sqoop's design 
 and internal APIs, which include some open questions. I would like to solicit 
 input on the design regarding these open questions and standardize the API.




[jira] Created: (MAPREDUCE-1037) Failing contrib unit tests should not halt the build

2009-09-24 Thread Chris Douglas (JIRA)
Failing contrib unit tests should not halt the build


 Key: MAPREDUCE-1037
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1037
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, contrib/sqoop
Affects Versions: 0.21.0
Reporter: Chris Douglas
Priority: Blocker
 Fix For: 0.21.0


As in other contrib projects, ( HADOOP-5457 ), failing unit tests in should not 
prevent tests of subsequent modules from running.
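
The intended behaviour, independent of the build tool, can be sketched as: run every module's tests, record failures, and report them only at the end. The sketch below assumes each module's test run can be modelled as a `BooleanSupplier`; `ContribTestRunner` and the module names in the usage note are hypothetical and not part of the actual build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "keep going on failure" for a list of test modules.
final class ContribTestRunner {
    private ContribTestRunner() {}

    /** Runs every module's tests in order and returns the names of the modules that failed. */
    static List<String> runAll(Map<String, BooleanSupplier> modules) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> module : modules.entrySet()) {
            boolean ok;
            try {
                ok = module.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false; // treat an error as a failure, but keep going
            }
            if (!ok) {
                failed.add(module.getKey());
            }
        }
        return failed;
    }
}
```

A caller would fail the overall build only after `runAll` returns a non-empty list, so a failure in one module (say, sqoop) no longer hides the results of the modules after it.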




[jira] Updated: (MAPREDUCE-1037) Failing contrib unit tests should not halt the build

2009-09-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1037:
-

Description: As in other contrib projects, ( HADOOP-5457 ), failing unit 
tests should not prevent tests of subsequent modules from running.  (was: As in 
other contrib projects, ( HADOOP-5457 ), failing unit tests in should not 
prevent tests of subsequent modules from running.)

 Failing contrib unit tests should not halt the build
 

 Key: MAPREDUCE-1037
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1037
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, contrib/sqoop, test
Affects Versions: 0.21.0
Reporter: Chris Douglas
Priority: Blocker
 Fix For: 0.21.0


 As in other contrib projects, ( HADOOP-5457 ), failing unit tests should not 
 prevent tests of subsequent modules from running.




[jira] Updated: (MAPREDUCE-1037) Failing contrib unit tests should not halt the build

2009-09-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1037:
-

Component/s: test

 Failing contrib unit tests should not halt the build
 

 Key: MAPREDUCE-1037
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1037
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, contrib/sqoop, test
Affects Versions: 0.21.0
Reporter: Chris Douglas
Priority: Blocker
 Fix For: 0.21.0


 As in other contrib projects, ( HADOOP-5457 ), failing unit tests in should 
 not prevent tests of subsequent modules from running.




[jira] Commented: (MAPREDUCE-1029) TestCopyFiles fails on testHftpAccessControl()

2009-09-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759237#action_12759237
 ] 

Chris Douglas commented on MAPREDUCE-1029:
--

This may be because the hdfs webapps aren't on the classpath, so MiniDFSCluster 
isn't starting Jetty, so HftpFileSystem (which uses http) fails to work.
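
One quick way to test this hypothesis would be to check, from within the test JVM, whether the webapps directory is visible as a classpath resource. The sketch below uses only `ClassLoader.getResource`; the class name `ClasspathCheck` is hypothetical, and the resource path `webapps/hdfs` in the usage note is an assumption about how the directory is named on the classpath.

```java
// Hypothetical diagnostic helper, not part of Hadoop.
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** True if the named resource is visible to this class's class loader. */
    static boolean onClasspath(String resource) {
        ClassLoader cl = ClasspathCheck.class.getClassLoader();
        if (cl == null) {
            // Classes loaded by the bootstrap loader report a null class loader.
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resource) != null;
    }
}
```

For example, logging `ClasspathCheck.onClasspath("webapps/hdfs")` at the start of the test would show whether the resource is reachable before MiniDFSCluster starts.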

 TestCopyFiles fails on testHftpAccessControl()
 --

 Key: MAPREDUCE-1029
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1029
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Reporter: Amar Kamat

 Log :
 Testcase: testHftpAccessControl took 2.692 sec
 FAILED
 expected:-3 but was:-999
 junit.framework.AssertionFailedError: expected:-3 but was:-999
 at 
 org.apache.hadoop.tools.TestCopyFiles.testHftpAccessControl(TestCopyFiles.java:853)




[jira] Updated: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-728:


Status: Open  (was: Patch Available)

 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png


 h3. Vision:
 We want to build a Simulator to simulate large-scale Hadoop clusters, 
 applications and workloads. This would be invaluable in furthering Hadoop by 
 providing a tool for researchers and developers to prototype features (e.g. 
 pluggable block-placement for HDFS, Map-Reduce schedulers etc.) and predict 
 their behaviour and performance with a reasonable amount of confidence, 
 thereby aiding rapid innovation.
 
 h3. First Cut: Simulator for the Map-Reduce Scheduler
 The Map-Reduce Scheduler is a fertile area of interest with at least four 
 schedulers, each with its own set of features, currently in existence: 
 Default Scheduler, Capacity Scheduler, Fairshare Scheduler and Priority 
 Scheduler.
 Each scheduler's scheduling decisions are driven by many factors, such as 
 fairness, capacity guarantee, resource availability, data-locality etc.
 Given that, it is non-trivial to choose the right scheduler, or even the 
 right set of scheduler features, for a given workload. Hence a simulator 
 which can predict how well a particular scheduler works for some specific 
 workload by quickly iterating over schedulers and/or scheduler features 
 would be quite useful.
 So, the first cut is to implement a simulator for the Map-Reduce scheduler 
 which takes as input a job trace derived from a production workload and a 
 cluster definition, and simulates the execution of the jobs as defined in 
 the trace in this virtual cluster. As output, the detailed job execution 
 trace (recorded in relation to virtual simulated time) could then be analyzed 
 to understand various traits of individual schedulers (individual jobs' 
 turnaround time, throughput, fairness, capacity guarantee, etc.). To support 
 this, we would need a simulator which could accurately model the conditions 
 of the actual system which would affect a scheduler's decisions. These 
 include very large-scale clusters (thousands of nodes), the detailed 
 characteristics of the workload thrown at the clusters, job or task failures, 
 data locality, and cluster hardware (CPU, memory, disk I/O, network I/O, 
 network topology).
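
Replaying a trace against virtual time is commonly done with a discrete-event loop: events sit in a priority queue ordered by their virtual timestamp, and the clock jumps from one event to the next. The sketch below shows only that core idea. It is not Mumak's design; `MiniSimulator`, `Event` and the event labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical minimal discrete-event loop driven by virtual time.
final class MiniSimulator {
    /** An event that fires at a given virtual time. */
    static final class Event implements Comparable<Event> {
        final long time;
        final String label;

        Event(long time, String label) {
            this.time = time;
            this.label = label;
        }

        @Override
        public int compareTo(Event other) {
            return Long.compare(time, other.time);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private final List<String> trace = new ArrayList<>();
    private long now; // current virtual time

    void schedule(long time, String label) {
        queue.add(new Event(time, label));
    }

    /** Drains events in virtual-time order and returns the recorded execution trace. */
    List<String> run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time; // the clock jumps straight to the next event
            trace.add(now + ":" + e.label);
        }
        return trace;
    }
}
```

Because no wall-clock waiting is involved, hours of simulated cluster activity can be replayed quickly, which is what makes iterating over schedulers practical.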




[jira] Updated: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-728:


Status: Patch Available  (was: Open)

 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png






[jira] Commented: (MAPREDUCE-270) TaskTracker could send an out-of-band heartbeat when the last running map/reduce completes

2009-09-24 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759258#action_12759258
 ] 

Devaraj Das commented on MAPREDUCE-270:
---

+1 core changes look fine.

 TaskTracker could send an out-of-band heartbeat when the last running 
 map/reduce completes
 --

 Key: MAPREDUCE-270
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-270
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-270.patch, MAPREDUCE-270.patch, 
 MAPREDUCE-270_yhadoop20.patch, MAPREDUCE-270_yhadoop20.patch


 Currently the TaskTracker strictly respects the heartbeat interval, which 
 causes utilization issues when all running tasks complete: the freed slots 
 sit idle until the next scheduled heartbeat. We could send an out-of-band 
 heartbeat in that case.
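
A minimal sketch of the idea, not the attached patch: HeartbeatPolicySketch and its method names are hypothetical and are not part of the TaskTracker API. It only shows the decision rule: send a heartbeat when the interval has elapsed, or immediately once the last running task completes.

```java
/**
 * Sketch only: a hypothetical policy object, not TaskTracker code.
 * Times are plain millisecond values supplied by the caller.
 */
class HeartbeatPolicySketch {
    private final long intervalMs;
    private long lastHeartbeatMs;
    private int runningTasks;
    private boolean outOfBandPending;

    HeartbeatPolicySketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    void taskStarted() {
        runningTasks++;
    }

    /** Flags an early heartbeat when the last running task finishes. */
    void taskCompleted() {
        if (runningTasks > 0 && --runningTasks == 0) {
            outOfBandPending = true;
        }
    }

    /** True if the interval has elapsed or an out-of-band heartbeat is pending. */
    boolean shouldSendHeartbeat(long nowMs) {
        return outOfBandPending || nowMs - lastHeartbeatMs >= intervalMs;
    }

    /** Records a sent heartbeat and clears any pending out-of-band request. */
    void heartbeatSent(long nowMs) {
        lastHeartbeatMs = nowMs;
        outOfBandPending = false;
    }
}
```

With a 3000 ms interval, a task finishing at 1000 ms would make shouldSendHeartbeat(1000) return true instead of waiting for the 3000 ms mark.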




[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null

2009-09-24 Thread YongChul Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759300#action_12759300
 ] 

YongChul Kwon commented on MAPREDUCE-118:
-

With the new API, the JobID is only available through the JobClient instance, 
which is protected by the Job class. mapreduce.Job.getJobID() should override 
JobContext.getJobID() to read it from the JobClient once the job is submitted.

 Job.getJobID() will always return null
 --

 Key: MAPREDUCE-118
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-118
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amar Kamat

 JobContext is used for a read-only view of a job's info. Hence all the 
 read-only fields in JobContext are set in the constructor. Job extends 
 JobContext. When a Job is created, the job ID is not yet known, and there is 
 no way to set the JobID once the Job is created. The JobID is obtained only 
 when the JobClient queries the JobTracker for a job ID, which happens later, 
 i.e. upon job submission.
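
The problem and the override suggested in the comment can be sketched as follows. This is an illustration under assumptions, not Hadoop code: ContextSketch, ClientSketch and JobSketch are hypothetical stand-ins for JobContext, JobClient and Job, and the ID string format is made up.

```java
/** Sketch only: hypothetical stand-ins, not the Hadoop classes. */
class JobIdSketch {
    /** Read-only view: the ID is fixed at construction time. */
    static class ContextSketch {
        private final String jobId;

        ContextSketch(String jobId) {
            this.jobId = jobId;
        }

        String getJobID() {
            return jobId;
        }
    }

    /** Hands out an ID only when a job is submitted. */
    static class ClientSketch {
        private int nextId = 1;

        String submit() {
            return "job_sketch_" + (nextId++);
        }
    }

    /** The ID is unknown at construction, so the getter is overridden. */
    static class JobSketch extends ContextSketch {
        private final ClientSketch client = new ClientSketch();
        private String submittedId;

        JobSketch() {
            super(null); // no ID exists yet
        }

        void submit() {
            submittedId = client.submit();
        }

        @Override
        String getJobID() {
            // Prefer the ID obtained at submission; before that, fall back
            // to the (null) value stored by the read-only parent.
            return submittedId != null ? submittedId : super.getJobID();
        }
    }
}
```

Before submit() the getter still returns null, which matches the reported behaviour; after submit() it returns the ID the client handed out.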




[jira] Commented: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759320#action_12759320
 ] 

Hadoop QA commented on MAPREDUCE-728:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12420122/mapreduce-728-20090918-6.patch
  against trunk revision 818577.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 30 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/129/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/129/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/129/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/129/console

This message is automatically generated.

 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png


 h3. Vision:
 We want to build a Simulator to simulate large-scale Hadoop clusters, 
 applications and workloads. This would be invaluable in furthering Hadoop by 
 providing a tool for researchers and developers to prototype features (e.g. 
 pluggable block-placement for HDFS, Map-Reduce schedulers etc.) and predict 
 their behaviour and performance with a reasonable amount of confidence, 
 thereby aiding rapid innovation.
 
 h3. First Cut: Simulator for the Map-Reduce Scheduler
 The Map-Reduce Scheduler is a fertile area of interest with at least four 
 schedulers, each with its own set of features, currently in existence: 
 Default Scheduler, Capacity Scheduler, Fairshare Scheduler and Priority 
 Scheduler.
 Each scheduler's scheduling decisions are driven by many factors, such as 
 fairness, capacity guarantee, resource availability, data-locality etc.
 Given that, it is non-trivial to accurately choose a single scheduler or even 
 a set of desired features to predict the right scheduler (or features) for a 
 given workload. Hence a simulator which can predict how well a particular 
 scheduler works for some specific workload by quickly iterating over 
 schedulers and/or scheduler features would be quite useful.
 So, the first cut is to implement a simulator for the Map-Reduce scheduler 
 which takes as input a job trace derived from a production workload and a 
 cluster definition, and simulates the execution of the jobs as defined in 
 the trace in this virtual cluster. As output, the detailed job execution 
 trace (recorded in relation to virtual simulated time) could then be analyzed 
 to understand various traits of individual schedulers (individual job 
 turnaround time, throughput, fairness, capacity guarantee, etc.). To support 
 this, we would need a simulator which could accurately model the conditions 
 of the actual system which would affect a scheduler's decisions. These include 
 very large-scale clusters (thousands of nodes), the detailed characteristics 
 of the workload thrown at the clusters, job or task failures, data locality, 
 and cluster hardware (CPU, memory, disk I/O, network I/O, network topology).
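
As a rough illustration of the first cut described above, the sketch below replays a tiny made-up trace against a fixed number of slots in virtual time. It is not Mumak's code: SchedulerSimSketch, TraceJob and simulateFifo are hypothetical names, the trace is assumed to be sorted by submit time, and a plain FIFO policy stands in for a real scheduler. Failures, locality and hardware are not modelled.

```java
import java.util.List;
import java.util.PriorityQueue;

/** Sketch only: a toy virtual-time replay, not the Mumak implementation. */
class SchedulerSimSketch {
    /** One job from a hypothetical trace; all times are virtual. */
    static final class TraceJob {
        final long submitTime;
        final long[] taskRuntimes;

        TraceJob(long submitTime, long... taskRuntimes) {
            this.submitTime = submitTime;
            this.taskRuntimes = taskRuntimes;
        }
    }

    /**
     * Replays the trace in FIFO order on a cluster with the given number of
     * slots and returns the virtual finish time of each job.
     */
    static long[] simulateFifo(List<TraceJob> trace, int slots) {
        // Min-heap of the virtual times at which each slot becomes free.
        PriorityQueue<Long> slotFreeAt = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) {
            slotFreeAt.add(0L);
        }
        long[] finish = new long[trace.size()];
        for (int j = 0; j < trace.size(); j++) {
            TraceJob job = trace.get(j);
            long jobFinish = job.submitTime;
            for (long runtime : job.taskRuntimes) {
                // A task starts on the earliest free slot, but never
                // before its job was submitted.
                long start = Math.max(slotFreeAt.poll(), job.submitTime);
                long end = start + runtime;
                slotFreeAt.add(end);
                jobFinish = Math.max(jobFinish, end);
            }
            finish[j] = jobFinish;
        }
        return finish;
    }
}
```

Turnaround time per job is then finish[j] minus the job's submit time. Swapping the FIFO loop for another policy is where different schedulers would be compared.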




[jira] Commented: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759328#action_12759328
 ] 

Chris Douglas commented on MAPREDUCE-728:
-

The failing core test, TestCopyFiles.testHftpAccessControl, also fails on trunk 
(MAPREDUCE-1029).

 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png





[jira] Commented: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759332#action_12759332
 ] 

Hong Tang commented on MAPREDUCE-728:
-

The failed test, org.apache.hadoop.tools.TestCopyFiles.testHftpAccessControl 
(from TestCopyFiles), is not related to this patch.

 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png





[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-24 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759337#action_12759337
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Summarizing some offline discussions:
1. The 1.5 extra round trips to the TaskTracker needed for HTTP Digest 
authentication could be a significant cost when the map outputs are small.
2. Instead of that, can we do the following:
   2.1. Tasks authenticate to the TaskTrackers by simply passing the key in the 
URL. This doesn't cost us anything.
   2.2. Map tasks encrypt the final spill file on the map side when it is 
written to disk (and reducers decrypt it). This could be done using a key 
different from the shuffle key used in 2.1.
The idea is that at some point we should have encrypted map outputs anyway, to 
get maximum security for the intermediate outputs. We can do that on the wire 
via HTTPS, or by having encrypted files. The latter should be much less costly 
than the former. The point of having both 2.1 and 2.2 is to make the transfer 
very secure without introducing the overhead of extra round trips for (digest) 
authentication.

Thoughts?
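
To make 2.2 concrete, here is a minimal sketch of encrypting spill bytes on the map side and decrypting them on the reduce side with a key separate from the shuffle key. Everything in it is an assumption for illustration: SpillEncryptionSketch and its methods are hypothetical names, AES in CBC mode is just one possible cipher choice, and how the key and IV would reach the reducers is not addressed.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Sketch only: illustrates 2.2 with the standard javax.crypto API. */
class SpillEncryptionSketch {
    // Assumed cipher choice; the discussion does not name one.
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    /** Generates a key for spill files, distinct from the shuffle key in 2.1. */
    static SecretKey newSpillKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Generates a random 16-byte initialization vector. */
    static byte[] newIv() {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    /** Map side: encrypt the spill bytes before they are written to disk. */
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] spill) {
        return run(Cipher.ENCRYPT_MODE, key, iv, spill);
    }

    /** Reduce side: decrypt the fetched bytes. */
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost here is paid once per spill file on the map side rather than per connection, which is the trade-off the comment argues for.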

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Devaraj Das

 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Updated: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-728:


   Resolution: Fixed
Fix Version/s: (was: 0.22.0)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I committed this to trunk and the 0.21 branch, per the vote on mapreduce-dev.

Thanks to Arun Murthy, Tamas Sarlos, Anirban Dasgupta, Guanying Wang, and Hong 
Tang.

 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png





[jira] Commented: (MAPREDUCE-728) Mumak: Map-Reduce Simulator

2009-09-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759346#action_12759346
 ] 

Hudson commented on MAPREDUCE-728:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #64 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/64/])
. Add Mumak, a Hadoop map/reduce simulator. Contributed by Arun C Murthy,
Tamas Sarlos, Anirban Dasgupta, Guanying Wang, and Hong Tang


 Mumak: Map-Reduce Simulator
 ---

 Key: MAPREDUCE-728
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-728
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Hong Tang
 Fix For: 0.21.0

 Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
 mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
 mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
 mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
 mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png





[jira] Updated: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated MAPREDUCE-1007:
-

Attachment: MAPREDUCE-1007-2.txt

Uploading a patch that addresses the comments above.

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007-2.txt, 
 MAPREDUCE-1007.patch


 MAPREDUCE-777 breaks the jobtracker UI for hierarchical queues. When 
 jobtracker.jsp is accessed, it throws the following exception:
 {code}
 java.lang.NullPointerException
   at org.apache.hadoop.mapred.CapacityTaskScheduler.getJobs(CapacityTaskScheduler.java:1007)
   at org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:3888)
   at org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:3869)
   at org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:3830)
   at org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:36)
 {code}
 (The issue number and the line number in the code match: 1007. Some fun for a 
 Hadoop developer :) )




[jira] Updated: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated MAPREDUCE-1007:
-

Status: Patch Available  (was: Open)

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007-2.txt, 
 MAPREDUCE-1007.patch





[jira] Updated: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body

2009-09-24 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-1000:
-

Status: Patch Available  (was: Open)

The findbugs warning is spurious; the same patch got a +1 for findbugs in the 
previous run. The TestCopyFiles failure is a known failure. Running it through 
Hudson again, just in case.

 JobHistory.initDone() should retain the try ... catch in the body
 -

 Key: MAPREDUCE-1000
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: Hong Tang
Assignee: Jothi Padmanabhan
 Fix For: 0.21.0

 Attachments: mapred-1000-v2.patch, mapred-1000.patch







[jira] Updated: (MAPREDUCE-1000) JobHistory.initDone() should retain the try ... catch in the body

2009-09-24 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-1000:
-

Status: Open  (was: Patch Available)

 JobHistory.initDone() should retain the try ... catch in the body
 -

 Key: MAPREDUCE-1000
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1000
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: Hong Tang
Assignee: Jothi Padmanabhan
 Fix For: 0.21.0

 Attachments: mapred-1000-v2.patch, mapred-1000.patch







[jira] Commented: (MAPREDUCE-964) Inaccurate values in jobSummary logs

2009-09-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759369#action_12759369
 ] 

Hemanth Yamijala commented on MAPREDUCE-964:


I looked at this patch mainly from the point of view of verifying two 
invariants:

- Whenever we set a startTime in a TaskStatus, we need to set the finishTime as 
well.
- A finishTime must be set only if the startTime is > 0.

AFAIK, the first invariant is ensured by this patch. For the second, I tried 
tracing different code paths and could see other places where the invariant 
was broken, though the cases identified in the bug seem to have been 
addressed. Rather than adding a fix in every place (which does not guarantee 
that the fix will continue to hold in the face of future changes), I think it 
is sensible to add a check in setFinishTime and statusUpdate of TaskStatus 
itself, and ensure the invariant holds at the root. Let's log an INFO message 
when we see that the invariant is not met, to help us debug. Is there a good 
way to get the stack trace as well, which would be more useful? Maybe 
Thread.currentThread().getStackTrace() will do the trick?

Also, if we make this change, it would be simple to write a fast unit test that 
ensures the invariant is being satisfied.

Let's run the MR-reliability tests on this a couple more times, as a sanity 
check that it's as good as the original fix.
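
A minimal sketch of the check proposed above, taking the second invariant to mean that startTime must be greater than 0 (an assumption). TaskStatusSketch is a hypothetical class, not Hadoop's TaskStatus, and java.util.logging stands in for whatever logger the real class uses. Thread.currentThread().getStackTrace() is the standard JDK call mentioned in the comment.

```java
import java.util.logging.Logger;

/** Sketch only: enforces the invariant inside the setter itself. */
class TaskStatusSketch {
    private static final Logger LOG = Logger.getLogger("TaskStatusSketch");

    private long startTime;
    private long finishTime;

    void setStartTime(long startTime) {
        this.startTime = startTime;
    }

    long getFinishTime() {
        return finishTime;
    }

    /**
     * Sets the finish time only if a start time has been recorded.
     * Returns false, and logs the caller's stack, when the invariant
     * would be violated.
     */
    boolean setFinishTime(long finishTime) {
        if (startTime > 0) {
            this.finishTime = finishTime;
            return true;
        }
        LOG.info("Ignoring finishTime=" + finishTime
            + " because startTime=" + startTime + callerTrace());
        return false;
    }

    /** Formats the current thread's stack, one frame per line. */
    private static String callerTrace() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            sb.append("\n  at ").append(frame);
        }
        return sb.toString();
    }
}
```

Because the check lives in the setter, a unit test only needs to call setFinishTime before and after setStartTime, which is the fast test the comment has in mind.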

 Inaccurate values in jobSummary logs
 

 Key: MAPREDUCE-964
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-964
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Rajiv Chittajallu
Assignee: Sreekanth Ramakrishnan
Priority: Critical
 Attachments: mapreduce-964-1.patch


 For some jobs the mapSlotSeconds value is incorrect. It is either negative:
 09/09/01 18:31:44 INFO mapred.JobInProgress$JobSummary: 
 jobId=job_200908270718_4568,submitTime=1251823543976,launchTime=1251823554310,finishTime=1251829904565,
 numMaps=7965,numSlotsPerMap=1,numReduces=40,numSlotsPerReduce=1,user=wile,queue=runner,status=SUCCEEDED,
 mapSlotSeconds=-2503133523,reduceSlotsSeconds=186536,clusterMapCapacity=11262,clusterReduceCapacity=3754
 or too high:
 09/09/02 23:59:57 INFO mapred.JobInProgress$JobSummary: 
 jobId=job_200908270718_5861,submitTime=1251935672924,launchTime=1251935687698,finishTime=1251935997949,
 numMaps=1026,numSlotsPerMap=1,numReduces=10,numSlotsPerReduce=1,user=dfsload,queue=gridops,status=SUCCEEDED,
 mapSlotSeconds=1251949742,reduceSlotsSeconds=537,clusterMapCapacity=11262,clusterReduceCapacity=3754




[jira] Updated: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated MAPREDUCE-1007:
-

Status: Open  (was: Patch Available)

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007-2.txt, 
 MAPREDUCE-1007.patch





[jira] Created: (MAPREDUCE-1038) Mumak's compile-aspects target weaves aspects even though there are no changes to the Mumak's sources

2009-09-24 Thread Vinod K V (JIRA)
Mumak's compile-aspects target weaves aspects even though there are no changes 
to the Mumak's sources
-

 Key: MAPREDUCE-1038
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1038
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Vinod K V
 Fix For: 0.21.0


This is particularly time-consuming and is the bottleneck even for a simple 
ant build. In the case where no files have been updated in Mumak, there is no 
reason to recompile sources along with the aspects. compile-aspects should skip 
this test in these cases.
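
The check being asked for amounts to comparing modification times. The sketch below shows that logic in Java purely as an illustration; in the actual build it would more likely be expressed with Ant's own up-to-date checking in the build file. UpToDateSketch and its methods are hypothetical names.

```java
import java.io.File;

/** Sketch only: the timestamp comparison behind an "is it up to date" check. */
class UpToDateSketch {
    /** True if any source was modified after the target was last produced. */
    static boolean needsRebuild(long[] sourceModifiedMs, long targetModifiedMs) {
        for (long modified : sourceModifiedMs) {
            if (modified > targetModifiedMs) {
                return true;
            }
        }
        return false;
    }

    /**
     * Collects modification times for the given files. File.lastModified()
     * returns 0 for a file that does not exist, so a missing target always
     * looks older than any existing source.
     */
    static long[] modifiedTimes(File... files) {
        long[] times = new long[files.length];
        for (int i = 0; i < files.length; i++) {
            times[i] = files[i].lastModified();
        }
        return times;
    }
}
```

When needsRebuild returns false, the weaving step could be skipped entirely.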




[jira] Updated: (MAPREDUCE-1038) Mumak's compile-aspects target weaves aspects even though there are no changes to the Mumak's sources

2009-09-24 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1038:
-

Description: This is particularly time-consuming and is the bottleneck 
even for a simple ant build. When no files have been updated in Mumak, there 
is no reason to recompile sources along with the aspects. 
compile-aspects should skip this step in these cases.  (was: This is 
particularly time-consuming and is the bottleneck even for a simple ant build. 
When no files have been updated in Mumak, there is no reason to 
recompile sources along with the aspects. compile-aspects should skip this test 
in these cases.)

 Mumak's compile-aspects target weaves aspects even though there are no 
 changes to the Mumak's sources
 -

 Key: MAPREDUCE-1038
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1038
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Vinod K V
 Fix For: 0.21.0


 This is particularly time-consuming and is the bottleneck even for a simple 
 ant build. When no files have been updated in Mumak, there is no reason to 
 recompile sources along with the aspects. compile-aspects should 
 skip this step in these cases.




[jira] Commented: (MAPREDUCE-1023) Newly introduced findBugs warnings should be suppressed

2009-09-24 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759381#action_12759381
 ] 

Jothi Padmanabhan commented on MAPREDUCE-1023:
--

There are 11 findbugs warnings in the trunk (I think introduced by 3 patches: 
MAPREDUCE-711, MAPREDUCE-885 and HADOOP-5661) and 20+ javac warnings other 
than deprecation warnings. 
Managing javac warnings is even more difficult: there are thousands of 
deprecation warnings, and because we set maxwarns=1000, any newly added 
non-deprecation warnings simply do not get reported in testpatch.
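
A rough illustration of why a reporting cap hides new warnings, and of one 
possible way around it: counting non-deprecation warnings separately. This is 
only a sketch under assumptions. The class WarningCount and its methods are 
hypothetical and are not part of testpatch, and it assumes the compiler output 
tags deprecation warnings with [deprecation], as javac does when lint 
categories are printed.

```java
import java.util.List;
import java.util.stream.Collectors;

final class WarningCount {
  /** Counts "warning:" lines that are not tagged [deprecation]. */
  static long nonDeprecation(List<String> compilerOutput) {
    return compilerOutput.stream()
        .filter(line -> line.contains("warning:"))
        .filter(line -> !line.contains("[deprecation]"))
        .count();
  }

  /** Mimics a reporting cap: keeps only the first maxWarns warning lines. */
  static List<String> capped(List<String> compilerOutput, int maxWarns) {
    return compilerOutput.stream()
        .filter(line -> line.contains("warning:"))
        .limit(maxWarns)
        .collect(Collectors.toList());
  }
}
```

Once the number of deprecation warnings alone reaches the cap, the capped 
totals before and after a patch are identical, so comparing totals shows 
nothing. Comparing the non-deprecation count on uncapped output would expose 
the newly added warning.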

 Newly introduced findBugs warnings should be suppressed
 ---

 Key: MAPREDUCE-1023
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1023
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Vinod K V
 Fix For: 0.21.0


 FindBugs warnings introduced by MAPREDUCE-711 and HADOOP-6230 should be 
 suppressed by modifying src/test/findbugsExcludeFile.xml.




[jira] Updated: (MAPREDUCE-1007) MAPREDUCE-777 breaks the UI for hierarchial Queues.

2009-09-24 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated MAPREDUCE-1007:
-

Attachment: MAPREDUCE-1007-3.txt

Had an offline discussion with Vinod, who had a few comments. Uploading a new 
patch that addresses them.

 MAPREDUCE-777 breaks the UI for hierarchial Queues. 
 

 Key: MAPREDUCE-1007
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1007
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: rahul k singh
Assignee: V.V.Chaitanya Krishna
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1007-1.patch, MAPREDUCE-1007-2.txt, 
 MAPREDUCE-1007-3.txt, MAPREDUCE-1007.patch


 MAPREDUCE-777 breaks jobtracker UI for hierarchical queues. When 
 jobtracker.jsp is accessed, it throws the following exception:
 {code}
 java.lang.NullPointerException
   at org.apache.hadoop.mapred.CapacityTaskScheduler.getJobs(CapacityTaskScheduler.java:1007)
   at org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:3888)
   at org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:3869)
   at org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:3830)
   at org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:36)
 {code}
 (Issue number and the line number in code match - 1007. Some fun for a Hadoop 
 developer :) )
