[jira] Commented: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup

2010-05-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869490#action_12869490
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1802:
--

Thanks. 463 it is.

> allow outputcommitters to skip setup/cleanup
> 
>
> Key: MAPREDUCE-1802
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1802
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>
> Job setup and cleanup overheads in our (larger) clusters are very significant 
> and add to latency for small jobs. It turns out that Hive does not require 
> job setup and cleanup at all - since all management of output/temporary files 
> and such is done by the hive client side. So it would be a big win for our 
> environment (and Hive users in general) if we could skip job cleanup/setup 
> altogether.
> The proposal is to add new calls to OutputCommitter interface (along the 
> lines of needsTaskCommit()) to optionally allow skipping of setup/cleanup and 
> for the JT to take these into account while scheduling setup/cleanup. 
> NullOutputFormat should not need setup/cleanup for example.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup

2010-05-19 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma resolved MAPREDUCE-1802.
--

Resolution: Duplicate

> allow outputcommitters to skip setup/cleanup
> 
>
> Key: MAPREDUCE-1802
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1802
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>
> Job setup and cleanup overheads in our (larger) clusters are very significant 
> and add to latency for small jobs. It turns out that Hive does not require 
> job setup and cleanup at all - since all management of output/temporary files 
> and such is done by the hive client side. So it would be a big win for our 
> environment (and Hive users in general) if we could skip job cleanup/setup 
> altogether.
> The proposal is to add new calls to OutputCommitter interface (along the 
> lines of needsTaskCommit()) to optionally allow skipping of setup/cleanup and 
> for the JT to take these into account while scheduling setup/cleanup. 
> NullOutputFormat should not need setup/cleanup for example.




[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-05-19 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1623:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I just committed this. Thanks Tom, this was a big one!

> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1623-1.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 
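The mechanism is easy to picture with a stand-in annotation. The sketch below only mirrors the shape of InterfaceAudience.Private (the real annotation ships in org.apache.hadoop.classification and is not reproduced here); a Javadoc doclet can then filter annotated classes out of the user-facing docs.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AudienceDemo {
    // Stand-in for org.apache.hadoop.classification.InterfaceAudience.Private;
    // this mirror just illustrates how marking and filtering would work.
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Private {}

    // A hypothetical implementation class that a doclet could now exclude.
    @Private
    static class ShuffleInternals {}

    // The check a doclet (or any tool) would make per class.
    static boolean isPrivate(Class<?> c) {
        return c.isAnnotationPresent(Private.class);
    }

    public static void main(String[] args) {
        assert isPrivate(ShuffleInternals.class);
        assert !isPrivate(String.class);
    }
}
```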




[jira] Updated: (MAPREDUCE-1713) Utilities for system tests specific.

2010-05-19 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1713:
-

Attachment: MAPREDUCE-1713.patch

Latest patch based on Cos's comments.

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, 
> systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch
>
>
> 1. A method for restarting the daemon with a new configuration.
>   public static void restartCluster(Hashtable props, String 
> confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3. A method that waits for the daemon to stop.
>   public void waitForClusterToStop() throws Exception;
> 4. A method that waits for the daemon to start.
>   public void waitForClusterToStart() throws Exception;
> 5. A method that checks whether the job has started.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6. A method that checks whether the task has started.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
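The wait utilities above boil down to a polling loop. A generic sketch, with a hypothetical helper name and illustrative timeout/poll values (not the actual patch):

```java
import java.util.function.BooleanSupplier;

public class ClusterWaitUtil {
    // Poll a condition (e.g. "daemon has stopped") until it holds or the
    // timeout elapses. Returns whether the condition held in time.
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50 ms, simulating a daemon shutting down.
        boolean stopped = waitFor(() -> System.currentTimeMillis() - start > 50, 2000, 10);
        assert stopped;
    }
}
```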




[jira] Commented: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-05-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869480#action_12869480
 ] 

Hadoop QA commented on MAPREDUCE-1623:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12445008/MAPREDUCE-1623.patch
  against trunk revision 944427.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/console

This message is automatically generated.

> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1623-1.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 




[jira] Commented: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-05-19 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869474#action_12869474
 ] 

Tom White commented on MAPREDUCE-1623:
--

+1

Thanks Arun!

> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1623-1.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 




[jira] Resolved: (MAPREDUCE-1151) Cleanup and Setup jobs should only call cleanupJob() and setupJob() methods of the OutputCommitter

2010-05-19 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-1151.


Resolution: Duplicate

Fixed by MAPREDUCE-1476

> Cleanup and Setup jobs should only call cleanupJob() and setupJob() methods 
> of the OutputCommitter
> --
>
> Key: MAPREDUCE-1151
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1151
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Pradeep Kamath
>
> The cleanup and setup jobs run as map jobs and call setUpTask() , 
> needsTaskCommit() and possibly commitTask() and abortTask() methods of the 
> OutputCommitter. They should only be calling the cleanupJob() and setupJob() 
> methods.




[jira] Commented: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup

2010-05-19 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869465#action_12869465
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1802:


Is it the same as MAPREDUCE-463? 
MAPREDUCE-463 adds a configuration "mapred.committer.job.setup.cleanup.needed" 
to indicate whether a job needs job-setup and job-cleanup.
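A minimal illustration of that switch, using java.util.Properties as a stand-in for a Hadoop JobConf/Configuration (only the key name comes from the comment above; defaulting to true is an assumption):

```java
import java.util.Properties;

public class SkipSetupCleanupConfig {
    static final String KEY = "mapred.committer.job.setup.cleanup.needed";

    // Read the flag, defaulting to true: setup/cleanup runs unless a job
    // explicitly opts out.
    static boolean setupCleanupNeeded(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        assert setupCleanupNeeded(conf);   // default: run setup/cleanup
        conf.setProperty(KEY, "false");    // a Hive-style job opts out
        assert !setupCleanupNeeded(conf);
    }
}
```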

> allow outputcommitters to skip setup/cleanup
> 
>
> Key: MAPREDUCE-1802
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1802
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
>
> Job setup and cleanup overheads in our (larger) clusters are very significant 
> and add to latency for small jobs. It turns out that Hive does not require 
> job setup and cleanup at all - since all management of output/temporary files 
> and such is done by the hive client side. So it would be a big win for our 
> environment (and Hive users in general) if we could skip job cleanup/setup 
> altogether.
> The proposal is to add new calls to OutputCommitter interface (along the 
> lines of needsTaskCommit()) to optionally allow skipping of setup/cleanup and 
> for the JT to take these into account while scheduling setup/cleanup. 
> NullOutputFormat should not need setup/cleanup for example.




[jira] Created: (MAPREDUCE-1803) 0.21 nightly snapshot build has dependency on 0.22 snapshot

2010-05-19 Thread Aaron Kimball (JIRA)
0.21 nightly snapshot build has dependency on 0.22 snapshot
---

 Key: MAPREDUCE-1803
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1803
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Aaron Kimball


The POM generated in 
https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapred/0.21.0-SNAPSHOT/
 has a reference to hadoop-core 0.22.0-SNAPSHOT 




[jira] Commented: (MAPREDUCE-1545) Add 'first-task-launched' to job-summary

2010-05-19 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869442#action_12869442
 ] 

Luke Lu commented on MAPREDUCE-1545:


@ciemo, you can find start and finish times of *every* task in job history. The 
first task launch times are for the job *summary* only.

> Add 'first-task-launched' to job-summary
> 
>
> Key: MAPREDUCE-1545
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1545
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Arun C Murthy
>Assignee: Luke Lu
> Fix For: 0.22.0
>
> Attachments: mr-1545-trunk-v1.patch, mr-1545-trunk-v2.patch, 
> mr-1545-y20s-v1.patch, mr-1545-y20s-v2.patch, mr-1545-y20s-v3.patch
>
>
> It would be useful to track 'first-task-launched' time to job-summary for 
> better reporting.




[jira] Created: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup

2010-05-19 Thread Joydeep Sen Sarma (JIRA)
allow outputcommitters to skip setup/cleanup


 Key: MAPREDUCE-1802
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1802
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma


Job setup and cleanup overheads in our (larger) clusters are very significant 
and add to latency for small jobs. It turns out that Hive does not require job 
setup and cleanup at all - since all management of output/temporary files and 
such is done by the hive client side. So it would be a big win for our 
environment (and Hive users in general) if we could skip job cleanup/setup 
altogether.

The proposal is to add new calls to OutputCommitter interface (along the lines 
of needsTaskCommit()) to optionally allow skipping of setup/cleanup and for the 
JT to take these into account while scheduling setup/cleanup. NullOutputFormat 
should not need setup/cleanup for example.
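A sketch of the API shape being proposed, using a minimal stand-in for org.apache.hadoop.mapreduce.OutputCommitter. The method names needsJobSetup()/needsJobCleanup() are hypothetical, chosen to parallel the existing needsTaskCommit(); the real interface is not shown here.

```java
public class SkippableCommitterSketch {

    abstract static class OutputCommitterSketch {
        // Defaults keep existing committers running setup/cleanup unchanged.
        boolean needsJobSetup()   { return true; }
        boolean needsJobCleanup() { return true; }
        abstract void setupJob();
        abstract void cleanupJob();
    }

    // A NullOutputFormat-style committer that opts out entirely.
    static class NullCommitter extends OutputCommitterSketch {
        @Override boolean needsJobSetup()   { return false; }
        @Override boolean needsJobCleanup() { return false; }
        @Override void setupJob()   {}
        @Override void cleanupJob() {}
    }

    // What the JobTracker-side scheduling check could look like.
    static boolean shouldScheduleSetup(OutputCommitterSketch c) {
        return c.needsJobSetup();
    }

    public static void main(String[] args) {
        assert !shouldScheduleSetup(new NullCommitter());
    }
}
```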




[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-05-19 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1623:
-

Status: Patch Available  (was: Open)

> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1623-1.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 




[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-05-19 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1623:
-

Attachment: MAPREDUCE-1623.patch

Updated patch since the previous one didn't apply cleanly; I've incorporated my 
own (final) comments. 

Tom, if you are fine with the proposed changes I'll go ahead and commit. 

> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1623-1.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 




[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-05-19 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1623:
-

Status: Open  (was: Patch Available)

Final comments:

src/java/org/apache/hadoop/mapreduce/lib/jobcontrol/ControlledJob.java
src/java/org/apache/hadoop/mapreduce/lib/jobcontrol/JobControl.java 
Both should be Public, Evolving - I don't think they are ready to be labelled 
'stable' yet.


src/java/org/apache/hadoop/mapreduce/QueueInfo.java -> Evolving

src/java/org/apache/hadoop/mapred/IsolationRunner.java -> Evolving since I'm 
not sure IsolationRunner even works anymore.



> Apply audience and stability annotations to classes in mapred package
> -
>
> Key: MAPREDUCE-1623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: M1623-1.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
> MAPREDUCE-1623.patch, MAPREDUCE-1623.patch
>
>
> There are lots of implementation classes in org.apache.hadoop.mapred which 
> makes it difficult to see the user-level MapReduce API classes in the 
> Javadoc. (See 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
>  for example.) By marking these implementation classes with the 
> InterfaceAudience.Private annotation we can exclude them from user Javadoc 
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server 
> and related packages (see MAPREDUCE-561), but applying the annotations is a 
> good first step. 




[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator

2010-05-19 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869425#action_12869425
 ] 

Doug Cutting commented on MAPREDUCE-1126:
-

If we elect to abandon MAPREDUCE-815 in favor of AVRO-493, and since all of the 
underpinnings of this issue have been reverted, perhaps we should now close 
this as "won't fix"?

> shuffle should use serialization to get comparator
> --
>
> Key: MAPREDUCE-1126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Doug Cutting
>Assignee: Aaron Kimball
> Fix For: 0.22.0
>
> Attachments: m-1126-2.patch, m-1126-3.patch, MAPREDUCE-1126.2.patch, 
> MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, 
> MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
>
> Currently the key comparator is defined as a Java class.  Instead we should 
> use the Serialization API to create key comparators.  This would permit, 
> e.g., Avro-based comparators to be used, permitting efficient sorting of 
> complex data types without having to write a RawComparator in Java.
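The core idea can be sketched with a made-up KeySerialization interface standing in for Hadoop's actual Serialization API: the framework, not a hard-coded Java class, hands back the comparator.

```java
import java.util.Comparator;

public class SerializationComparatorSketch {

    // Hypothetical stand-in: a serialization that also knows how to compare
    // the keys it serializes (an Avro-like one could derive this from a schema).
    interface KeySerialization<T> {
        Comparator<T> getComparator();
    }

    // Integers keep the sketch self-contained; nothing here is Hadoop's API.
    static class IntSerialization implements KeySerialization<Integer> {
        public Comparator<Integer> getComparator() {
            return Integer::compare;
        }
    }

    public static void main(String[] args) {
        Comparator<Integer> cmp = new IntSerialization().getComparator();
        assert cmp.compare(1, 2) < 0;
        assert cmp.compare(3, 3) == 0;
    }
}
```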




[jira] Updated: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives

2010-05-19 Thread Al Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Al Thompson updated MAPREDUCE-1641:
---

Attachment: mapreduce-1641--2010-05-19.patch

Minor edits made to the patch in an effort to improve readability.

> Job submission should fail if same uri is added for mapred.cache.files and 
> mapred.cache.archives
> 
>
> Key: MAPREDUCE-1641
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distributed-cache
>Reporter: Amareshwari Sriramadasu
>Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch, 
> duped-files-archives--off-0-20-101--2010-04-21.patch, 
> duped-files-archives--off-0-20-101--2010-04-23--1819.patch, 
> mapreduce-1641--2010-04-27.patch, mapreduce-1641--2010-05-19.patch, 
> patch-1641-ydist-bugfix.txt
>
>
> The behavior of mapred.cache.files and mapred.cache.archives is different 
> during localization in the following way:
> If a jar file is added to mapred.cache.files,  it will be localized under 
> TaskTracker under a unique path. 
> If a jar file is added to mapred.cache.archives, it will be localized under a 
> unique path in a directory named the jar file name, and will be unarchived 
> under the same directory.
> If the same jar file is passed for both configurations, the behavior is 
> undefined. Thus the job submission should fail.
> Currently, since distributed cache processes files before archives, the jar 
> file will be just localized and not unarchived.
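The validation the issue asks for at submission time could look like the following sketch; method and variable names are illustrative, not the actual patch.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CacheUriCheck {

    // Reject a job when the same URI appears in both mapred.cache.files
    // and mapred.cache.archives.
    static void checkNoOverlap(List<URI> files, List<URI> archives) {
        Set<URI> overlap = new HashSet<>(files);
        overlap.retainAll(archives);
        if (!overlap.isEmpty()) {
            throw new IllegalArgumentException(
                "Same URI in cache.files and cache.archives: " + overlap);
        }
    }

    public static void main(String[] args) {
        URI jar = URI.create("hdfs://nn/libs/udf.jar"); // hypothetical path
        boolean rejected = false;
        try {
            checkNoOverlap(Arrays.asList(jar), Arrays.asList(jar));
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        assert rejected;
        // Distinct URIs pass the check without throwing.
        checkNoOverlap(Arrays.asList(URI.create("hdfs://nn/a")),
                       Arrays.asList(URI.create("hdfs://nn/b")));
    }
}
```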




[jira] Created: (MAPREDUCE-1801) do not throw exception if cannot get a delegation token, it may be from a unsecured cluster (part of HDFS-1044)

2010-05-19 Thread Boris Shkolnik (JIRA)
do not throw exception if cannot get a delegation token, it may be from a 
unsecured cluster (part of HDFS-1044)
---

 Key: MAPREDUCE-1801
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1801
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik







[jira] Commented: (MAPREDUCE-1798) normalize property names for JT kerberos principal names in configuration (from HADOOP 6633)

2010-05-19 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869380#action_12869380
 ] 

Jitendra Nath Pandey commented on MAPREDUCE-1798:
-

+1

> normalize property names for JT kerberos principal names in configuration 
> (from HADOOP 6633)
> 
>
> Key: MAPREDUCE-1798
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1798
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1798.patch
>
>





[jira] Commented: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed

2010-05-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869373#action_12869373
 ] 

Hadoop QA commented on MAPREDUCE-1505:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12444965/mapreduce-1505--2010-05-19.patch
  against trunk revision 944427.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/console

This message is automatically generated.

> Cluster class should create the rpc client only when needed
> ---
>
> Key: MAPREDUCE-1505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.20.2
>Reporter: Devaraj Das
>Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: mapreduce-1505--2010-05-19.patch, 
> MAPREDUCE-1505_yhadoop20.patch, MAPREDUCE-1505_yhadoop20_9.patch
>
>
> It will be good to have the org.apache.hadoop.mapreduce.Cluster create the 
> rpc client object only when needed (when a call to the jobtracker is actually 
> required). org.apache.hadoop.mapreduce.Job constructs the Cluster object 
> internally and in many cases the application that created the Job object 
> really wants to look at the configuration only. It'd help to not have these 
> connections to the jobtracker especially when Job is used in the tasks (for 
> e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks and that 
> requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and the 
> same argument applies there too.
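The lazy-initialization pattern being requested is straightforward to sketch: the Cluster-like object only constructs its (expensive) RPC client on first real use, so configuration-only callers never touch the JobTracker. All names below are illustrative stand-ins, not the actual Hadoop classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyClusterSketch {
    static final AtomicInteger clientsCreated = new AtomicInteger();

    static class RpcClient {
        RpcClient() { clientsCreated.incrementAndGet(); } // pretend: opens a JT connection
    }

    static class Cluster {
        private RpcClient client; // NOT created in the constructor

        private synchronized RpcClient getClient() {
            if (client == null) {
                client = new RpcClient();
            }
            return client;
        }

        String getConfValue() { return "some-conf"; }  // no RPC needed
        RpcClient submitJob()  { return getClient(); } // first real JT call
    }

    public static void main(String[] args) {
        Cluster c = new Cluster();
        int before = clientsCreated.get();
        c.getConfValue();
        assert clientsCreated.get() == before;     // config access: no connection yet
        c.submitJob();
        assert clientsCreated.get() == before + 1; // connection made on first use
    }
}
```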




[jira] Commented: (MAPREDUCE-1753) Implement a functionality for suspend and resume a task's process.

2010-05-19 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869363#action_12869363
 ] 

Konstantin Boudnik commented on MAPREDUCE-1753:
---

It isn't about my satisfaction. The return value is effectively a boolean; the 
C language doesn't have a boolean type, which is why an integer is used instead.

Now, about the suspend/resume process: you are right, it is generic, which 
technically allows suspending a daemon VM's process and never being able to 
resume it. But that seems to be OK, I guess. I was totally confused by the fact 
that this has been tracked by a MAPREDUCE JIRA :( I'm moving this ticket out to 
HADOOP.
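Suspend/resume here ultimately shells out to a C world where success is an integer exit status, not a boolean. A thin wrapper can build the commands and translate the status; nothing is executed below, and the names only mirror (but are not) the DaemonProtocol methods from the issue.

```java
import java.util.Arrays;
import java.util.List;

public class SuspendResumeSketch {

    // SIGSTOP/SIGCONT are the standard POSIX signals for pausing and
    // resuming a process; the pid would come from the daemon under test.
    static List<String> suspendCommand(String pid) {
        return Arrays.asList("kill", "-SIGSTOP", pid);
    }

    static List<String> resumeCommand(String pid) {
        return Arrays.asList("kill", "-SIGCONT", pid);
    }

    // Translate a C-style exit status into a Java boolean: 0 means success.
    static boolean succeeded(int exitStatus) {
        return exitStatus == 0;
    }

    public static void main(String[] args) {
        assert suspendCommand("1234").contains("-SIGSTOP");
        assert resumeCommand("1234").contains("-SIGCONT");
        assert succeeded(0);
        assert !succeeded(1);
    }
}
```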

> Implement a functionality for suspend and resume a task's process.
> --
>
> Key: MAPREDUCE-1753
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1753
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1753-ydist-security.patch, 1753-ydist-security.patch, 
> 1753-ydist-security.patch, daemonprotocolaspect.patch
>
>
> Adding  two methods in DaemonProtocolAspect.aj for suspend and resume the 
> process.
> public int DaemonProtocol.resumeProcess(String pid) throws IOException;
> public int DaemonProtocol.suspendProcess(String pid) throws IOException;




[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic

2010-05-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869360#action_12869360
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1800:
--

The problem is that the current heuristics also cause bad behavior when 
uplinks/core switches degrade.

I agree that the case of a single node that is not able to send map outputs is 
something that Hadoop should detect and correct automatically - but I don't 
think the current heuristic (by itself) is a good one, because of the previous 
point.

I don't have a good alternative proposal. A few thoughts pop to mind:
- separate blacklisting of TTs due to map/reduce task failures from 
blacklisting due to map-output fetch failures. The thresholds and policies 
required seem different.
- if the scope of the fault is NIC/port/process/OS problems affecting a 
'single' node - then we should only take into account map-fetch failures that 
happen within the same rack (i.e., assign blame to a TT only if other TTs 
within the same rack cannot communicate with it).
- blame should be laid by a multitude of different hosts. It's no good if 4 
reducers on TT1 cannot get map outputs from TT2 and this results in 
blacklisting of TT2. It's possible that TT1 itself has a bad port/NIC.

(Just thinking aloud - I don't have a careful understanding of the code beyond 
what's been relayed to me by others :-)).
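The third idea - requiring blame from many distinct hosts - could be sketched roughly as follows; FetchFailureTracker and its method names are hypothetical stand-ins, not JobTracker code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: only consider a mapper host for blacklisting once a
// minimum number of *distinct* reducer hosts have reported fetch failures
// against it, so a single bad reporter cannot get a healthy node blacklisted.
public class FetchFailureTracker {
    private final int minDistinctReporters;
    // mapper host -> set of reducer hosts that reported failures against it
    private final Map<String, Set<String>> reportsByMapper = new HashMap<>();

    public FetchFailureTracker(int minDistinctReporters) {
        this.minDistinctReporters = minDistinctReporters;
    }

    // Record a fetch failure; returns true once the mapper host has been
    // blamed by enough distinct reducer hosts to warrant blacklisting.
    public boolean reportFailure(String mapperHost, String reducerHost) {
        Set<String> reporters =
            reportsByMapper.computeIfAbsent(mapperHost, k -> new HashSet<>());
        reporters.add(reducerHost);
        return reporters.size() >= minDistinctReporters;
    }
}
```

With a threshold of 3, four reducers on TT1 all complaining about TT2 count as a single reporter, so TT2 is not blacklisted on TT1's word alone.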

> using map output fetch failures to blacklist nodes is problematic
> -
>
> Key: MAPREDUCE-1800
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If a mapper and a reducer cannot communicate, then either party could be at 
> fault. The current Hadoop protocol allows reducers to declare the node running 
> the mapper to be at fault. When a sufficient number of reducers do so, the 
> map node can be blacklisted.
> In cases where networking problems cause substantial degradation in 
> communication across sets of nodes, a large number of nodes can become 
> blacklisted as a result of this protocol. The blacklisting is often wrong 
> (reducers on the smaller side of a network partition can collectively cause 
> nodes on the larger side of the partition to be blacklisted) and 
> counterproductive (rerunning maps puts further load on the already maxed-out 
> network links).
> We should revisit how we can better identify nodes with genuine network 
> problems (and what role, if any, map-output fetch failures have in this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic

2010-05-19 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869340#action_12869340
 ] 

Arun C Murthy commented on MAPREDUCE-1800:
--

FWIW the current heuristics protect reduces against the common case of a single 
bad node (the one on which the map ran), and they work reasonably well.

What I'm reading here is that we need better overall metrics/monitoring of the 
cluster, and enhancements to the masters (JobTracker/NameNode) to take 
advantage of the metrics/monitoring stats. Is that reasonable?

> using map output fetch failures to blacklist nodes is problematic
> -
>
> Key: MAPREDUCE-1800
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If a mapper and a reducer cannot communicate, then either party could be at 
> fault. The current Hadoop protocol allows reducers to declare the node running 
> the mapper to be at fault. When a sufficient number of reducers do so, the 
> map node can be blacklisted.
> In cases where networking problems cause substantial degradation in 
> communication across sets of nodes, a large number of nodes can become 
> blacklisted as a result of this protocol. The blacklisting is often wrong 
> (reducers on the smaller side of a network partition can collectively cause 
> nodes on the larger side of the partition to be blacklisted) and 
> counterproductive (rerunning maps puts further load on the already maxed-out 
> network links).
> We should revisit how we can better identify nodes with genuine network 
> problems (and what role, if any, map-output fetch failures have in this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability

2010-05-19 Thread Dmytro Molkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869333#action_12869333
 ] 

Dmytro Molkov commented on MAPREDUCE-1354:
--

Is there any particular reason that only getTaskCompletionEvents dropped the 
synchronized modifier, while all other job-access methods like 
getCleanupTaskReports, getSetupTaskReports, etc. are still synchronized, even 
though they effectively perform a very similar kind of access?
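One case where the modifier can be dropped safely is when the underlying state is itself thread-safe. A minimal illustration (JobRegistry and its methods are hypothetical stand-ins, not the JobTracker):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative contrast: instead of marking each accessor 'synchronized',
// back the lookup with a concurrent structure so read-mostly accessors need
// no lock at all. This is the kind of reasoning that would justify dropping
// the modifier consistently across similar accessors.
public class JobRegistry {
    private final Map<String, String> jobs = new ConcurrentHashMap<>();

    public void addJob(String id, String status) {
        jobs.put(id, status);
    }

    // Safe without 'synchronized': ConcurrentHashMap reads are thread-safe.
    public String getJobStatus(String id) {
        return jobs.get(id);
    }
}
```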

> Incremental enhancements to the JobTracker for better scalability
> -
>
> Key: MAPREDUCE-1354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Devaraj Das
>Assignee: Dick King
>Priority: Critical
> Attachments: mapreduce-1354--2010-03-10.patch, 
> mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> mr-1354-y20.patch
>
>
> It'd be nice to have the JobTracker object not be locked while accessing the 
> HDFS for reading the jobconf file and while writing the jobinfo file in the 
> submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1762) Add a setValue() method in Counter

2010-05-19 Thread Scott Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-1762:
--

Attachment: MAPREDUCE-1762.1.txt

Fixed a typo in the patch.
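As a rough illustration of the API being proposed (a hedged sketch only, not the attached patch or the real org.apache.hadoop.mapreduce.Counter):

```java
// Minimal sketch of a counter supporting both the existing increment() and
// the proposed setValue(). SimpleCounter is a hypothetical name; the real
// Counter also handles serialization and transmission, which are omitted here.
public class SimpleCounter {
    private final String name;
    private long value;

    public SimpleCounter(String name) {
        this.name = name;
    }

    // Existing behavior: add to the running total.
    public void increment(long incr) {
        value += incr;
    }

    // The proposed addition: overwrite the counter with an absolute value,
    // e.g. a gauge-style measurement rather than a running total.
    public void setValue(long newValue) {
        value = newValue;
    }

    public long getValue() {
        return value;
    }

    public String getName() {
        return name;
    }
}
```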

> Add a setValue() method in Counter
> --
>
> Key: MAPREDUCE-1762
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1762
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1762.1.txt, MAPREDUCE-1762.txt
>
>
> Counters are very useful because the logging and transmission are already in 
> place, which makes it very convenient to transmit and store numbers. But 
> currently Counter only has an increment() method.
> It would be nice to have a setValue() method in this class, which would 
> allow us to transmit a wider variety of information through it.
> What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1762) Add a setValue() method in Counter

2010-05-19 Thread Dmytro Molkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869304#action_12869304
 ] 

Dmytro Molkov commented on MAPREDUCE-1762:
--

The code looks good

> Add a setValue() method in Counter
> --
>
> Key: MAPREDUCE-1762
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1762
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1762.txt
>
>
> Counters are very useful because the logging and transmission are already in 
> place, which makes it very convenient to transmit and store numbers. But 
> currently Counter only has an increment() method.
> It would be nice to have a setValue() method in this class, which would 
> allow us to transmit a wider variety of information through it.
> What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic

2010-05-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869266#action_12869266
 ] 

Todd Lipcon commented on MAPREDUCE-1800:


Hey Joydeep. Thanks for the further explanation - I agree we could do better 
here. There's an old JIRA where we threw around some similar ideas, maybe last 
August or so, but I can't seem to find it at the moment. Anyone remember the 
one I mean?

> using map output fetch failures to blacklist nodes is problematic
> -
>
> Key: MAPREDUCE-1800
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If a mapper and a reducer cannot communicate, then either party could be at 
> fault. The current Hadoop protocol allows reducers to declare the node running 
> the mapper to be at fault. When a sufficient number of reducers do so, the 
> map node can be blacklisted.
> In cases where networking problems cause substantial degradation in 
> communication across sets of nodes, a large number of nodes can become 
> blacklisted as a result of this protocol. The blacklisting is often wrong 
> (reducers on the smaller side of a network partition can collectively cause 
> nodes on the larger side of the partition to be blacklisted) and 
> counterproductive (rerunning maps puts further load on the already maxed-out 
> network links).
> We should revisit how we can better identify nodes with genuine network 
> problems (and what role, if any, map-output fetch failures have in this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability

2010-05-19 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869259#action_12869259
 ] 

Dick King commented on MAPREDUCE-1354:
--

The regression failure flagged by Hudson, {{TestJobStatusPersistency}} , does 
not repeat, and is hugely unlikely to have been caused by this patch.

There is no new test because this patch fixes an extremely narrow race 
condition and that race cannot be induced artificially.

> Incremental enhancements to the JobTracker for better scalability
> -
>
> Key: MAPREDUCE-1354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Devaraj Das
>Assignee: Dick King
>Priority: Critical
> Attachments: mapreduce-1354--2010-03-10.patch, 
> mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> mr-1354-y20.patch
>
>
> It'd be nice to have the JobTracker object not be locked while accessing the 
> HDFS for reading the jobconf file and while writing the jobinfo file in the 
> submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic

2010-05-19 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869258#action_12869258
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1800:
--

If there is a total network partition, then we don't have a problem: either 
the cluster will fail outright (say the JT and NN end up on different sides of 
the partition), or one partition (the one that has the JT/NN) will exclude 
nodes from the other. (I say we don't have a problem in the sense that Hadoop's 
response to such an event is more or less correct.)

The problem is that we have had occurrences of slow networks that are not 
quite partitioned. For example, the uplink from one rack switch to the core 
switch can be flaky/degraded. In this case, control traffic from the JT to the 
TTs may be going through, but data traffic from mappers and reducers on the 
degraded racks can be really hurt. If there are problems in the core switch 
itself (it's underprovisioned), then the whole cluster has network problems. 
The description applies to such scenarios.

In such a case, the appropriate response of the software should be, at worst, 
degraded performance (in keeping with the degraded nature of the underlying 
hardware) or, at best, correctly identifying the slow node(s) and not using 
them, or using them less (this would apply to the flaky rack-uplink scenario). 
The current response of Hadoop is neither. It makes a bad situation worse by 
misassigning blame (when map nodes on good racks are blamed by a sufficiently 
large number of reducers running on bad racks). We potentially lose nodes from 
good racks, and the resulting retries of tasks put further stress on the 
strained network resource.

A couple of things seem desirable:
1. For enterprise data-center environments that (may) have a high degree of 
control and monitoring around their networking elements - the ability to turn 
off (selectively) the functionality in Hadoop that tries to detect and correct 
for network problems. Diagnostics stand a much better chance of catching and 
identifying networking problems so they can be fixed.
2. In environments with less control (say Amazon EC2, or Hadoop running on a 
bunch of PCs across a company) that are more akin to a p2p network - Hadoop's 
network-fault diagnosis algorithms need improvement. A comparison to BitTorrent 
is fair: over there, every node advertises its upload/download throughput, and 
a node can come across as slow only in comparison to the collective stats 
published by all peers (not just based on communication with a small set of 
other peers).



> using map output fetch failures to blacklist nodes is problematic
> -
>
> Key: MAPREDUCE-1800
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If a mapper and a reducer cannot communicate, then either party could be at 
> fault. The current Hadoop protocol allows reducers to declare the node running 
> the mapper to be at fault. When a sufficient number of reducers do so, the 
> map node can be blacklisted.
> In cases where networking problems cause substantial degradation in 
> communication across sets of nodes, a large number of nodes can become 
> blacklisted as a result of this protocol. The blacklisting is often wrong 
> (reducers on the smaller side of a network partition can collectively cause 
> nodes on the larger side of the partition to be blacklisted) and 
> counterproductive (rerunning maps puts further load on the already maxed-out 
> network links).
> We should revisit how we can better identify nodes with genuine network 
> problems (and what role, if any, map-output fetch failures have in this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic

2010-05-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869247#action_12869247
 ] 

Todd Lipcon commented on MAPREDUCE-1800:


Hey Joydeep. Do you often have cases where sets of TT nodes can't talk to each 
other but both sides can still talk to the JT? This is interesting, as it seems 
like an unusual network architecture.

> using map output fetch failures to blacklist nodes is problematic
> -
>
> Key: MAPREDUCE-1800
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Joydeep Sen Sarma
>
> If a mapper and a reducer cannot communicate, then either party could be at 
> fault. The current Hadoop protocol allows reducers to declare the node running 
> the mapper to be at fault. When a sufficient number of reducers do so, the 
> map node can be blacklisted.
> In cases where networking problems cause substantial degradation in 
> communication across sets of nodes, a large number of nodes can become 
> blacklisted as a result of this protocol. The blacklisting is often wrong 
> (reducers on the smaller side of a network partition can collectively cause 
> nodes on the larger side of the partition to be blacklisted) and 
> counterproductive (rerunning maps puts further load on the already maxed-out 
> network links).
> We should revisit how we can better identify nodes with genuine network 
> problems (and what role, if any, map-output fetch failures have in this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed

2010-05-19 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1505:
-

Status: Patch Available  (was: Open)

> Cluster class should create the rpc client only when needed
> ---
>
> Key: MAPREDUCE-1505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.20.2
>Reporter: Devaraj Das
>Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: mapreduce-1505--2010-05-19.patch, 
> MAPREDUCE-1505_yhadoop20.patch, MAPREDUCE-1505_yhadoop20_9.patch
>
>
> It will be good to have the org.apache.hadoop.mapreduce.Cluster create the 
> rpc client object only when needed (when a call to the jobtracker is actually 
> required). org.apache.hadoop.mapreduce.Job constructs the Cluster object 
> internally and in many cases the application that created the Job object 
> really wants to look at the configuration only. It'd help to not have these 
> connections to the jobtracker, especially when Job is used in the tasks 
> (e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks, and 
> that requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and the 
> same argument applies there too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed

2010-05-19 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1505:
-

Attachment: mapreduce-1505--2010-05-19.patch

Delays making a connection to the jobtracker node until it's needed.

Provides a new API so a user can tell, for a given job, whether this has been 
done [although usually there would be no need to know].

> Cluster class should create the rpc client only when needed
> ---
>
> Key: MAPREDUCE-1505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.20.2
>Reporter: Devaraj Das
>Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: mapreduce-1505--2010-05-19.patch, 
> MAPREDUCE-1505_yhadoop20.patch, MAPREDUCE-1505_yhadoop20_9.patch
>
>
> It will be good to have the org.apache.hadoop.mapreduce.Cluster create the 
> rpc client object only when needed (when a call to the jobtracker is actually 
> required). org.apache.hadoop.mapreduce.Job constructs the Cluster object 
> internally and in many cases the application that created the Job object 
> really wants to look at the configuration only. It'd help to not have these 
> connections to the jobtracker, especially when Job is used in the tasks 
> (e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks, and 
> that requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and the 
> same argument applies there too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic

2010-05-19 Thread Joydeep Sen Sarma (JIRA)
using map output fetch failures to blacklist nodes is problematic
-

 Key: MAPREDUCE-1800
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Joydeep Sen Sarma


If a mapper and a reducer cannot communicate, then either party could be at 
fault. The current Hadoop protocol allows reducers to declare the node running 
the mapper to be at fault. When a sufficient number of reducers do so, the map 
node can be blacklisted.

In cases where networking problems cause substantial degradation in 
communication across sets of nodes, a large number of nodes can become 
blacklisted as a result of this protocol. The blacklisting is often wrong 
(reducers on the smaller side of a network partition can collectively cause 
nodes on the larger side of the partition to be blacklisted) and 
counterproductive (rerunning maps puts further load on the already maxed-out 
network links).

We should revisit how we can better identify nodes with genuine network 
problems (and what role, if any, map-output fetch failures have in this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path

2010-05-19 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869206#action_12869206
 ] 

Dick King commented on MAPREDUCE-1744:
--

On the patch {{h1744.patch}} of 2010-05-15 04:36 PM , can we avoid broadening 
the exception signature of {{Job.add*ToClassPath(Path)}} by using 
{{FileSystem.get(conf)}} instead of {{cluster.getFileSystem()}} ?

-dk


> DistributedCache creates its own FileSytem instance when adding a 
> file/archive to the path
> --
>
> Key: MAPREDUCE-1744
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Dick King
> Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, 
> MAPREDUCE-1744.patch
>
>
> According to the contract of {{UserGroupInformation.doAs()}}, the only 
> required operations within the {{doAs()}} block are the creation of a 
> {{JobClient}} or getting a {{FileSystem}}.
> The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a 
> {{FileSystem}} instance outside of the {{doAs()}} block; this {{FileSystem}} 
> instance is not in the scope of the proxy user but of the superuser, and 
> permissions may make the method fail.
> One option is to overload the methods above to receive a filesystem.
> Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; 
> for this it would be required to have the proxy user set in the passed 
> configuration.
> The second option seems nicer, but I don't know if the proxy user is set as 
> a property in the jobconf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1635) ResourceEstimator does not work after MAPREDUCE-842

2010-05-19 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1635:
-

Release Note: Fixed a bug related to resource estimation for disk-based 
scheduling by modifying TaskTracker to return correct map output size for the 
completed maps and -1 for other tasks or failures.  (was: Fixed a bug in 
TaskTracker to return correct map output size for the completed maps and -1 for 
other tasks or failures.)

> ResourceEstimator does not work after MAPREDUCE-842
> ---
>
> Key: MAPREDUCE-1635
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1635
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: patch-1635-1.txt, patch-1635-ydist.txt, patch-1635.txt
>
>
> MAPREDUCE-842 changed the Child's mapred.local.dir to have the attemptDir as 
> the base local directory. The assumption is also that 
> org.apache.hadoop.mapred.MapOutputFile always gets the Child's 
> mapred.local.dir. But MapOutputFile.getOutputFile() is called with the 
> TaskTracker's conf, which does not find the output file. Thus 
> TaskTracker.tryToGetOutputSize() always returns -1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-587) Stream test TestStreamingExitStatus fails with Out of Memory

2010-05-19 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-587:


Release Note: Fixed the streaming test TestStreamingExitStatus's failure 
due to an OutOfMemory error by reducing the testcase's io.sort.mb.  (was: Fixed 
the streaming test TestStreamingExitStatus's failure due ot Out of Memory by 
reducing the testcase's io.sort.mb.)

> Stream test TestStreamingExitStatus fails with Out of Memory
> 
>
> Key: MAPREDUCE-587
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-587
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
> Environment: OS/X, 64-bit x86 imac, 4GB RAM.
>Reporter: Steve Loughran
>Assignee: Amar Kamat
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-587-v1.0.patch, mr-587-yahoo-y20-v1.0.patch, 
> mr-587-yahoo-y20-v1.1.patch
>
>
> contrib/streaming tests are failing a test with an Out of Memory error on an 
> OS/X Mac -same problem does not surface on Linux.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-587) Stream test TestStreamingExitStatus fails with Out of Memory

2010-05-19 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-587:


Release Note: Fixed the streaming test TestStreamingExitStatus's failure 
due ot Out of Memory by reducing the testcase's io.sort.mb.  (was: Reduced the 
io.sort.mb in TestStreamingExitStatus to prevent OOM.)

> Stream test TestStreamingExitStatus fails with Out of Memory
> 
>
> Key: MAPREDUCE-587
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-587
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
> Environment: OS/X, 64-bit x86 imac, 4GB RAM.
>Reporter: Steve Loughran
>Assignee: Amar Kamat
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-587-v1.0.patch, mr-587-yahoo-y20-v1.0.patch, 
> mr-587-yahoo-y20-v1.1.patch
>
>
> contrib/streaming tests are failing a test with an Out of Memory error on an 
> OS/X Mac -same problem does not surface on Linux.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1607) Task controller may not set permissions for a task cleanup attempt's log directory

2010-05-19 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1607:
-

Release Note: Fixed initialization of a task-cleanup attempt's log 
directory by setting correct permissions via task-controller. Added new log4j 
properties hadoop.tasklog.iscleanup and log4j.appender.TLA.isCleanup to 
conf/log4j.properties. Changed the userlogs for a task-cleanup attempt to go 
into its own directory instead of the original attempt directory. This is an 
incompatible change as old userlogs of cleanup attempt-dirs before this release 
will no longer be visible.   (was: Fixed initialization of a task-cleanup 
attempt's log directory by setting correct permissions via task-controller. 
Changed the userlogs for a task-cleanup attempt to go into its own directory 
instead of the original attempt directory. This is an incompatible change as 
old userlogs of cleanup attempt-dirs will no longer be visible.)

> Task controller may not set permissions for a task cleanup attempt's log 
> directory
> --
>
> Key: MAPREDUCE-1607
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1607
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task-controller
>Affects Versions: 0.21.0
>Reporter: Hemanth Yamijala
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: patch-1607-1.txt, patch-1607-2.txt, 
> patch-1607-ydist.txt, patch-1607.txt
>
>
> Task controller uses the INITIALIZE_TASK command to initialize task attempt 
> and task log directories. For cleanup tasks, task attempt directories are 
> named as task-attempt-id.cleanup. But log directories do not have the 
> .cleanup suffix. The task controller is not aware of this distinction and 
> tries to set permissions for log directories named task-attempt-id.cleanup. 
> This is a NO-OP. Typically the task cleanup runs on the same node that ran 
> the original task attempt as well. So, the task log directories are already 
> properly initialized. However, the task cleanup can run on a node that has 
> not run the original task attempt. In that case, the initialization would not 
> happen and this could result in the cleanup task failing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1397) NullPointerException observed during task failures

2010-05-19 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1397:
-

Release Note: Fixed a race condition involving JvmRunner.kill() and 
KillTaskAction, which was leading to an NullPointerException causing a 
transient inconsistent state in JvmManager and failure of tasks.  (was: Fixed a 
NullPointerException observed in JvmManager during task failures that resulted 
in a transient inconsistent state.)

> NullPointerException observed during task failures
> --
>
> Key: MAPREDUCE-1397
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1397
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Ramya R
>Assignee: Amareshwari Sriramadasu
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: patch-1397-1.txt, patch-1397-2.txt, patch-1397-3.txt, 
> patch-1397-ydist.txt, patch-1397.txt
>
>
> In an environment where many jobs are killed simultaneously, NPEs are 
> observed in the TT/JT logs when a task fails. The situation is aggravated 
> when the taskcontroller.cfg is not configured properly. Below is the 
> exception obtained:
> {noformat}
> INFO org.apache.hadoop.mapred.TaskInProgress: Error from :
> java.lang.Throwable: Child Error
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:529)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:329)
> at 
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:315)
> at 
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:146)
> at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:109)
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:502)
>  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1657) After task logs directory is deleted, tasklog servlet displays wrong error message about job ACLs

2010-05-19 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1657:
-

 Release Note: Fixed a bug in tasklog servlet which displayed wrong 
error message about job ACLs - an access control error instead of the expected 
log files gone error - after task logs directory is deleted.  (was: Fixed a bug 
in tasklog servlet which displayed wrong error message about job ACLs - an 
access control error instead of log files gone error - after task logs 
directory is deleted.)
Affects Version/s: 0.21.0
   (was: 0.22.0)

> After task logs directory is deleted, tasklog servlet displays wrong error 
> message about job ACLs
> -
>
> Key: MAPREDUCE-1657
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1657
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: MR1657.20S.1.patch, MR1657.patch
>
>
> When the task log has been deleted and we click "view task log" in the web 
> UI, the page displays the wrong error message:
> [
> HTTP ERROR: 401
> User user1 failed to view tasklogs of job job_201003241521_0001!
> user1 is not authorized for performing the operation VIEW_JOB on 
> job_201003241521_0001. VIEW_JOB Access control list
> configured for this job : 
> RequestURI=/tasklog
> ]
> This happens even if the user has the view-job ACLs set or is the owner of 
> the job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path

2010-05-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869116#action_12869116
 ] 

Hadoop QA commented on MAPREDUCE-1744:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444582/h1744.patch
  against trunk revision 944427.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/console

This message is automatically generated.

> DistributedCache creates its own FileSytem instance when adding a 
> file/archive to the path
> --
>
> Key: MAPREDUCE-1744
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Dick King
> Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, 
> MAPREDUCE-1744.patch
>
>
> According to the contract of {{UserGroupInformation.doAs()}}, the only 
> operations required to run within the {{doAs()}} block are the creation of a 
> {{JobClient}} or obtaining a {{FileSystem}}.
> The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a 
> {{FileSystem}} instance outside of the {{doAs()}} block; this instance is 
> therefore in the scope of the superuser rather than the proxy user, and 
> permissions may make the method fail.
> One option is to overload the methods above to receive a filesystem.
> Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; 
> for this, the proxy user would need to be set in the passed configuration.
> The second option seems nicer, but I don't know if the proxy user is 
> available as a property in the jobconf.
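The doAs() contract described above can be modeled in plain Java: anything that captures the current user's identity (here a mock FileSystem) must be created inside the doAs block. All names below are illustrative stand-ins for Hadoop's UserGroupInformation.doAs(), not the real API.

```java
import java.util.concurrent.Callable;

// Minimal model of the doAs contract: a FileSystem is bound to the user
// in effect when it is created, so creating it outside doAs binds it to
// the superuser (the bug), while creating it inside binds it to the
// proxy user (the fix). All names are illustrative.
public class DoAsModel {
    private static final ThreadLocal<String> USER =
        ThreadLocal.withInitial(() -> "superuser");

    /** Mock FileSystem bound to whoever created it. */
    static class MockFileSystem {
        final String owner = USER.get();
    }

    /** Run an action with the proxy user's identity in effect. */
    static <T> T doAs(String proxyUser, Callable<T> action) throws Exception {
        String previous = USER.get();
        USER.set(proxyUser);
        try {
            return action.call();
        } finally {
            USER.set(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        MockFileSystem outside = new MockFileSystem();        // bug: superuser
        MockFileSystem inside =
            doAs("proxyuser", MockFileSystem::new);           // fix: proxy user
        System.out.println(outside.owner + " vs " + inside.owner);
    }
}
```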

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.

2010-05-19 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869112#action_12869112
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1794:
--

{quote}
The JobStatus should be JobStatus.FAILED instead of SUCCEEDED. If the task 
tracker was lost for all four attempts of a task, shouldn't the job fail 
instead of succeed? If that is not the case, the message in the assert has to 
be changed: the job succeeded even when losing the task tracker 4 times.
{quote}
[Vinay]: 
I think you misunderstood the functionality. If a tasktracker is lost and the 
timeout expires, the task is marked as killed and resubmitted to another 
tasktracker. Even if the task is killed four times because of lost 
tasktrackers, it is resubmitted to yet another tasktracker for a fifth time, 
and this continues until the task succeeds. The mapred.map.max.attempts limit 
is not applicable to killed tasks, so the task can be attempted any number of 
times; the limit only applies to failed attempts. In this case the job status 
should be SUCCEEDED, because the task will eventually succeed at some point.
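The distinction described here (killed attempts do not count toward mapred.map.max.attempts, only failed attempts do) can be sketched as follows; the names and the limit of 4 are illustrative, not the actual JobTracker scheduling code.

```java
// Model of the retry accounting: KILLED attempts (e.g. from a lost
// tracker) never exhaust the retry budget; only FAILED attempts do.
// The limit of 4 mirrors the default mapred.map.max.attempts.
public class AttemptAccounting {
    enum Outcome { KILLED, FAILED, SUCCEEDED }

    /** Returns true if the task may be rescheduled after this history. */
    static boolean canRetry(Outcome[] history, int maxFailedAttempts) {
        int failed = 0;
        for (Outcome o : history) {
            if (o == Outcome.FAILED) {
                failed++;
            }
        }
        return failed < maxFailedAttempts;
    }

    public static void main(String[] args) {
        Outcome k = Outcome.KILLED, f = Outcome.FAILED;
        // Four kills from lost trackers: the task is still retryable.
        System.out.println(canRetry(new Outcome[]{k, k, k, k}, 4)); // true
        // Four genuine failures: the task (and hence the job) fails.
        System.out.println(canRetry(new Outcome[]{f, f, f, f}, 4)); // false
    }
}
```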

{quote}
Why do we care about checking the job status for 40% completion? Also, can we 
enhance the building blocks to check this kind of status, since the code could 
be reused elsewhere?
{quote}
[Vinay]: We just wanted to make sure the job starts and completes at least 
40%, because at least one map or reduce task should run on the tasktracker 
before we check the conditions.

{quote}
The above code is repeated a couple of times and can be made a function; if it 
is used across test cases it can become part of a building block.
{quote}
[Vinay]: I will refactor the code into a function. I don't think it is useful 
across the test cases, though.

{quote}
If you look at the story description, we said we would suspend the task 
tracker and resume it, but it seems you have followed the route of killing the 
task tracker instead of pausing and resuming it. I think killing should be 
fine, since kill/start emulates pause and resume, but on the performance side, 
if we had used pause and resume, the waits in the test cases could be reduced.
{quote}
[Vinay]: I am pausing by stopping the tasktracker and resuming by starting it 
again, so I don't think there would be a performance issue.

{quote}
One general question I have: after killing the same task tracker 4 times, the 
task tracker should get blacklisted, and if you resubmit the job, the job 
tracker should not use that task tracker. Is it good to check that condition 
as part of this test case, or do you think it is out of scope? There is a URL 
which lists the blacklisted tasktrackers; if we can get the number through an 
aspect, it can be verified. Also, at the end of the test we need to remove the 
task tracker from the blacklisted state so the other tests run without any 
problem.
{quote}
[Vinay]: For killed tasks, max attempts is not applicable, as I said above. So 
the task tracker won't get blacklisted.


> Test the job status of lost task trackers before and after the timeout.
> ---
>
> Key: MAPREDUCE-1794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1794_lost_tasktracker.patch
>
>
> This test covers the following scenarios.
> 1. Verify that the job succeeds when the task tracker is lost and comes 
> back alive before the timeout.
> 2. Verify the job status and the killed attempts of a task (the job should 
> succeed and the killed-attempt counts should match) when the task trackers 
> are lost and time out for all four attempts of a task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1731) Process tree clean up suspended task tests.

2010-05-19 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1731:
-

Attachment: 1731-ydist-security.patch

The MAPREDUCE-1713 patch affects this patch because of a dependency, so I am 
uploading a new patch.

> Process tree clean up suspended task tests.
> ---
>
> Key: MAPREDUCE-1731
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1731
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1731-ydist-security.patch, 1731-ydist-security.patch, 
> 1731-ydist-security.patch, suspendtask_1731.patch, suspendtask_1731.patch
>
>
> 1. Verify the process tree cleanup of a suspended task; the task should be 
> terminated after the timeout.
> 2. Verify the process tree cleanup of a suspended task when the task is 
> resumed before the task timeout.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.

2010-05-19 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1710:
-

Attachment: 1710-ydist_security.patch

The MAPREDUCE-1713 patch affects this patch because of a dependency, so I am 
uploading a new patch.

> Process tree clean up of exceeding memory limit tasks.
> --
>
> Key: MAPREDUCE-1710
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, 
> 1710-ydist_security.patch, memorylimittask_1710.patch, 
> memorylimittask_1710.patch, memorylimittask_1710.patch, 
> memorylimittask_1710.patch, memorylimittask_1710.patch
>
>
> 1. Submit a job which spawns child processes, each of which exceeds the 
> memory limits. Let the job complete. Check that all the child processes are 
> killed; the overall job should fail.
> 2. Submit a job which spawns child processes, each of which exceeds the 
> memory limits. Kill/fail the job while in progress. Check that all the 
> child processes are killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1693) Process tree clean up of either a failed task or killed task tests.

2010-05-19 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1693:
-

Attachment: 1693-ydist_security.patch

The MAPREDUCE-1713 patch affects this patch because of a dependency, so I am 
uploading a new patch.

> Process tree clean up of either a failed task or killed task tests.
> ---
>
> Key: MAPREDUCE-1693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1693
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1693-ydist_security.patch, 1693-ydist_security.patch, 
> 1693-ydist_security.patch, taskchildskilling_1693.diff, 
> taskchildskilling_1693.diff, taskchildskilling_1693.patch, 
> taskchildskilling_1693.patch, taskchildskilling_1693.patch, 
> taskchildskilling_1693.patch, taskchildskilling_1693.patch, 
> taskchildskilling_1693.patch
>
>
> The following scenarios are covered in the test.
> 1. Run a job which spawns subshells in the tasks. Kill one of the tasks. All 
> the child processes of the killed task must be killed.
> 2. Run a job which spawns subshells in tasks. Fail one of the tasks. All the 
> child processes of the failed task must be killed along with the task after 
> its failure.
> 3. Check process tree cleanup on a particular task-tracker when we use 
> -kill-task and -fail-task with both map and reduce.
> 4. Submit a job which spawns child processes, each of which exceeds the 
> memory limits. Let the job complete. Check that all the child processes are 
> killed; the overall job should fail.
> 5. Submit a job which spawns child processes, each of which exceeds the 
> memory limits. Kill/fail the job while in progress. Check that all the 
> child processes are killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1799) TaskTracker webui fails to show logs for tasks whose child JVM itself crashes before process launch

2010-05-19 Thread Vinod K V (JIRA)
TaskTracker webui fails to show logs for tasks whose child JVM itself crashes 
before process launch
---

 Key: MAPREDUCE-1799
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1799
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task, tasktracker
Reporter: Vinod K V
 Fix For: 0.22.0


In many cases, such as invalid JVM arguments or a JVM started with too large 
an initial heap or beyond OS ulimits, the child JVM crashes before the process 
can even be launched. In these situations, the tasktracker's web UI doesn't 
show the logs. This is because of a bug in the TaskLogServlet, which displays 
logs only when syslog, stdout and stderr are all present. In the JVM-crash 
case, syslog is never created, so the task logs aren't displayed at all.
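The fix the description implies can be sketched as follows: render whichever of the three log streams actually exist, instead of requiring all of them. The names below are illustrative, not the real TaskLogServlet code.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the servlet bug and fix: the buggy path shows nothing unless
// every log stream exists; the fixed path shows every stream that does
// exist. Names are illustrative, not the actual TaskLogServlet.
public class LogKinds {
    static final List<String> ALL = Arrays.asList("stdout", "stderr", "syslog");

    /** Buggy behavior: nothing is shown unless every stream exists. */
    static Set<String> displayableStrict(Set<String> present) {
        return present.containsAll(ALL) ? present : new LinkedHashSet<>();
    }

    /** Fixed behavior: show every stream that actually exists. */
    static Set<String> displayable(Set<String> present) {
        Set<String> out = new LinkedHashSet<>(ALL);
        out.retainAll(present);
        return out;
    }

    public static void main(String[] args) {
        // JVM crashed before launch: no syslog was ever created.
        Set<String> present =
            new LinkedHashSet<>(Arrays.asList("stdout", "stderr"));
        System.out.println(displayableStrict(present)); // []
        System.out.println(displayable(present));       // [stdout, stderr]
    }
}
```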

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1713) Utilities for system tests specific.

2010-05-19 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1713:
-

Attachment: MAPREDUCE-1713.patch

New patch specific to MapReduce.

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, MAPREDUCE-1713.patch, 
> systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch
>
>
> 1. A method for restarting the daemon with a new configuration.
>   public static void restartCluster(Hashtable props, String 
> confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3. A method that waits for the daemon to stop.
>   public void waitForClusterToStop() throws Exception;
> 4. A method that waits for the daemon to start.
>   public void waitForClusterToStart() throws Exception;
> 5. A method that checks whether the job has started.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6. A method that checks whether the task has started.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path

2010-05-19 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1744:
-

Status: Open  (was: Patch Available)

> DistributedCache creates its own FileSytem instance when adding a 
> file/archive to the path
> --
>
> Key: MAPREDUCE-1744
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Dick King
> Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, 
> MAPREDUCE-1744.patch
>
>
> According to the contract of {{UserGroupInformation.doAs()}}, the only 
> operations required to run within the {{doAs()}} block are the creation of a 
> {{JobClient}} or obtaining a {{FileSystem}}.
> The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a 
> {{FileSystem}} instance outside of the {{doAs()}} block; this instance is 
> therefore in the scope of the superuser rather than the proxy user, and 
> permissions may make the method fail.
> One option is to overload the methods above to receive a filesystem.
> Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; 
> for this, the proxy user would need to be set in the passed configuration.
> The second option seems nicer, but I don't know if the proxy user is 
> available as a property in the jobconf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path

2010-05-19 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1744:
-

Status: Patch Available  (was: Open)

> DistributedCache creates its own FileSytem instance when adding a 
> file/archive to the path
> --
>
> Key: MAPREDUCE-1744
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Dick King
> Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, 
> MAPREDUCE-1744.patch
>
>
> According to the contract of {{UserGroupInformation.doAs()}}, the only 
> operations required to run within the {{doAs()}} block are the creation of a 
> {{JobClient}} or obtaining a {{FileSystem}}.
> The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a 
> {{FileSystem}} instance outside of the {{doAs()}} block; this instance is 
> therefore in the scope of the superuser rather than the proxy user, and 
> permissions may make the method fail.
> One option is to overload the methods above to receive a filesystem.
> Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; 
> for this, the proxy user would need to be set in the passed configuration.
> The second option seems nicer, but I don't know if the proxy user is 
> available as a property in the jobconf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.

2010-05-19 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869056#action_12869056
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1794:
---



+  /**
+   * Verify the job status whether it is succeeded or not when 
+   * the lost task trackers time out for all four attempts of a task. 
+   * @throws IOException if an I/O error occurs.
+   */
+  @Test
+  public void testJobStatusOfLostTracker2()  throws 
+  Exception {
+String testName = "LTT2";
+setupJobAndRun();
+JobStatus jStatus = verifyLostTaskTrackerJobStatus(testName);
+Assert.assertEquals("Job has not been failed...", 
+JobStatus.SUCCEEDED, jStatus.getRunState());
+  }


The JobStatus should be JobStatus.FAILED instead of SUCCEEDED. If the task 
tracker was lost for all four attempts of a task, shouldn't the job fail 
instead of succeed? If that is not the case, the message in the assert has to 
be changed: the job succeeded even when losing the task tracker 4 times.


+// Make sure that job should run and completes 40%. 
+while (jobStatus.getRunState() != JobStatus.RUNNING && 
+  jobStatus.mapProgress() < 0.4f) {
+  UtilsForTests.waitFor(100);
+  jobStatus = wovenClient.getJobInfo(jID).getStatus();
+}

Why do we care about checking the job status for 40% completion? Also, can we 
enhance the building blocks to check this kind of status, since the code could 
be reused elsewhere?


+TaskInfo[] taskInfos = wovenClient.getTaskInfo(jID);
+for (TaskInfo taskinfo : taskInfos) {
+  if (!taskinfo.isSetupOrCleanup()) {
+taskInfo = taskinfo;
+break;
+  }
+}

The above code can be part of a building block in JTClient. 


+   while (counter < 30) {
+ if (ttClient != null) {
+   break;
+ }else{
+taskInfo = wovenClient.getTaskInfo(taskInfo.getTaskID());  
+ttClient = getTTClientIns(taskInfo); 
+ }
+ counter ++;
+   }

The above code is repeated a couple of times and can be made a function; if it 
is used across test cases it can become part of a building block.

If you look at the story description, we said we would suspend the task 
tracker and resume it, but it seems you have followed the route of killing the 
task tracker instead of pausing and resuming it. I think killing should be 
fine, since kill/start emulates pause and resume, but on the performance side, 
if we had used pause and resume, the waits in the test cases could be reduced.

One general question I have: after killing the same task tracker 4 times, the 
task tracker should get blacklisted, and if you resubmit the job, the job 
tracker should not use that task tracker. Is it good to check that condition 
as part of this test case, or do you think it is out of scope? There is a URL 
which lists the blacklisted tasktrackers; if we can get the number through an 
aspect, it can be verified. Also, at the end of the test we need to remove the 
task tracker from the blacklisted state so the other tests run without any 
problem.

> Test the job status of lost task trackers before and after the timeout.
> ---
>
> Key: MAPREDUCE-1794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1794_lost_tasktracker.patch
>
>
> This test covers the following scenarios.
> 1. Verify that the job succeeds when the task tracker is lost and comes 
> back alive before the timeout.
> 2. Verify the job status and the killed attempts of a task (the job should 
> succeed and the killed-attempt counts should match) when the task trackers 
> are lost and time out for all four attempts of a task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null

2010-05-19 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869055#action_12869055
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-118:
---

The test TestMapredHeartbeat failed with an IllegalMonitorStateException while 
shutting down the DataNode. The failure is not related to the patch; the same 
test passed on my machine.

> Job.getJobID() will always return null
> --
>
> Key: MAPREDUCE-118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.20.1
>Reporter: Amar Kamat
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.20.3
>
> Attachments: patch-118-0.20-1.txt, patch-118-0.20.txt, 
> patch-118-0.21.txt, patch-118-1.txt, patch-118-2.txt, patch-118-3.txt, 
> patch-118-4.txt, patch-118-5.txt, patch-118.txt
>
>
> JobContext is used for a read-only view of a job's info; hence all the 
> read-only fields in JobContext are set in the constructor. Job extends 
> JobContext. When a Job is created, the job id is not yet known, so there is 
> no way to set the JobID once the Job has been created. The JobID is obtained 
> only when the JobClient queries the JobTracker for a job id, which happens 
> later, i.e. upon job submission.
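The lifecycle gap in the description can be modeled minimally: the id is only known at submission time, so a Job created up front reports null until then. The classes below are illustrative, not the actual org.apache.hadoop.mapreduce types.

```java
// Minimal model of why getJobID() returns null: the id cannot be a
// constructor-time read-only field, because the JobTracker assigns it
// only at submission. Names are illustrative, not the real Hadoop API.
public class JobIdLifecycle {
    static class Job {
        private String jobID;                     // unknown at construction

        String getJobID() {
            return jobID;                         // null until submitted
        }

        /** The JobTracker assigns the id only at submit time. */
        void submit(String assignedID) {
            jobID = assignedID;
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        System.out.println(job.getJobID());       // null: not yet submitted
        job.submit("job_201005190001_0001");
        System.out.println(job.getJobID());       // the assigned id
    }
}
```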

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null

2010-05-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869054#action_12869054
 ] 

Hadoop QA commented on MAPREDUCE-118:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444776/patch-118-5.txt
  against trunk revision 944427.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/console

This message is automatically generated.

> Job.getJobID() will always return null
> --
>
> Key: MAPREDUCE-118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.20.1
>Reporter: Amar Kamat
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.20.3
>
> Attachments: patch-118-0.20-1.txt, patch-118-0.20.txt, 
> patch-118-0.21.txt, patch-118-1.txt, patch-118-2.txt, patch-118-3.txt, 
> patch-118-4.txt, patch-118-5.txt, patch-118.txt
>
>
> JobContext is used for a read-only view of a job's info; hence all the 
> read-only fields in JobContext are set in the constructor. Job extends 
> JobContext. When a Job is created, the job id is not yet known, so there is 
> no way to set the JobID once the Job has been created. The JobID is obtained 
> only when the JobClient queries the JobTracker for a job id, which happens 
> later, i.e. upon job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.