[jira] Commented: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869490#action_12869490 ]

Joydeep Sen Sarma commented on MAPREDUCE-1802:
----------------------------------------------

Thanks. 463 it is.

> allow outputcommitters to skip setup/cleanup
> --------------------------------------------
>
>          Key: MAPREDUCE-1802
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1802
>      Project: Hadoop Map/Reduce
>   Issue Type: Bug
>     Reporter: Joydeep Sen Sarma
>     Assignee: Joydeep Sen Sarma
>
> Job setup and cleanup overheads in our (larger) clusters are very significant
> and add to latency for small jobs. It turns out that Hive does not require
> job setup and cleanup at all, since all management of output/temporary files
> and such is done on the Hive client side. So it would be a big win for our
> environment (and Hive users in general) if we could skip job cleanup/setup
> altogether.
> The proposal is to add new calls to the OutputCommitter interface (along the
> lines of needsTaskCommit()) to optionally allow skipping of setup/cleanup,
> and for the JT to take these into account while scheduling setup/cleanup.
> NullOutputFormat should not need setup/cleanup, for example.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
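The proposal quoted above can be sketched with a simplified stand-in for the OutputCommitter interface; the needsJobSetup()/needsJobCleanup() names are hypothetical illustrations mirroring the existing needsTaskCommit(), not the API that was actually committed:

```java
// Simplified stand-in for org.apache.hadoop.mapreduce.OutputCommitter.
// The needsJobSetup()/needsJobCleanup() hooks are hypothetical names
// illustrating the proposal, modeled on the existing needsTaskCommit().
abstract class SketchOutputCommitter {
    // Existing-style hook: a committer can skip per-task commit.
    boolean needsTaskCommit() { return true; }

    // Proposed hooks: let the JobTracker skip scheduling setup/cleanup
    // tasks entirely when the committer does not need them.
    boolean needsJobSetup() { return true; }
    boolean needsJobCleanup() { return true; }

    abstract void setupJob();
    abstract void cleanupJob();
}

// A do-nothing committer (in the spirit of NullOutputFormat's) that
// opts out of setup, cleanup, and task commit.
class NullSketchCommitter extends SketchOutputCommitter {
    @Override boolean needsTaskCommit() { return false; }
    @Override boolean needsJobSetup()   { return false; }
    @Override boolean needsJobCleanup() { return false; }
    @Override void setupJob()   { /* nothing to do */ }
    @Override void cleanupJob() { /* nothing to do */ }
}
```

A JT-side scheduler would consult these hooks before enqueuing the setup/cleanup tasks, which is where the latency win for small jobs comes from.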
[jira] Resolved: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma resolved MAPREDUCE-1802.
------------------------------------------

    Resolution: Duplicate
[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1623:
-------------------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

I just committed this. Thanks Tom, this was a big one!

> Apply audience and stability annotations to classes in mapred package
> ---------------------------------------------------------------------
>
>          Key: MAPREDUCE-1623
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
>      Project: Hadoop Map/Reduce
>   Issue Type: Sub-task
>   Components: documentation
>     Reporter: Tom White
>     Assignee: Tom White
>     Priority: Blocker
>      Fix For: 0.21.0
>  Attachments: M1623-1.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch,
>               MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch,
>               MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch,
>               MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, MAPREDUCE-1623.patch,
>               MAPREDUCE-1623.patch
>
> There are lots of implementation classes in org.apache.hadoop.mapred, which
> makes it difficult to see the user-level MapReduce API classes in the
> Javadoc. (See
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
> for example.) By marking these implementation classes with the
> InterfaceAudience.Private annotation we can exclude them from user Javadoc
> (using HADOOP-6658).
> Later work will move the implementation classes into o.a.h.mapreduce.server
> and related packages (see MAPREDUCE-561), but applying the annotations is a
> good first step.
[jira] Updated: (MAPREDUCE-1713) Utilities for system tests specific.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay Kumar Thota updated MAPREDUCE-1713:
-----------------------------------------

    Attachment: MAPREDUCE-1713.patch

Latest patch, based on Cos's comments.

> Utilities for system tests specific.
> ------------------------------------
>
>          Key: MAPREDUCE-1713
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
>      Project: Hadoop Map/Reduce
>   Issue Type: Task
>   Components: test
>     Reporter: Vinay Kumar Thota
>     Assignee: Vinay Kumar Thota
>  Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch,
>               1713-ydist-security.patch, 1713-ydist-security.patch,
>               1713-ydist-security.patch, MAPREDUCE-1713.patch,
>               MAPREDUCE-1713.patch, systemtestutils_MR1713.patch,
>               utilsforsystemtest_1713.patch
>
> 1. A method for restarting the daemon with a new configuration.
>    public static void restartCluster(Hashtable props, String confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>    public void resetCluster() throws Exception;
> 3. A method that waits for the daemon to stop.
>    public void waitForClusterToStop() throws Exception;
> 4. A method that waits for the daemon to start.
>    public void waitForClusterToStart() throws Exception;
> 5. A method for checking whether a job has started or not.
>    public boolean isJobStarted(JobID id) throws IOException;
> 6. A method for checking whether a task has started or not.
>    public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
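The wait-style helpers listed above boil down to a poll-until-condition loop. A minimal self-contained sketch, where the BooleanSupplier stands in for a hypothetical query of the daemon's status (the real utilities would call into the cluster's RPC interface instead):

```java
import java.util.function.BooleanSupplier;

// Generic poll-until helper in the spirit of waitForClusterToStart/Stop.
// `condition` is a stand-in for asking the daemon whether it is up or down.
final class WaitUtil {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;  // condition met
            try {
                Thread.sleep(pollMs);                   // back off before re-polling
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // preserve interrupt status
                return false;                           // interrupted: give up
            }
        }
        return condition.getAsBoolean();                // final check at the deadline
    }
}
```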
[jira] Commented: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869480#action_12869480 ]

Hadoop QA commented on MAPREDUCE-1623:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12445008/MAPREDUCE-1623.patch
  against trunk revision 944427.

    +1 @author. The patch does not contain any @author tags.
    +0 tests included. The patch appears to be a documentation patch that doesn't require tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/194/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869474#action_12869474 ]

Tom White commented on MAPREDUCE-1623:
--------------------------------------

+1 Thanks Arun!
[jira] Resolved: (MAPREDUCE-1151) Cleanup and Setup jobs should only call cleanupJob() and setupJob() methods of the OutputCommitter
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved MAPREDUCE-1151.
------------------------------------------------

    Resolution: Duplicate

Fixed by MAPREDUCE-1476

> Cleanup and Setup jobs should only call cleanupJob() and setupJob() methods
> of the OutputCommitter
> ---------------------------------------------------------------------------
>
>              Key: MAPREDUCE-1151
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-1151
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
> Affects Versions: 0.20.1
>         Reporter: Pradeep Kamath
>
> The cleanup and setup jobs run as map jobs and call the setupTask(),
> needsTaskCommit() and possibly commitTask() and abortTask() methods of the
> OutputCommitter. They should only be calling the cleanupJob() and setupJob()
> methods.
[jira] Commented: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869465#action_12869465 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1802:
----------------------------------------------------

Is this the same as MAPREDUCE-463? MAPREDUCE-463 adds a configuration,
"mapred.committer.job.setup.cleanup.needed", to indicate whether a job needs
job-setup and job-cleanup.
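The configuration key named in the comment above comes from MAPREDUCE-463; a minimal sketch of reading it, using java.util.Properties as a stand-in for the real JobConf (the class and method names here are illustrative):

```java
import java.util.Properties;

// Illustrative reader for the MAPREDUCE-463 flag; java.util.Properties
// substitutes for the real Hadoop JobConf in this sketch.
final class SetupCleanupFlag {
    static final String KEY = "mapred.committer.job.setup.cleanup.needed";

    // Defaults to true, i.e. the conservative behavior of always running
    // the setup/cleanup tasks unless a job explicitly opts out.
    static boolean needed(Properties jobConf) {
        return Boolean.parseBoolean(jobConf.getProperty(KEY, "true"));
    }
}
```

A Hive-style client that manages its own output files would set the key to "false" at submission time, and the JT would then skip scheduling the setup/cleanup tasks.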
[jira] Created: (MAPREDUCE-1803) 0.21 nightly snapshot build has dependency on 0.22 snapshot
0.21 nightly snapshot build has dependency on 0.22 snapshot
-----------------------------------------------------------

             Key: MAPREDUCE-1803
             URL: https://issues.apache.org/jira/browse/MAPREDUCE-1803
         Project: Hadoop Map/Reduce
      Issue Type: Bug
      Components: build
        Reporter: Aaron Kimball

The POM generated in
https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapred/0.21.0-SNAPSHOT/
has a reference to hadoop-core 0.22.0-SNAPSHOT.
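Until the generated POM is fixed, a consumer could pin the matching hadoop-core version with a standard Maven exclusion-plus-override; this is a hedged sketch of a workaround, not anything proposed on the issue, and the version strings are only what the report above implies:

```xml
<!-- Illustrative consumer-side workaround: exclude the hadoop-core
     0.22.0-SNAPSHOT that the generated hadoop-mapred POM references,
     and declare the matching 0.21 version explicitly. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapred</artifactId>
  <version>0.21.0-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.21.0-SNAPSHOT</version>
</dependency>
```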
[jira] Commented: (MAPREDUCE-1545) Add 'first-task-launched' to job-summary
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869442#action_12869442 ]

Luke Lu commented on MAPREDUCE-1545:
------------------------------------

@ciemo, you can find start and finish times of *every* task in the job
history. The first-task-launched times are for the job *summary* only.

> Add 'first-task-launched' to job-summary
> ----------------------------------------
>
>          Key: MAPREDUCE-1545
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1545
>      Project: Hadoop Map/Reduce
>   Issue Type: Improvement
>   Components: jobtracker
>     Reporter: Arun C Murthy
>     Assignee: Luke Lu
>      Fix For: 0.22.0
>  Attachments: mr-1545-trunk-v1.patch, mr-1545-trunk-v2.patch,
>               mr-1545-y20s-v1.patch, mr-1545-y20s-v2.patch, mr-1545-y20s-v3.patch
>
> It would be useful to track the 'first-task-launched' time in the job summary
> for better reporting.
[jira] Created: (MAPREDUCE-1802) allow outputcommitters to skip setup/cleanup
allow outputcommitters to skip setup/cleanup
--------------------------------------------

             Key: MAPREDUCE-1802
             URL: https://issues.apache.org/jira/browse/MAPREDUCE-1802
         Project: Hadoop Map/Reduce
      Issue Type: Bug
        Reporter: Joydeep Sen Sarma
        Assignee: Joydeep Sen Sarma

Job setup and cleanup overheads in our (larger) clusters are very significant
and add to latency for small jobs. It turns out that Hive does not require
job setup and cleanup at all, since all management of output/temporary files
and such is done on the Hive client side. So it would be a big win for our
environment (and Hive users in general) if we could skip job cleanup/setup
altogether.

The proposal is to add new calls to the OutputCommitter interface (along the
lines of needsTaskCommit()) to optionally allow skipping of setup/cleanup,
and for the JT to take these into account while scheduling setup/cleanup.
NullOutputFormat should not need setup/cleanup, for example.
[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1623:
-------------------------------------

    Status: Patch Available  (was: Open)
[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1623:
-------------------------------------

    Attachment: MAPREDUCE-1623.patch

Updated patch, since the previous one didn't apply cleanly; I've incorporated
my own (final) comments. Tom, if you are fine with the proposed changes I'll
go ahead and commit.
[jira] Updated: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1623:
-------------------------------------

    Status: Open  (was: Patch Available)

Final comments:

- src/java/org/apache/hadoop/mapreduce/lib/jobcontrol/ControlledJob.java
- src/java/org/apache/hadoop/mapreduce/lib/jobcontrol/JobControl.java
  Both should be Public, Evolving; I don't think they are ready to be
  labelled 'stable' yet.
- src/java/org/apache/hadoop/mapreduce/QueueInfo.java -> Evolving
- src/java/org/apache/hadoop/mapred/IsolationRunner.java -> Evolving, since
  I'm not sure IsolationRunner even works anymore.
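The annotation assignments discussed above can be illustrated with local stand-ins for the real org.apache.hadoop.classification annotations; the annotation and class definitions below are sketches for shape only, not the actual Hadoop types:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Local stand-ins for the real InterfaceAudience/InterfaceStability
// annotations, retained at runtime so the labeling is inspectable.
@Retention(RetentionPolicy.RUNTIME) @interface Public {}
@Retention(RetentionPolicy.RUNTIME) @interface Private {}
@Retention(RetentionPolicy.RUNTIME) @interface Evolving {}

// Per the comment above: user-facing but not yet stable.
@Public @Evolving
class ControlledJobSketch {}

// Implementation detail: marked Private so a Javadoc doclet can
// exclude it from the user-level API documentation.
@Private
class ImplDetailSketch {}
```

In the real codebase the Private marking is what lets the HADOOP-6658 doclet filter implementation classes out of the user Javadoc.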
[jira] Commented: (MAPREDUCE-1126) shuffle should use serialization to get comparator
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869425#action_12869425 ]

Doug Cutting commented on MAPREDUCE-1126:
-----------------------------------------

If we elect to abandon MAPREDUCE-815 in favor of AVRO-493, and since all of
the underpinnings of this issue have been reverted, perhaps we should now
close this as "won't fix"?

> shuffle should use serialization to get comparator
> --------------------------------------------------
>
>          Key: MAPREDUCE-1126
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126
>      Project: Hadoop Map/Reduce
>   Issue Type: Improvement
>   Components: task
>     Reporter: Doug Cutting
>     Assignee: Aaron Kimball
>      Fix For: 0.22.0
>  Attachments: m-1126-2.patch, m-1126-3.patch, MAPREDUCE-1126.2.patch,
>               MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch,
>               MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch,
>               MAPREDUCE-1126.patch, MAPREDUCE-1126.patch
>
> Currently the key comparator is defined as a Java class. Instead we should
> use the Serialization API to create key comparators. This would permit,
> e.g., Avro-based comparators to be used, permitting efficient sorting of
> complex data types without having to write a RawComparator in Java.
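The core idea in the issue above is asking the serialization layer for the comparator rather than hard-wiring a comparator class. A minimal sketch, where the registry is a hypothetical simplification of what a Serialization implementation would provide (the real API shapes differ):

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification: a registry that hands out a comparator per
// record type, standing in for asking the Serialization implementation
// (e.g. an Avro-based one) to build the key comparator.
final class ComparatorRegistry {
    private final Map<Class<?>, Comparator<?>> byType = new HashMap<>();

    <T> void register(Class<T> type, Comparator<T> cmp) {
        byType.put(type, cmp);
    }

    @SuppressWarnings("unchecked")
    <T> Comparator<T> comparatorFor(Class<T> type) {
        Comparator<T> cmp = (Comparator<T>) byType.get(type);
        if (cmp == null) {
            throw new IllegalStateException("no serialization registered for " + type);
        }
        return cmp;  // the shuffle would sort map outputs with this
    }
}
```

The payoff is that a complex key type sorts correctly without a hand-written Java RawComparator, since the serialization that understands the type's encoding supplies the ordering.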
[jira] Updated: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Al Thompson updated MAPREDUCE-1641:
-----------------------------------

    Attachment: mapreduce-1641--2010-05-19.patch

Minor edits made to the patch in an effort to improve readability.

> Job submission should fail if same uri is added for mapred.cache.files and
> mapred.cache.archives
> --------------------------------------------------------------------------
>
>          Key: MAPREDUCE-1641
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641
>      Project: Hadoop Map/Reduce
>   Issue Type: Bug
>   Components: distributed-cache
>     Reporter: Amareshwari Sriramadasu
>     Assignee: Dick King
>      Fix For: 0.22.0
>  Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch,
>               duped-files-archives--off-0-20-101--2010-04-21.patch,
>               duped-files-archives--off-0-20-101--2010-04-23--1819.patch,
>               mapreduce-1641--2010-04-27.patch,
>               mapreduce-1641--2010-05-19.patch, patch-1641-ydist-bugfix.txt
>
> The behavior of mapred.cache.files and mapred.cache.archives differs during
> localization in the following way:
> If a jar file is added to mapred.cache.files, it will be localized under the
> TaskTracker under a unique path.
> If a jar file is added to mapred.cache.archives, it will be localized under
> a unique path in a directory named after the jar file, and will be
> unarchived under the same directory.
> If the same jar file is passed for both configurations, the behavior is
> undefined, so job submission should fail.
> Currently, since the distributed cache processes files before archives, the
> jar file will be just localized and not unarchived.
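The submission-time check proposed above amounts to a set intersection between the two URI lists. A sketch, with class and method names that are illustrative rather than the actual JobClient code:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative submission-time validation: fail fast when the same URI
// appears in both mapred.cache.files and mapred.cache.archives, instead
// of letting localization behave in an undefined way.
final class CacheUriCheck {
    static void validate(List<URI> cacheFiles, List<URI> cacheArchives) {
        Set<URI> files = new HashSet<>(cacheFiles);
        for (URI archive : cacheArchives) {
            if (files.contains(archive)) {
                throw new IllegalArgumentException(
                    "URI added to both cache files and archives: " + archive);
            }
        }
    }
}
```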
[jira] Created: (MAPREDUCE-1801) do not throw exception if cannot get a delegation token, it may be from a unsecured cluster (part of HDFS-1044)
do not throw exception if cannot get a delegation token, it may be from a unsecured cluster (part of HDFS-1044)
---------------------------------------------------------------------------------------------------------------

             Key: MAPREDUCE-1801
             URL: https://issues.apache.org/jira/browse/MAPREDUCE-1801
         Project: Hadoop Map/Reduce
      Issue Type: Bug
        Reporter: Boris Shkolnik
        Assignee: Boris Shkolnik
[jira] Commented: (MAPREDUCE-1798) normalize property names for JT kerberos principal names in configuration (from HADOOP 6633)
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869380#action_12869380 ]

Jitendra Nath Pandey commented on MAPREDUCE-1798:
-------------------------------------------------

+1

> normalize property names for JT kerberos principal names in configuration
> (from HADOOP 6633)
> -------------------------------------------------------------------------
>
>          Key: MAPREDUCE-1798
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1798
>      Project: Hadoop Map/Reduce
>   Issue Type: Improvement
>     Reporter: Boris Shkolnik
>     Assignee: Boris Shkolnik
>  Attachments: MAPREDUCE-1798.patch
[jira] Commented: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869373#action_12869373 ]

Hadoop QA commented on MAPREDUCE-1505:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12444965/mapreduce-1505--2010-05-19.patch
  against trunk revision 944427.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/193/console

This message is automatically generated.
> Cluster class should create the rpc client only when needed
> -----------------------------------------------------------
>
>              Key: MAPREDUCE-1505
>              URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
>          Project: Hadoop Map/Reduce
>       Issue Type: Bug
>       Components: client
> Affects Versions: 0.20.2
>         Reporter: Devaraj Das
>         Assignee: Dick King
>          Fix For: 0.22.0
>      Attachments: mapreduce-1505--2010-05-19.patch,
>                   MAPREDUCE-1505_yhadoop20.patch, MAPREDUCE-1505_yhadoop20_9.patch
>
> It would be good to have org.apache.hadoop.mapreduce.Cluster create the RPC
> client object only when needed (when a call to the jobtracker is actually
> required). org.apache.hadoop.mapreduce.Job constructs the Cluster object
> internally, and in many cases the application that created the Job object
> really only wants to look at the configuration. It'd help to not have these
> connections to the jobtracker, especially when Job is used in the tasks
> (e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks, and
> that requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and
> the same argument applies there too.
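The fix requested above is a lazy-initialization pattern. A minimal sketch, with a Supplier standing in for the (hypothetical, in this sketch) JobTracker connection factory:

```java
import java.util.function.Supplier;

// Lazily create an expensive client on first use, in the spirit of having
// Cluster connect to the JobTracker only when a call actually needs it.
// The `creations` counter exists only so the sketch can be sanity-checked.
final class LazyClient<T> {
    private final Supplier<T> factory;
    private T client;       // null until first use
    int creations = 0;

    LazyClient(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        if (client == null) {     // connect only when actually needed
            client = factory.get();
            creations++;
        }
        return client;
    }
}
```

With this shape, constructing a Job just to read its configuration makes no connection at all; the RPC proxy is built on the first real call.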
[jira] Commented: (MAPREDUCE-1753) Implement a functionality for suspend and resume a task's process.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869363#action_12869363 ]

Konstantin Boudnik commented on MAPREDUCE-1753:
-----------------------------------------------

It isn't about my satisfaction. It's a two-state return value, which is
naturally a boolean type; C doesn't have one, which is why an integer is used
instead. Now, about suspending/resuming a process: you are right, it is
generic, which technically allows suspending a daemon VM's process and never
being able to resume it. But that seems to be OK, I guess. I was totally
confused by the fact that this has been tracked by a MAPREDUCE JIRA :( I'm
moving this ticket out to HADOOP.

> Implement a functionality for suspend and resume a task's process.
> ------------------------------------------------------------------
>
>          Key: MAPREDUCE-1753
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1753
>      Project: Hadoop Map/Reduce
>   Issue Type: Task
>   Components: test
>     Reporter: Vinay Kumar Thota
>     Assignee: Vinay Kumar Thota
>  Attachments: 1753-ydist-security.patch, 1753-ydist-security.patch,
>               1753-ydist-security.patch, daemonprotocolaspect.patch
>
> Adding two methods in DaemonProtocolAspect.aj for suspending and resuming a
> process:
> public int DaemonProtocol.resumeProcess(String pid) throws IOException;
> public int DaemonProtocol.suspendProcess(String pid) throws IOException;
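On POSIX systems, a suspend/resume pair like the one above maps onto the standard SIGSTOP/SIGCONT signals delivered with kill(1). A sketch that only builds the command lines (the method names mirror suspendProcess/resumeProcess but are illustrative; actually running them against a live pid is left out):

```java
// Build (but do not run) the kill(1) invocations behind a suspend/resume
// pair. SIGSTOP and SIGCONT are the standard POSIX signals for this.
final class ProcessSignals {
    static String[] suspendCommand(String pid) {
        return new String[] {"kill", "-STOP", pid};
    }

    static String[] resumeCommand(String pid) {
        return new String[] {"kill", "-CONT", pid};
    }
    // A caller would hand either array to new ProcessBuilder(cmd).start()
    // and wait for the exit code, which naturally yields the int status
    // that the aspect methods above return (0 on success, nonzero on error).
}
```

This also makes Cos's point concrete: the int is just the shell exit code of kill, a two-state result forced through an integer because that is what the OS gives back.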
[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
[ https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869360#action_12869360 ] Joydeep Sen Sarma commented on MAPREDUCE-1800: -- the problem is that the current heuristics also cause bad behavior when uplinks/core-switches degrade. i agree that the case of a single node that is not able to send map outputs is something that hadoop should detect/correct automatically - but i don't think the current heuristic (by itself) is a good one because of the previous point. i don't have a good alternative solution/proposals. a few thoughts pop to mind: - separate blacklisting of TTs due to map/reduce task failures from blacklisting due to map-output fetch failures. the thresholds and policies required seem different. - if the scope of the fault is nic/port/process/os problems affecting a 'single' node - then we should only take into map-fetch failures that happen within the same rack. (ie. assign blame to a TT only if other TTs within the same rack cannot communicate to it) - blame should be laid by a multitude of different hosts. It's no good if 4 reducers on TT1 cannot get map outputs from TT2 and this results in blacklisting of TT2. It's possible that TT1 itself has a bad port/nic. (just thinking aloud, i don't have a careful understanding of the code beyond what's been relayed to me by others :-)). > using map output fetch failures to blacklist nodes is problematic > - > > Key: MAPREDUCE-1800 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > If a mapper and a reducer cannot communicate, then either party could be at > fault. The current hadoop protocol allows reducers to declare nodes running > the mapper as being at fault. When sufficient number of reducers do so - then > the map node can be blacklisted. 
> In cases where networking problems cause substantial degradation in > communication across sets of nodes - then large number of nodes can become > blacklisted as a result of this protocol. The blacklisting is often wrong > (reducers on the smaller side of the network partition can collectively cause > nodes on the larger network partitioned to be blacklisted) and > counterproductive (rerunning maps puts further load on the (already) maxed > out network links). > We should revisit how we can better identify nodes with genuine network > problems (and what role, if any, map-output fetch failures have in this). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
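The rack-aware, multiple-reporter blame policy proposed in the comment above could be sketched roughly as follows. This is an illustrative Java sketch, not Hadoop's actual blacklisting code; the class name, method names, rack map, and threshold are all invented for the example:

```java
import java.util.*;

// Illustrative sketch of the proposed policy: blacklist a mapper host only
// when fetch failures are reported by several *distinct* reducer hosts, and
// count only reports from reducers in the same rack as the mapper (so a
// degraded cross-rack uplink cannot get a healthy node blacklisted).
class FetchFailureTracker {
    private final Map<String, Set<String>> blamersByMapHost = new HashMap<>();
    private final Map<String, String> rackOf;      // host -> rack (assumed known)
    private final int minDistinctBlamers;

    FetchFailureTracker(Map<String, String> rackOf, int minDistinctBlamers) {
        this.rackOf = rackOf;
        this.minDistinctBlamers = minDistinctBlamers;
    }

    // A reducer on reducerHost failed to fetch map output from mapHost.
    void reportFetchFailure(String mapHost, String reducerHost) {
        // Ignore cross-rack reports: the shared uplink, not the node, may be at fault.
        if (!Objects.equals(rackOf.get(mapHost), rackOf.get(reducerHost))) {
            return;
        }
        blamersByMapHost.computeIfAbsent(mapHost, h -> new HashSet<>())
                        .add(reducerHost);
    }

    // Blame must come from a multitude of different hosts, not one noisy TT.
    boolean shouldBlacklist(String mapHost) {
        return blamersByMapHost.getOrDefault(mapHost, Collections.emptySet())
                               .size() >= minDistinctBlamers;
    }
}
```

Because blamers are tracked as a set of hosts, many reducers on a single TT still count as one reporter, matching the "4 reducers on TT1" concern above.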
[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
[ https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869340#action_12869340 ] Arun C Murthy commented on MAPREDUCE-1800: -- FWIW the current heuristics protect reduces against a common case of a single node (on which the map ran), and work reasonably well. What I'm reading here is that we need better overall metrics/monitoring of the cluster and enhancements to the masters (JobTracker/NameNode) to take advantage of the metrics/monitoring stats. Is that reasonable? > using map output fetch failures to blacklist nodes is problematic > - > > Key: MAPREDUCE-1800 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > If a mapper and a reducer cannot communicate, then either party could be at > fault. The current hadoop protocol allows reducers to declare nodes running > the mapper as being at fault. When sufficient number of reducers do so - then > the map node can be blacklisted. > In cases where networking problems cause substantial degradation in > communication across sets of nodes - then large number of nodes can become > blacklisted as a result of this protocol. The blacklisting is often wrong > (reducers on the smaller side of the network partition can collectively cause > nodes on the larger network partitioned to be blacklisted) and > counterproductive (rerunning maps puts further load on the (already) maxed > out network links). > We should revisit how we can better identify nodes with genuine network > problems (and what role, if any, map-output fetch failures have in this). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869333#action_12869333 ] Dmytro Molkov commented on MAPREDUCE-1354: -- Is there any particular reason that only getTaskCompletionEvents dropped the synchronized modifier, but all other job access methods like getCleanupTaskReports, getSetupTaskReports, etc. are still synchronized, while effectively they are doing a very similar kind of access? > Incremental enhancements to the JobTracker for better scalability > - > > Key: MAPREDUCE-1354 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Reporter: Devaraj Das >Assignee: Dick King >Priority: Critical > Attachments: mapreduce-1354--2010-03-10.patch, > mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > mr-1354-y20.patch > > > It'd be nice to have the JobTracker object not be locked while accessing the > HDFS for reading the jobconf file and while writing the jobinfo file in the > submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
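As general background for the synchronization question above (this is a standalone Java illustration, not the actual JobTracker code): a getter can safely drop `synchronized` when it reads a single volatile reference to an immutable snapshot, while a getter that copies a collection mutated in place still needs the lock.

```java
import java.util.*;

// Copy-on-write style publication: writers replace an immutable snapshot
// under the lock; readers of that snapshot need no lock at all.
class Reports {
    private final Object lock = new Object();
    private volatile List<String> completionEvents = Collections.emptyList();
    private final List<String> setupReports = new ArrayList<>();

    void addCompletionEvent(String e) {
        synchronized (lock) {
            List<String> copy = new ArrayList<>(completionEvents);
            copy.add(e);
            completionEvents = Collections.unmodifiableList(copy); // publish snapshot
        }
    }

    // Safe without synchronization: one volatile read of an immutable list.
    List<String> getCompletionEvents() {
        return completionEvents;
    }

    void addSetupReport(String r) {
        synchronized (lock) { setupReports.add(r); }
    }

    // Must stay synchronized: it iterates a list that writers mutate in place.
    List<String> getSetupReports() {
        synchronized (lock) { return new ArrayList<>(setupReports); }
    }
}
```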
[jira] Updated: (MAPREDUCE-1762) Add a setValue() method in Counter
[ https://issues.apache.org/jira/browse/MAPREDUCE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1762: -- Attachment: MAPREDUCE-1762.1.txt Fixed a typo in the patch. > Add a setValue() method in Counter > -- > > Key: MAPREDUCE-1762 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1762 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1762.1.txt, MAPREDUCE-1762.txt > > > Counters are very useful because of the logging and transmitting are already > there. > It is very convenient to transmit and store numbers. But currently Counter > only has an increment() method. > It will be nice if there can be a setValue() method in this class that will > allow us to transmit wider variety of information through it. > What do you think? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
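A minimal sketch of what a setValue() method next to increment() might look like. This is a hypothetical stand-in for illustration, not the actual org.apache.hadoop.mapreduce.Counter class:

```java
// Hypothetical minimal counter showing the proposed addition: increment()
// for running totals, setValue() for gauge-style values that are
// overwritten rather than accumulated.
class Counter {
    private final String name;
    private long value;

    Counter(String name) { this.name = name; }

    // Existing style of update: accumulate.
    synchronized void increment(long incr) { value += incr; }

    // Proposed addition: overwrite the value outright, e.g. to transmit a
    // measurement (peak memory, queue depth) through the counter machinery.
    synchronized void setValue(long newValue) { value = newValue; }

    synchronized long getValue() { return value; }

    String getName() { return name; }
}
```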
[jira] Commented: (MAPREDUCE-1762) Add a setValue() method in Counter
[ https://issues.apache.org/jira/browse/MAPREDUCE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869304#action_12869304 ] Dmytro Molkov commented on MAPREDUCE-1762: -- The code looks good > Add a setValue() method in Counter > -- > > Key: MAPREDUCE-1762 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1762 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1762.txt > > > Counters are very useful because of the logging and transmitting are already > there. > It is very convenient to transmit and store numbers. But currently Counter > only has an increment() method. > It will be nice if there can be a setValue() method in this class that will > allow us to transmit wider variety of information through it. > What do you think? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
[ https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869266#action_12869266 ] Todd Lipcon commented on MAPREDUCE-1800: Hey Joydeep. Thanks for the further explanation - I agree we could do better here. There's an old JIRA where we threw around some ideas similar to this maybe last August or so, but can't seem to find it at the moment. Anyone remember the one I mean? > using map output fetch failures to blacklist nodes is problematic > - > > Key: MAPREDUCE-1800 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > If a mapper and a reducer cannot communicate, then either party could be at > fault. The current hadoop protocol allows reducers to declare nodes running > the mapper as being at fault. When sufficient number of reducers do so - then > the map node can be blacklisted. > In cases where networking problems cause substantial degradation in > communication across sets of nodes - then large number of nodes can become > blacklisted as a result of this protocol. The blacklisting is often wrong > (reducers on the smaller side of the network partition can collectively cause > nodes on the larger network partitioned to be blacklisted) and > counterproductive (rerunning maps puts further load on the (already) maxed > out network links). > We should revisit how we can better identify nodes with genuine network > problems (and what role, if any, map-output fetch failures have in this). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869259#action_12869259 ] Dick King commented on MAPREDUCE-1354: -- The regression failure flagged by Hudson, {{TestJobStatusPersistency}} , does not repeat, and is hugely unlikely to have been caused by this patch. There is no new test because this patch fixes an extremely narrow race condition and that race cannot be induced artificially. > Incremental enhancements to the JobTracker for better scalability > - > > Key: MAPREDUCE-1354 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Reporter: Devaraj Das >Assignee: Dick King >Priority: Critical > Attachments: mapreduce-1354--2010-03-10.patch, > mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > mr-1354-y20.patch > > > It'd be nice to have the JobTracker object not be locked while accessing the > HDFS for reading the jobconf file and while writing the jobinfo file in the > submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
[ https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869258#action_12869258 ] Joydeep Sen Sarma commented on MAPREDUCE-1800: -- if there is a total network partition - then we don't have a problem. either the cluster will fail outright (let's say JT and NN land up on different sides of the partition) - or one partition (the one that has the JT/NN) will exclude nodes from the other. (i say we don't have a problem in the sense that the response of hadoop to such an event is more or less correct). The problem is that we have had occurrences of slow networks that are not quite partitioned. For example the uplink from one rack switch to the core switch can be flaky/degraded. in this case - control traffic from the JT to the TTs may be going through - but data traffic from mappers and reducers on the degraded racks can be really hurt. If there are problems in the core switch itself (it's underprovisioned) - then the whole cluster is having network problems. The description applies to such scenarios. In such a case - the appropriate response of the software should be, at worst, degraded performance (in keeping with the degraded nature of the underlying hardware) or at best, correctly identifying the slow node(s) and not using them or using them less (this would apply to the flaky rack uplink scenario). The current response of Hadoop is neither. It makes a bad situation worse by misassigning blame (when map nodes on good racks are blamed by a sufficiently large number of reducers running on bad racks). We potentially lose nodes from good racks and the resultant retry of tasks puts further stress on the strained network resource. A couple of things seem desirable: 1.
for enterprise data center environments that (may) have a high degree of control and monitoring around their networking elements - the ability to turn off (selectively) the functionality in hadoop that tries to detect and correct for network problems. Diagnostics stands a much better chance to catch/identify networking problems and fix them. 2. in environments with less control (say Amazon EC2 or hadoop running on a bunch of PCs across a company) that are more akin to a p2p network - hadoop's network fault diagnosis algorithms need improvement. A comparison to bittorrent is fair - over there every node advertises its upload/download throughput and a node can come across as slow only in comparison to the collective stats published by all peers (and not just based on communication with a small set of other peers). > using map output fetch failures to blacklist nodes is problematic > - > > Key: MAPREDUCE-1800 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > If a mapper and a reducer cannot communicate, then either party could be at > fault. The current hadoop protocol allows reducers to declare nodes running > the mapper as being at fault. When sufficient number of reducers do so - then > the map node can be blacklisted. > In cases where networking problems cause substantial degradation in > communication across sets of nodes - then large number of nodes can become > blacklisted as a result of this protocol. The blacklisting is often wrong > (reducers on the smaller side of the network partition can collectively cause > nodes on the larger network partitioned to be blacklisted) and > counterproductive (rerunning maps puts further load on the (already) maxed > out network links). > We should revisit how we can better identify nodes with genuine network > problems (and what role, if any, map-output fetch failures have in this). -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
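The bittorrent-style idea from the comment above, judging a node slow only relative to throughput stats collected from all peers rather than from a handful of failed transfers, could be sketched like this. Everything here (class name, the median comparison, the thresholds) is illustrative, not an actual Hadoop mechanism:

```java
import java.util.*;

// Sketch: a host is flagged as a network suspect only if its observed
// transfer throughput falls well below the cluster-wide median, so a
// generally slow network (underprovisioned core switch) flags nobody.
class ThroughputOutlierDetector {
    private final Map<String, Double> mbPerSecByHost = new HashMap<>();

    void recordThroughput(String host, double mbPerSec) {
        mbPerSecByHost.put(host, mbPerSec);
    }

    // Suspect only if the host runs at less than `fraction` of the median.
    boolean isSuspect(String host, double fraction) {
        Double v = mbPerSecByHost.get(host);
        // Too few peers reporting means no meaningful collective baseline.
        if (v == null || mbPerSecByHost.size() < 3) {
            return false;
        }
        double[] all = mbPerSecByHost.values().stream()
                .mapToDouble(Double::doubleValue).sorted().toArray();
        double median = all[all.length / 2];
        return v < fraction * median;
    }
}
```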
[jira] Commented: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
[ https://issues.apache.org/jira/browse/MAPREDUCE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869247#action_12869247 ] Todd Lipcon commented on MAPREDUCE-1800: Hey Joydeep. Do you often have cases where sets of TT nodes can't talk to each other but both sides can still talk to the JT? This is interesting, as it seems like an unusual network architecture. > using map output fetch failures to blacklist nodes is problematic > - > > Key: MAPREDUCE-1800 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Joydeep Sen Sarma > > If a mapper and a reducer cannot communicate, then either party could be at > fault. The current hadoop protocol allows reducers to declare nodes running > the mapper as being at fault. When sufficient number of reducers do so - then > the map node can be blacklisted. > In cases where networking problems cause substantial degradation in > communication across sets of nodes - then large number of nodes can become > blacklisted as a result of this protocol. The blacklisting is often wrong > (reducers on the smaller side of the network partition can collectively cause > nodes on the larger network partitioned to be blacklisted) and > counterproductive (rerunning maps puts further load on the (already) maxed > out network links). > We should revisit how we can better identify nodes with genuine network > problems (and what role, if any, map-output fetch failures have in this). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1505: - Status: Patch Available (was: Open) > Cluster class should create the rpc client only when needed > --- > > Key: MAPREDUCE-1505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.20.2 >Reporter: Devaraj Das >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: mapreduce-1505--2010-05-19.patch, > MAPREDUCE-1505_yhadoop20.patch, MAPREDUCE-1505_yhadoop20_9.patch > > > It will be good to have the org.apache.hadoop.mapreduce.Cluster create the > rpc client object only when needed (when a call to the jobtracker is actually > required). org.apache.hadoop.mapreduce.Job constructs the Cluster object > internally and in many cases the application that created the Job object > really wants to look at the configuration only. It'd help to not have these > connections to the jobtracker especially when Job is used in the tasks (for > e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks and that > requires a Job object to be passed). > In Hadoop 20, the Job object internally creates the JobClient object, and the > same argument applies there too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1505: - Attachment: mapreduce-1505--2010-05-19.patch Delays making a connection to the job tracker node until it's needed. Provides a new API so a user can tell whether this has been done, for a given job [although usually there would be no need to know]. > Cluster class should create the rpc client only when needed > --- > > Key: MAPREDUCE-1505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.20.2 >Reporter: Devaraj Das >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: mapreduce-1505--2010-05-19.patch, > MAPREDUCE-1505_yhadoop20.patch, MAPREDUCE-1505_yhadoop20_9.patch > > > It will be good to have the org.apache.hadoop.mapreduce.Cluster create the > rpc client object only when needed (when a call to the jobtracker is actually > required). org.apache.hadoop.mapreduce.Job constructs the Cluster object > internally and in many cases the application that created the Job object > really wants to look at the configuration only. It'd help to not have these > connections to the jobtracker especially when Job is used in the tasks (for > e.g., Pig calls mapreduce.FileInputFormat.setInputPath in the tasks and that > requires a Job object to be passed). > In Hadoop 20, the Job object internally creates the JobClient object, and the > same argument applies there too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
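The lazy-initialization pattern the patch above describes can be sketched in a few lines. The names below are illustrative, not the real org.apache.hadoop.mapreduce.Cluster API:

```java
// Sketch: defer creating the expensive RPC client until a call actually
// needs it, and expose a probe so callers can tell whether a connection
// was ever made (mirroring the new API the patch comment mentions).
class LazyCluster {
    private Object rpcClient; // stands in for the JobTracker proxy

    private Object client() {
        if (rpcClient == null) {
            rpcClient = new Object(); // the expensive connection happens here
        }
        return rpcClient;
    }

    // Configuration-only callers never trigger a connection.
    String getConfigValue(String key) {
        return "value-of-" + key;
    }

    // The first call that truly needs the JobTracker creates the client.
    int getClusterTaskSlots() {
        client();
        return 42; // placeholder value for the sketch
    }

    boolean isConnected() {
        return rpcClient != null;
    }
}
```

This is why a task that only reads configuration through a Job object (the Pig setInputPath case in the description) never opens a JobTracker connection at all.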
[jira] Created: (MAPREDUCE-1800) using map output fetch failures to blacklist nodes is problematic
using map output fetch failures to blacklist nodes is problematic - Key: MAPREDUCE-1800 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1800 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Joydeep Sen Sarma If a mapper and a reducer cannot communicate, then either party could be at fault. The current hadoop protocol allows reducers to declare nodes running the mapper as being at fault. When sufficient number of reducers do so - then the map node can be blacklisted. In cases where networking problems cause substantial degradation in communication across sets of nodes - then large number of nodes can become blacklisted as a result of this protocol. The blacklisting is often wrong (reducers on the smaller side of the network partition can collectively cause nodes on the larger network partitioned to be blacklisted) and counterproductive (rerunning maps puts further load on the (already) maxed out network links). We should revisit how we can better identify nodes with genuine network problems (and what role, if any, map-output fetch failures have in this). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869206#action_12869206 ] Dick King commented on MAPREDUCE-1744: -- On the patch {{h1744.patch}} of 2010-05-15 04:36 PM , can we avoid broadening the exception signature of {{Job.add*ToClassPath(Path)}} by using {{FileSystem.get(conf)}} instead of {{cluster.getFileSystem()}} ? -dk > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}} . > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block, > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to do obtain the {{FileSystem}} within a {{doAs()}} block, > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1635) ResourceEstimator does not work after MAPREDUCE-842
[ https://issues.apache.org/jira/browse/MAPREDUCE-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1635: - Release Note: Fixed a bug related to resource estimation for disk-based scheduling by modifying TaskTracker to return correct map output size for the completed maps and -1 for other tasks or failures. (was: Fixed a bug in TaskTracker to return correct map output size for the completed maps and -1 for other tasks or failures.) > ResourceEstimator does not work after MAPREDUCE-842 > --- > > Key: MAPREDUCE-1635 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1635 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > Attachments: patch-1635-1.txt, patch-1635-ydist.txt, patch-1635.txt > > > MAPREDUCE-842 changed Child's mapred.local.dir to have attemptDir as the base > local directory. Also assumption is that > org.apache.hadoop.mapred.MapOutputFile always gets Child's mapred.local.dir. > But, MapOuptutFile.getOutputFile() is called from TaskTracker's conf, which > does not find the output file. Thus TaskTracker.tryToGetOutputSize() always > returns -1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-587) Stream test TestStreamingExitStatus fails with Out of Memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-587: Release Note: Fixed the streaming test TestStreamingExitStatus's failure due to an OutOfMemory error by reducing the testcase's io.sort.mb. (was: Fixed the streaming test TestStreamingExitStatus's failure due ot Out of Memory by reducing the testcase's io.sort.mb.) > Stream test TestStreamingExitStatus fails with Out of Memory > > > Key: MAPREDUCE-587 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-587 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming > Environment: OS/X, 64-bit x86 imac, 4GB RAM. >Reporter: Steve Loughran >Assignee: Amar Kamat >Priority: Minor > Fix For: 0.21.0 > > Attachments: MAPREDUCE-587-v1.0.patch, mr-587-yahoo-y20-v1.0.patch, > mr-587-yahoo-y20-v1.1.patch > > > contrib/streaming tests are failing a test with an Out of Memory error on an > OS/X Mac -same problem does not surface on Linux. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-587) Stream test TestStreamingExitStatus fails with Out of Memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-587: Release Note: Fixed the streaming test TestStreamingExitStatus's failure due to Out of Memory by reducing the testcase's io.sort.mb. (was: Reduced the io.sort.mb in TestStreamingExitStatus to prevent OOM.) > Stream test TestStreamingExitStatus fails with Out of Memory > > > Key: MAPREDUCE-587 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-587 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming > Environment: OS/X, 64-bit x86 imac, 4GB RAM. >Reporter: Steve Loughran >Assignee: Amar Kamat >Priority: Minor > Fix For: 0.21.0 > > Attachments: MAPREDUCE-587-v1.0.patch, mr-587-yahoo-y20-v1.0.patch, > mr-587-yahoo-y20-v1.1.patch > > > contrib/streaming tests are failing a test with an Out of Memory error on an > OS/X Mac -same problem does not surface on Linux. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1607) Task controller may not set permissions for a task cleanup attempt's log directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1607: - Release Note: Fixed initialization of a task-cleanup attempt's log directory by setting correct permissions via task-controller. Added new log4j properties hadoop.tasklog.iscleanup and log4j.appender.TLA.isCleanup to conf/log4j.properties. Changed the userlogs for a task-cleanup attempt to go into its own directory instead of the original attempt directory. This is an incompatible change as old userlogs of cleanup attempt-dirs before this release will no longer be visible. (was: Fixed initialization of a task-cleanup attempt's log directory by setting correct permissions via task-controller. Changed the userlogs for a task-cleanup attempt to go into its own directory instead of the original attempt directory. This is an incompatible change as old userlogs of cleanup attempt-dirs will no longer be visible.) > Task controller may not set permissions for a task cleanup attempt's log > directory > -- > > Key: MAPREDUCE-1607 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1607 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task-controller >Affects Versions: 0.21.0 >Reporter: Hemanth Yamijala >Assignee: Amareshwari Sriramadasu > Fix For: 0.21.0 > > Attachments: patch-1607-1.txt, patch-1607-2.txt, > patch-1607-ydist.txt, patch-1607.txt > > > Task controller uses the INITIALIZE_TASK command to initialize task attempt > and task log directories. For cleanup tasks, task attempt directories are > named as task-attempt-id.cleanup. But log directories do not have the > .cleanup suffix. The task controller is not aware of this distinction and > tries to set permissions for log directories named task-attempt-id.cleanup. > This is a NO-OP. Typically the task cleanup runs on the same node that ran > the original task attempt as well. So, the task log directories are already > properly initialized. 
However, the task cleanup can run on a node that has > not run the original task attempt. In that case, the initialization would not > happen and this could result in the cleanup task failing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1397) NullPointerException observed during task failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1397: - Release Note: Fixed a race condition involving JvmRunner.kill() and KillTaskAction, which was leading to a NullPointerException causing a transient inconsistent state in JvmManager and failure of tasks. (was: Fixed a NullPointerException observed in JvmManager during task failures that resulted in a transient inconsistent state.) > NullPointerException observed during task failures > -- > > Key: MAPREDUCE-1397 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1397 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.1 >Reporter: Ramya R >Assignee: Amareshwari Sriramadasu >Priority: Minor > Fix For: 0.21.0 > > Attachments: patch-1397-1.txt, patch-1397-2.txt, patch-1397-3.txt, > patch-1397-ydist.txt, patch-1397.txt > > > In an environment where many jobs are killed simultaneously, NPEs are > observed in the TT/JT logs when a task fails. The situation is aggravated > when the taskcontroller.cfg is not configured properly. Below is the > exception obtained: > {noformat} > INFO org.apache.hadoop.mapred.TaskInProgress: Error from : > java.lang.Throwable: Child Error > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:529) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:329) > at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:315) > at > org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:146) > at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:109) > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:502) > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1657) After task logs directory is deleted, tasklog servlet displays wrong error message about job ACLs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1657: - Release Note: Fixed a bug in tasklog servlet which displayed wrong error message about job ACLs - an access control error instead of the expected log files gone error - after task logs directory is deleted. (was: Fixed a bug in tasklog servlet which displayed wrong error message about job ACLs - an access control error instead of log files gone error - after task logs directory is deleted.) Affects Version/s: 0.21.0 (was: 0.22.0) > After task logs directory is deleted, tasklog servlet displays wrong error > message about job ACLs > - > > Key: MAPREDUCE-1657 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1657 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Ravi Gummadi >Assignee: Ravi Gummadi > Fix For: 0.21.0 > > Attachments: MR1657.20S.1.patch, MR1657.patch > > > When task log gets deleted if from Web UI we click view task log, web page > displays wrong error message -: > [ > HTTP ERROR: 401 > User user1 failed to view tasklogs of job job_201003241521_0001! > user1 is not authorized for performing the operation VIEW_JOB on > job_201003241521_0001. VIEW_JOB Access control list > configured for this job : > RequestURI=/tasklog > ] > Even if user is having view job acls set / or user is owner of job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869116#action_12869116 ] Hadoop QA commented on MAPREDUCE-1744: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444582/h1744.patch against trunk revision 944427. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/192/console This message is automatically generated. 
> DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}}. > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block; > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser, and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is set as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
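The binding problem described above can be illustrated without Hadoop at all. Below is a toy model (hypothetical names: `ToyFileSystem`, `doAs`, `CURRENT_USER`; not the real `UserGroupInformation` or `FileSystem` APIs) showing why an instance created outside the `doAs()` block stays bound to the superuser rather than the proxy user:

```java
import java.util.function.Supplier;

public class DoAsSketch {
    // Toy stand-in for the ambient user identity that UGI tracks.
    static final ThreadLocal<String> CURRENT_USER =
        ThreadLocal.withInitial(() -> "superuser");

    // Toy stand-in for FileSystem: it captures the user at creation time,
    // which is the crux of the bug described in the issue.
    static class ToyFileSystem {
        final String owner;
        ToyFileSystem() { this.owner = CURRENT_USER.get(); }
    }

    // Toy stand-in for UGI.doAs(): run the action as the proxy user,
    // then restore the previous identity.
    static <T> T doAs(String proxyUser, Supplier<T> action) {
        String prev = CURRENT_USER.get();
        CURRENT_USER.set(proxyUser);
        try { return action.get(); } finally { CURRENT_USER.set(prev); }
    }

    public static void main(String[] args) {
        ToyFileSystem outside = new ToyFileSystem();              // bound to superuser
        ToyFileSystem inside = doAs("alice", ToyFileSystem::new); // bound to proxy user
        System.out.println(outside.owner + " " + inside.owner);   // superuser alice
    }
}
```

This is why the second option in the issue (obtaining the instance inside `doAs()`) fixes the permission failures: the creation, not the later use, determines whose credentials the instance carries.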
[jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869112#action_12869112 ] Vinay Kumar Thota commented on MAPREDUCE-1794: -- {quote} The JobStatus should be JobStatus.FAILED instead of succeeded. If the task tracker was lost for all four attempts of a task, shouldn't the job fail instead of succeed? If that is not the case, the message in the assert has to be changed: the job succeeded even when losing the task tracker 4 times. {quote} [Vinay]: I think you misunderstood the functionality. If a tasktracker is lost and the timeout expires, the task is marked as killed and resubmitted to another tasktracker. Even if the task is killed in four attempts due to lost tasktrackers, it is resubmitted to another tasktracker a fifth time and keeps retrying until it succeeds. The mapred.map.max.attempts attribute is not applicable to killed tasks, so the task can be attempted any number of times; max attempts applies only to failed tasks. In this case the job status should be succeeded, because the task will succeed at some point. {quote} Why do we care about checking the job status for 40% completion? Also, can we enhance the building blocks to check this kind of status, since the code can be reused elsewhere? {quote} [Vinay]: We just wanted to make sure the job starts and completes at least 40%, because at least one map or reduce task should run on the tasktracker for checking the conditions. {quote} The above code is repeated a couple of times and can be part of a function; if this is used across test cases then it can be part of a building block. {quote} [Vinay]: I will refactor the code into a function. I don't think it is useful across the test cases. {quote} If you see the story description, we said we will suspend the task tracker and resume it, but it seems that you have followed the route of killing the task tracker instead of pausing and resuming it.
I think killing should be fine since kill/start emulates the pause and resume, but on the performance side, if we had used pause and resume, the waits in the test cases could be reduced. {quote} [Vinay]: I am pausing by stopping the tasktracker and resuming it by starting the tasktracker, so I don't think there would be a performance issue. {quote} One general question I have is: after killing the same task tracker 4 times, the task tracker should get blacklisted, and if you resubmit the job again, the task tracker should not be used by the job tracker. Is it good to check that condition as part of this test case, or do you think this is out of scope? There is a URL which lists the blacklisted tasktrackers; if we can get the number through an aspect then it can be verified. Also, at the end of the test we need to remove the task tracker from the blacklisted condition for the other tests to run without any problem. {quote} [Vinay]: For killed tasks, max attempts is not applicable, like I said above. So there won't be any blacklisting. > Test the job status of lost task trackers before and after the timeout. > --- > > Key: MAPREDUCE-1794 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1794_lost_tasktracker.patch > > > This test covers the following scenarios. > 1. Verify the job status whether it is succeeded or not when the task > tracker is lost and alive before the timeout. > 2. Verify the job status and killed attempts of a task whether it is > succeeded or not and killed attempts are matched or not when the task > trackers are lost and it timeout for all the four attempts of a task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
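The killed-versus-failed semantics Vinay describes can be sketched in a few lines. This is a toy model with hypothetical names, not the actual JobTracker scheduling code: killed attempts (e.g. from a lost tasktracker) are simply resubmitted, while only failed attempts count toward mapred.map.max.attempts:

```java
public class AttemptSketch {
    // Possible outcomes of a single task attempt in this toy model.
    enum Outcome { SUCCEEDED, FAILED, KILLED }

    // FAILED attempts count toward maxAttempts; KILLED attempts (e.g. the
    // tasktracker was lost) do not, so the task keeps getting resubmitted.
    static String runTask(Outcome[] attempts, int maxAttempts) {
        int failures = 0;
        for (Outcome o : attempts) {
            if (o == Outcome.SUCCEEDED) return "SUCCEEDED";
            if (o == Outcome.FAILED && ++failures >= maxAttempts) return "FAILED";
            // KILLED: resubmit to another tracker, no failure counted
        }
        return "RUNNING";
    }

    public static void main(String[] args) {
        // Four kills from lost trackers, then success on the fifth attempt:
        Outcome[] lostTrackers = { Outcome.KILLED, Outcome.KILLED,
                                   Outcome.KILLED, Outcome.KILLED,
                                   Outcome.SUCCEEDED };
        System.out.println(runTask(lostTrackers, 4)); // SUCCEEDED
    }
}
```

Under this model the test's expectation of JobStatus.SUCCEEDED is consistent: the kill count never trips the max-attempts limit, so the job cannot fail purely from lost trackers.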
[jira] Updated: (MAPREDUCE-1731) Process tree clean up suspended task tests.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Kumar Thota updated MAPREDUCE-1731: - Attachment: 1731-ydist-security.patch The MAPREDUCE-1713 patch affects this patch because of a dependency, so uploading a new patch. > Process tree clean up suspended task tests. > --- > > Key: MAPREDUCE-1731 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1731 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1731-ydist-security.patch, 1731-ydist-security.patch, > 1731-ydist-security.patch, suspendtask_1731.patch, suspendtask_1731.patch > > > 1. Verify the process tree cleanup of a suspended task; the task should be > terminated after the timeout. > 2. Verify the process tree cleanup of a suspended task when the task is resumed > before the task timeout. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Kumar Thota updated MAPREDUCE-1710: - Attachment: 1710-ydist_security.patch The MAPREDUCE-1713 patch affects this patch because of a dependency, so uploading a new patch. > Process tree clean up of exceeding memory limit tasks. > -- > > Key: MAPREDUCE-1710 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, > 1710-ydist_security.patch, memorylimittask_1710.patch, > memorylimittask_1710.patch, memorylimittask_1710.patch, > memorylimittask_1710.patch, memorylimittask_1710.patch > > > 1. Submit a job which would spawn child processes, each of which exceeds > the memory limits. Let the job complete. Check if all the > child processes are killed; the overall job should fail. > 2. Submit a job which would spawn child processes, each of which exceeds > the memory limits. Kill/fail the job while in progress. > Check if all the child processes are killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1693) Process tree clean up of either a failed task or killed task tests.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Kumar Thota updated MAPREDUCE-1693: - Attachment: 1693-ydist_security.patch The MAPREDUCE-1713 patch affects this patch because of a dependency, so uploading a new patch. > Process tree clean up of either a failed task or killed task tests. > --- > > Key: MAPREDUCE-1693 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1693 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1693-ydist_security.patch, 1693-ydist_security.patch, > 1693-ydist_security.patch, taskchildskilling_1693.diff, > taskchildskilling_1693.diff, taskchildskilling_1693.patch, > taskchildskilling_1693.patch, taskchildskilling_1693.patch, > taskchildskilling_1693.patch, taskchildskilling_1693.patch, > taskchildskilling_1693.patch > > > The following scenarios are covered in the test. > 1. Run a job which spawns subshells in the tasks. Kill one of the tasks. All > the child processes of the killed task must be killed. > 2. Run a job which spawns subshells in tasks. Fail one of the tasks. All the > child processes of the failed task must be killed along with the task after its > failure. > 3. Check process tree cleanup on a particular task-tracker when we use > -kill-task and -fail-task with both map and reduce. > 4. Submit a job which would spawn child processes, each of which exceeds > the memory limits. Let the job complete. Check if all the > child processes are killed; the overall job should fail. > 5. Submit a job which would spawn child processes, each of which > exceeds the memory limits. Kill/fail the job while in progress. > Check if all the child processes are killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1799) TaskTracker webui fails to show logs for tasks whose child JVM itself crashes before process launch
TaskTracker webui fails to show logs for tasks whose child JVM itself crashes before process launch --- Key: MAPREDUCE-1799 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1799 Project: Hadoop Map/Reduce Issue Type: Bug Components: task, tasktracker Reporter: Vinod K V Fix For: 0.22.0 In many cases, such as invalid JVM arguments or a JVM started with too large an initial heap or beyond OS ulimits, the child JVM itself crashes before the process can even be launched. In these situations, the tasktracker's web UI doesn't show the logs. This is because of a bug in the TaskLogServlet which displays logs only when syslog, stdout and stderr are all present. In the JVM-crash case, syslog isn't created, and so task logs aren't displayed at all. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
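A minimal sketch of the kind of fix the description implies: serve whichever of the three per-attempt log files actually exist instead of requiring all of them. `availableLogs` is a hypothetical helper, not the actual TaskLogServlet code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class TaskLogSketch {
    // Return the subset of syslog/stdout/stderr present in the attempt's
    // log directory. The bug described above showed nothing unless all
    // three existed, so a crashed JVM (which never writes syslog) yielded
    // an empty page even though stdout/stderr held the crash diagnostics.
    static List<String> availableLogs(File attemptLogDir) {
        List<String> present = new ArrayList<>();
        for (String name : new String[] { "syslog", "stdout", "stderr" }) {
            if (new File(attemptLogDir, name).isFile()) present.add(name);
        }
        return present;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("tasklogs").toFile();
        new File(dir, "stdout").createNewFile();
        new File(dir, "stderr").createNewFile();   // no syslog: JVM crashed early
        System.out.println(availableLogs(dir));    // [stdout, stderr]
    }
}
```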
[jira] Updated: (MAPREDUCE-1713) Utilities for system tests specific.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Kumar Thota updated MAPREDUCE-1713: - Attachment: MAPREDUCE-1713.patch New patch, specific to MapReduce. > Utilities for system tests specific. > > > Key: MAPREDUCE-1713 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, > 1713-ydist-security.patch, 1713-ydist-security.patch, > 1713-ydist-security.patch, MAPREDUCE-1713.patch, > systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch > > > 1. A method for restarting the daemon with a new configuration. > public static void restartCluster(Hashtable props, String > confFile) throws Exception; > 2. A method for resetting the daemon to the default configuration. > public void resetCluster() throws Exception; > 3. A method for waiting until the daemon stops. > public void waitForClusterToStop() throws Exception; > 4. A method for waiting until the daemon starts. > public void waitForClusterToStart() throws Exception; > 5. A method for checking whether the job has started or not. > public boolean isJobStarted(JobID id) throws IOException; > 6. A method for checking whether the task has started or not. > public boolean isTaskStarted(TaskInfo taskInfo) throws IOException; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1744: - Status: Open (was: Patch Available) > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}}. > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block; > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser, and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is set as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1744: - Status: Patch Available (was: Open) > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}}. > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block; > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser, and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is set as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869056#action_12869056 ] Balaji Rajagopalan commented on MAPREDUCE-1794: --- + /** + * Verify the job status whether it is succeeded or not when + * the lost task trackers time out for all four attempts of a task. + * @throws IOException if an I/O error occurs. + */ + @Test + public void testJobStatusOfLostTracker2() throws + Exception { +String testName = "LTT2"; +setupJobAndRun(); +JobStatus jStatus = verifyLostTaskTrackerJobStatus(testName); +Assert.assertEquals("Job has not been failed...", +JobStatus.SUCCEEDED, jStatus.getRunState()); + } The JobStatus should be JobStatus.FAILED instead of succeeded. If the task tracker was lost for all four attempts of a task, shouldn't the job fail instead of succeed? If that is not the case, the message in the assert has to be changed: the job succeeded even when losing the task tracker 4 times. +// Make sure that job should run and completes 40%. +while (jobStatus.getRunState() != JobStatus.RUNNING && + jobStatus.mapProgress() < 0.4f) { + UtilsForTests.waitFor(100); + jobStatus = wovenClient.getJobInfo(jID).getStatus(); +} Why do we care about checking the job status for 40% completion? Also, can we enhance the building blocks to check this kind of status, since the code can be reused elsewhere? +TaskInfo[] taskInfos = wovenClient.getTaskInfo(jID); +for (TaskInfo taskinfo : taskInfos) { + if (!taskinfo.isSetupOrCleanup()) { +taskInfo = taskinfo; +break; + } +} The above code can be part of a building block in JTClient. + while (counter < 30) { + if (ttClient != null) { + break; + }else{ +taskInfo = wovenClient.getTaskInfo(taskInfo.getTaskID()); +ttClient = getTTClientIns(taskInfo); + } + counter ++; + } The above code is repeated a couple of times and can be part of a function; if this is used across test cases then it can be part of a building block. 
If you see the story description, we said we will suspend the task tracker and resume it, but it seems that you have followed the route of killing the task tracker instead of pausing and resuming it. I think killing should be fine since kill/start emulates the pause and resume, but on the performance side, if we had used pause and resume, the waits in the test cases could be reduced. One general question I have is: after killing the same task tracker 4 times, the task tracker should get blacklisted, and if you resubmit the job again, the task tracker should not be used by the job tracker. Is it good to check that condition as part of this test case, or do you think this is out of scope? There is a URL which lists the blacklisted tasktrackers; if we can get the number through an aspect then it can be verified. Also, at the end of the test we need to remove the task tracker from the blacklisted condition for the other tests to run without any problem. > Test the job status of lost task trackers before and after the timeout. > --- > > Key: MAPREDUCE-1794 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1794_lost_tasktracker.patch > > > This test covers the following scenarios. > 1. Verify the job status whether it is succeeded or not when the task > tracker is lost and alive before the timeout. > 2. Verify the job status and killed attempts of a task whether it is > succeeded or not and killed attempts are matched or not when the task > trackers are lost and it timeout for all the four attempts of a task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null
[ https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869055#action_12869055 ] Amareshwari Sriramadasu commented on MAPREDUCE-118: --- The test TestMapredHeartbeat failed with IllegalMonitorException while shutting down DataNode. The failure is not related to the patch. The same test passed on my machine. > Job.getJobID() will always return null > -- > > Key: MAPREDUCE-118 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-118 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.20.1 >Reporter: Amar Kamat >Assignee: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.20.3 > > Attachments: patch-118-0.20-1.txt, patch-118-0.20.txt, > patch-118-0.21.txt, patch-118-1.txt, patch-118-2.txt, patch-118-3.txt, > patch-118-4.txt, patch-118-5.txt, patch-118.txt > > > JobContext is used for a read-only view of a job's info. Hence all the read-only > fields in JobContext are set in the constructor. Job extends JobContext. When > a Job is created, the jobid is not known and hence there is no way to set the JobID > once the Job is created. The JobID is obtained only when the JobClient queries the > jobTracker for a job-id, which happens later, i.e. upon job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-118) Job.getJobID() will always return null
[ https://issues.apache.org/jira/browse/MAPREDUCE-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869054#action_12869054 ] Hadoop QA commented on MAPREDUCE-118: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444776/patch-118-5.txt against trunk revision 944427. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 27 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/539/console This message is automatically generated. 
> Job.getJobID() will always return null > -- > > Key: MAPREDUCE-118 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-118 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.20.1 >Reporter: Amar Kamat >Assignee: Amareshwari Sriramadasu >Priority: Blocker > Fix For: 0.20.3 > > Attachments: patch-118-0.20-1.txt, patch-118-0.20.txt, > patch-118-0.21.txt, patch-118-1.txt, patch-118-2.txt, patch-118-3.txt, > patch-118-4.txt, patch-118-5.txt, patch-118.txt > > > JobContext is used for a read-only view of a job's info. Hence all the read-only > fields in JobContext are set in the constructor. Job extends JobContext. When > a Job is created, the jobid is not known and hence there is no way to set the JobID > once the Job is created. The JobID is obtained only when the JobClient queries the > jobTracker for a job-id, which happens later, i.e. upon job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
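The constructor-time binding described above can be reduced to a toy model (hypothetical classes, not the real Hadoop `Job`/`JobContext`): the read-only parent fixes its fields at construction, but the job ID only exists after submission, so the getter can only ever return null:

```java
public class JobIdSketch {
    // Toy JobContext: a read-only view, all fields set in the constructor.
    static class JobContext {
        protected final String jobID;
        JobContext(String jobID) { this.jobID = jobID; }
        String getJobID() { return jobID; }
    }

    // Toy Job: constructed before submission, when no ID exists yet, and
    // the parent offers no setter, so getJobID() can only ever return null.
    static class Job extends JobContext {
        Job() { super(null); }                 // ID unknown at creation time
        void submit() {
            // the JobClient would obtain a real ID here, but there is no
            // way to push it into the final field of the parent view
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        job.submit();
        System.out.println(job.getJobID());    // null
    }
}
```

The attached patches presumably break this binding in some way (e.g. making the ID settable after construction); the sketch only illustrates why the bug is structural rather than a missing assignment.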