[jira] Updated: (MAPREDUCE-848) TestCapacityScheduler is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-848: - Status: Patch Available (was: Open) TestCapacityScheduler is failing Key: MAPREDUCE-848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-848 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Affects Versions: 0.21.0 Reporter: Devaraj Das Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPREDUCE-848-v1.0.patch Looks like the commit of HADOOP-805 broke the CapacityScheduler testcase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-767) to remove mapreduce dependency on commons-cli2
[ https://issues.apache.org/jira/browse/MAPREDUCE-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742257#action_12742257 ] Amareshwari Sriramadasu commented on MAPREDUCE-767: --- Some comments: 1. Validators should not be removed. Move them into a method and call them from the streaming code. 2. Can you check whether passing -jobconf x=y x1=y1 throws an exception? Can you also verify whether it is easy to split the values from streaming and add them? 3. Pipes options should also be tested to remove mapreduce dependency on commons-cli2 -- Key: MAPREDUCE-767 URL: https://issues.apache.org/jira/browse/MAPREDUCE-767 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/streaming Reporter: Giridharan Kesavan Assignee: Amar Kamat Attachments: MAPREDUCE-767-v1.1.patch mapreduce, streaming and the eclipse plugin depend on commons-cli2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-842) Per-job local data on the TaskTracker node should have right access-control
[ https://issues.apache.org/jira/browse/MAPREDUCE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742260#action_12742260 ] Vinod K V commented on MAPREDUCE-842: - I just ran the test cases that touch the changes in the last patch. I also generated docs and verified the documentation changes. The contrib test failures reported by Hudson are unrelated. Please see [MAPREDUCE-848] and [MAPREDUCE-699]. This patch is committable. Per-job local data on the TaskTracker node should have right access-control --- Key: MAPREDUCE-842 URL: https://issues.apache.org/jira/browse/MAPREDUCE-842 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Reporter: Arun C Murthy Assignee: Vinod K V Attachments: HADOOP-4491-20090623-common.1.txt, HADOOP-4491-20090623-mapred.1.txt, HADOOP-4491-20090703-common.1.txt, HADOOP-4491-20090703-common.txt, HADOOP-4491-20090703.1.txt, HADOOP-4491-20090703.txt, HADOOP-4491-20090707-common.txt, HADOOP-4491-20090707.txt, HADOOP-4491-20090716-mapred.txt, HADOOP-4491-20090803.1.txt, HADOOP-4491-20090803.txt, HADOOP-4491-20090806.txt, HADOOP-4491-20090807.2.txt, HADOOP-4491-20090810.1.txt, HADOOP-4491-20090810.3.txt, HADOOP-4491-20090811.txt, HADOOP-4491-20090812.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-849) Renaming of configuration property names in mapreduce
Renaming of configuration property names in mapreduce - Key: MAPREDUCE-849 URL: https://issues.apache.org/jira/browse/MAPREDUCE-849 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.21.0 In-line with HDFS-531, property names in configuration files should be standardized in MAPREDUCE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-430) Task stuck in cleanup with OutOfMemoryErrors
[ https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742266#action_12742266 ] Devaraj Das commented on MAPREDUCE-430: --- I propose that we catch Exception instead of Throwable in Child.java. Whatever is currently done inside the catch Throwable block is retained (just that the block will now get executed only if an Exception is caught). That's the only change the patch makes. Task stuck in cleanup with OutOfMemoryErrors Key: MAPREDUCE-430 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amareshwari Sriramadasu Assignee: Amar Kamat Fix For: 0.20.1 Attachments: MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch Observed a task with an OutOfMemory error, stuck in cleanup. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
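Devaraj's proposal above hinges on Java's throwable hierarchy: narrowing the catch from Throwable to Exception lets Errors such as OutOfMemoryError propagate (and kill the child JVM) instead of being swallowed during cleanup. A minimal illustration of that behaviour, with hypothetical names rather than the actual Child.java code:

```java
// Sketch only: demonstrates why catching Exception (not Throwable)
// lets an OutOfMemoryError escape the handler.
class CatchScope {
    // Returns "exception" if the handler caught the failure, "none" if
    // the task completed. An Error is NOT caught and propagates out.
    static String handle(Runnable task) {
        try {
            task.run();
            return "none";
        } catch (Exception e) {
            // IOException, RuntimeException, etc. land here.
            return "exception";
        }
    }

    public static void main(String[] args) {
        // A plain exception is caught...
        if (!handle(() -> { throw new RuntimeException("boom"); }).equals("exception"))
            throw new AssertionError();
        // ...but a (simulated) OutOfMemoryError propagates past the catch.
        boolean propagated = false;
        try {
            handle(() -> { throw new OutOfMemoryError("simulated"); });
        } catch (OutOfMemoryError e) {
            propagated = true;
        }
        if (!propagated) throw new AssertionError();
        System.out.println("ok");
    }
}
```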
[jira] Commented: (MAPREDUCE-848) TestCapacityScheduler is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742283#action_12742283 ] Hadoop QA commented on MAPREDUCE-848: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416284/MAPREDUCE-848-v1.0.patch against trunk revision 803231. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/470/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/470/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/470/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/470/console This message is automatically generated. TestCapacityScheduler is failing Key: MAPREDUCE-848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-848 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Affects Versions: 0.21.0 Reporter: Devaraj Das Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPREDUCE-848-v1.0.patch Looks like the commit of HADOOP-805 broke the CapacityScheduler testcase. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742345#action_12742345 ] Martin Dittus commented on MAPREDUCE-685: - We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets: http://jdbc.postgresql.org/documentation/83/query.html -- Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour). Sqoop will fail with OutOfMemory on large tables using mysql Key: MAPREDUCE-685 URL: https://issues.apache.org/jira/browse/MAPREDUCE-685 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-685.3.patch, MAPREDUCE-685.patch, MAPREDUCE-685.patch.2 The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-814) Move completed Job history files to HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharad Agarwal updated MAPREDUCE-814: - Release Note: Provides an ability to move completed job history files to a HDFS location via configuring mapred.job.tracker.history.completed.location. If the directory location does not already exist, it would be created by jobtracker. (was: Provides an ability to move completed job history files to a HDFS location via configuring mapred.job.tracker.history.completed.location.) Move completed Job history files to HDFS Key: MAPREDUCE-814 URL: https://issues.apache.org/jira/browse/MAPREDUCE-814 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Reporter: Sharad Agarwal Assignee: Sharad Agarwal Fix For: 0.21.0 Attachments: 814_v1.patch, 814_v2.patch, 814_v3.patch, 814_v4.patch, 814_v5.patch, 814_ydist.patch Currently completed job history files remain on the jobtracker node. Having the files available on HDFS will enable clients to access these files more easily. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
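The release note above can be exercised with a configuration fragment like the following; the property name comes from the note itself, while the HDFS path is purely illustrative:

```xml
<!-- Hypothetical mapred-site.xml fragment. If the target directory does
     not exist, the jobtracker creates it, per the release note. -->
<property>
  <name>mapred.job.tracker.history.completed.location</name>
  <value>hdfs://namenode:8020/mapred/history/done</value>
</property>
```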
[jira] Created: (MAPREDUCE-850) PriorityScheduler should use TaskTrackerManager.killJob() instead of JobInProgress.kill()
PriorityScheduler should use TaskTrackerManager.killJob() instead of JobInProgress.kill() - Key: MAPREDUCE-850 URL: https://issues.apache.org/jira/browse/MAPREDUCE-850 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-817) Add a cache for retired jobs with minimal job info and provide a way to access history file url
[ https://issues.apache.org/jira/browse/MAPREDUCE-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharad Agarwal updated MAPREDUCE-817: - Attachment: 817_ydist_new.patch New patch for Yahoo's distribution. It does NOT introduce client side API changes. Add a cache for retired jobs with minimal job info and provide a way to access history file url --- Key: MAPREDUCE-817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-817 Project: Hadoop Map/Reduce Issue Type: New Feature Components: client, jobtracker Reporter: Sharad Agarwal Assignee: Sharad Agarwal Fix For: 0.21.0 Attachments: 817_v1.patch, 817_v2.patch, 817_v3.patch, 817_ydist.patch, 817_ydist_new.patch MAPREDUCE-814 will provide a way to keep the job history files in HDFS. There should be a way to get the url for the completed job history file. The completed jobs can be purged from memory more aggressively by the jobtracker since the clients can retrieve the information from the history file. The jobtracker can just maintain the very basic info about the completed jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-842) Per-job local data on the TaskTracker node should have right access-control
[ https://issues.apache.org/jira/browse/MAPREDUCE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742366#action_12742366 ] Hadoop QA commented on MAPREDUCE-842: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416292/HADOOP-4491-20090812.txt against trunk revision 803231. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 50 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/console This message is automatically generated. 
Per-job local data on the TaskTracker node should have right access-control --- Key: MAPREDUCE-842 URL: https://issues.apache.org/jira/browse/MAPREDUCE-842 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Reporter: Arun C Murthy Assignee: Vinod K V Attachments: HADOOP-4491-20090623-common.1.txt, HADOOP-4491-20090623-mapred.1.txt, HADOOP-4491-20090703-common.1.txt, HADOOP-4491-20090703-common.txt, HADOOP-4491-20090703.1.txt, HADOOP-4491-20090703.txt, HADOOP-4491-20090707-common.txt, HADOOP-4491-20090707.txt, HADOOP-4491-20090716-mapred.txt, HADOOP-4491-20090803.1.txt, HADOOP-4491-20090803.txt, HADOOP-4491-20090806.txt, HADOOP-4491-20090807.2.txt, HADOOP-4491-20090810.1.txt, HADOOP-4491-20090810.3.txt, HADOOP-4491-20090811.txt, HADOOP-4491-20090812.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-840) DBInputFormat leaves open transaction
[ https://issues.apache.org/jira/browse/MAPREDUCE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-840: Resolution: Fixed Fix Version/s: 0.21.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I've just committed this. Thanks Aaron! DBInputFormat leaves open transaction - Key: MAPREDUCE-840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-840 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Aaron Kimball Assignee: Aaron Kimball Priority: Minor Fix For: 0.21.0 Attachments: MAPREDUCE-840.patch DBInputFormat.getSplits() does not call connection.commit() after the COUNT query. This can leave an open transaction against the database which interferes with other connections to the same table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-706) Support for FIFO pools in the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742414#action_12742414 ] Matei Zaharia commented on MAPREDUCE-706: - Contrib test failures are again unrelated. Support for FIFO pools in the fair scheduler Key: MAPREDUCE-706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-706 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Matei Zaharia Attachments: fsdesigndoc.pdf, fsdesigndoc.tex, mapreduce-706.patch, mapreduce-706.v1.patch, mapreduce-706.v2.patch, mapreduce-706.v3.patch, mapreduce-706.v4.patch The fair scheduler should support making the internal scheduling algorithm for some pools be FIFO instead of fair sharing in order to work better for batch workloads. FIFO pools will behave exactly like the current default scheduler, sorting jobs by priority and then submission time. Pools will have their scheduling algorithm set through the pools config file, and it will be changeable at runtime. To support this feature, I'm also changing the internal logic of the fair scheduler to no longer use deficits. Instead, for fair sharing, we will assign tasks to the job farthest below its share as a ratio of its share. This is easier to combine with other scheduling algorithms and leads to a more stable sharing situation, avoiding unfairness issues brought up in MAPREDUCE-543 and MAPREDUCE-544 that happen when some jobs have long tasks. The new preemption (MAPREDUCE-551) will ensure that critical jobs can gain their fair share within a bounded amount of time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
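The deficit-free policy Matei describes above (assign tasks to the job farthest below its fair share, measured as running tasks over share) can be sketched as follows; the class and method names are hypothetical, not the fair scheduler's actual code:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of ratio-based fair sharing: among runnable jobs,
// pick the one whose runningTasks / fairShare ratio is smallest, i.e.
// the job farthest below its share.
class FairShareSketch {
    static final class Job {
        final String name;
        final int runningTasks;
        final double fairShare; // task slots this job is entitled to

        Job(String name, int runningTasks, double fairShare) {
            this.name = name;
            this.runningTasks = runningTasks;
            this.fairShare = fairShare;
        }
    }

    // The job with the lowest share ratio gets the next free slot.
    static Job nextToSchedule(List<Job> jobs) {
        return Collections.min(jobs,
                Comparator.comparingDouble(j -> j.runningTasks / j.fairShare));
    }

    public static void main(String[] args) {
        Job a = new Job("a", 4, 10.0); // at 40% of its share
        Job b = new Job("b", 1, 10.0); // at 10% of its share: scheduled first
        if (!nextToSchedule(java.util.Arrays.asList(a, b)).name.equals("b"))
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

Because the comparison is a ratio rather than an accumulated deficit, it composes naturally with per-pool FIFO ordering and stays stable when some jobs have long-running tasks.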
[jira] Created: (MAPREDUCE-851) Static job property accessors don't accept Configuration but various JobContext sub-classes
Static job property accessors don't accept Configuration but various JobContext sub-classes -- Key: MAPREDUCE-851 URL: https://issues.apache.org/jira/browse/MAPREDUCE-851 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Reporter: Chris K Wensel The current method of accepting only JobContext or one of its sub-classes adds much complexity to dynamic job configuration builders that manipulate the Configuration object in order to dynamically configure Hadoop jobs, and influence internal Hadoop sub-systems during runtime to provide higher level functions and features. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-851) Static job property accessors don't accept Configuration but various JobContext sub-classes
[ https://issues.apache.org/jira/browse/MAPREDUCE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris K Wensel updated MAPREDUCE-851: - Issue Type: Bug (was: Improvement) Static job property accessors don't accept Configuration but various JobContext sub-classes -- Key: MAPREDUCE-851 URL: https://issues.apache.org/jira/browse/MAPREDUCE-851 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.20.1 Reporter: Chris K Wensel The current method of accepting only JobContext or one of its sub-classes adds much complexity to dynamic job configuration builders that manipulate the Configuration object in order to dynamically configure Hadoop jobs, and influence internal Hadoop sub-systems during runtime to provide higher level functions and features. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742426#action_12742426 ] Aaron Kimball commented on MAPREDUCE-685: - Because it's not actually the same fix ;) Postgresql wants you to do {{statement.setFetchSize(something_reasonable)}} e.g., 40. MySQL wants you to do {{statement.setFetchSize(INT_MIN)}}. The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch size = INT_MIN). That having been said, I have just finished a postgresql patch ready to post up here this week :) Just waiting for some existing patches to get committed first so that it applies cleanly. Sqoop will fail with OutOfMemory on large tables using mysql Key: MAPREDUCE-685 URL: https://issues.apache.org/jira/browse/MAPREDUCE-685 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-685.3.patch, MAPREDUCE-685.patch, MAPREDUCE-685.patch.2 The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
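The driver-specific fetch sizes Aaron describes can be captured in a small helper; the helper name and the PostgreSQL batch size of 40 are illustrative, not Sqoop's actual API:

```java
// Sketch only: maps a JDBC driver family to the fetch size that enables
// streaming (row-at-a-time or batched) result delivery, per the
// discussion above.
class StreamingFetchSize {
    // MySQL's Connector/J streams rows only at fetchSize == Integer.MIN_VALUE;
    // PostgreSQL streams with any positive fetch size (cursor mode);
    // 0 is the JDBC default, which buffers the whole ResultSet.
    static int forDriver(String driver) {
        switch (driver) {
            case "mysql":      return Integer.MIN_VALUE; // row-wise cursor
            case "postgresql": return 40;                // illustrative batch size
            default:           return 0;                 // buffer everything
        }
    }

    public static void main(String[] args) {
        if (forDriver("mysql") != Integer.MIN_VALUE) throw new AssertionError();
        if (forDriver("postgresql") <= 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```

A caller would then do {{stmt.setFetchSize(forDriver("mysql"))}} before executing the SELECT; note that PostgreSQL additionally requires autocommit to be off for cursor-based fetching to take effect, per the linked documentation.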
[jira] Updated: (MAPREDUCE-842) Per-job local data on the TaskTracker node should have right access-control
[ https://issues.apache.org/jira/browse/MAPREDUCE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-842: --- Resolution: Fixed Fix Version/s: 0.21.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this. Thanks, Vinod ! Per-job local data on the TaskTracker node should have right access-control --- Key: MAPREDUCE-842 URL: https://issues.apache.org/jira/browse/MAPREDUCE-842 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Reporter: Arun C Murthy Assignee: Vinod K V Fix For: 0.21.0 Attachments: HADOOP-4491-20090623-common.1.txt, HADOOP-4491-20090623-mapred.1.txt, HADOOP-4491-20090703-common.1.txt, HADOOP-4491-20090703-common.txt, HADOOP-4491-20090703.1.txt, HADOOP-4491-20090703.txt, HADOOP-4491-20090707-common.txt, HADOOP-4491-20090707.txt, HADOOP-4491-20090716-mapred.txt, HADOOP-4491-20090803.1.txt, HADOOP-4491-20090803.txt, HADOOP-4491-20090806.txt, HADOOP-4491-20090807.2.txt, HADOOP-4491-20090810.1.txt, HADOOP-4491-20090810.3.txt, HADOOP-4491-20090811.txt, HADOOP-4491-20090812.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split
[ https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742443#action_12742443 ] Hong Tang commented on MAPREDUCE-801: - Yet another solution, which I think is more general, has been proposed, and JIRA MAPREDUCE-841 has been created to track it. MAPREDUCE framework should issue warning with too many locations for a split Key: MAPREDUCE-801 URL: https://issues.apache.org/jira/browse/MAPREDUCE-801 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Hong Tang A customized input-format may be buggy and report misleading locations through its input-splits; an example is PIG-878. When an input split returns too many locations, it not only artificially inflates the percentage of data-local or rack-local maps, but also forces the scheduler to use more memory and work harder to conduct task assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-706) Support for FIFO pools in the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742445#action_12742445 ] Tom White commented on MAPREDUCE-706: - +1 What testing did you carry out on this? Agree the documentation is excellent. Can you add it to the source tree? Support for FIFO pools in the fair scheduler Key: MAPREDUCE-706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-706 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: Matei Zaharia Assignee: Matei Zaharia Attachments: fsdesigndoc.pdf, fsdesigndoc.tex, mapreduce-706.patch, mapreduce-706.v1.patch, mapreduce-706.v2.patch, mapreduce-706.v3.patch, mapreduce-706.v4.patch The fair scheduler should support making the internal scheduling algorithm for some pools be FIFO instead of fair sharing in order to work better for batch workloads. FIFO pools will behave exactly like the current default scheduler, sorting jobs by priority and then submission time. Pools will have their scheduling algorithm set through the pools config file, and it will be changeable at runtime. To support this feature, I'm also changing the internal logic of the fair scheduler to no longer use deficits. Instead, for fair sharing, we will assign tasks to the job farthest below its share as a ratio of its share. This is easier to combine with other scheduling algorithms and leads to a more stable sharing situation, avoiding unfairness issues brought up in MAPREDUCE-543 and MAPREDUCE-544 that happen when some jobs have long tasks. The new preemption (MAPREDUCE-551) will ensure that critical jobs can gain their fair share within a bounded amount of time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-845) build.xml hard codes findbugs heap size, in some configurations 512M is insufficient to successfully build
[ https://issues.apache.org/jira/browse/MAPREDUCE-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742483#action_12742483 ] Hudson commented on MAPREDUCE-845: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) build.xml hard codes findbugs heap size, in some configurations 512M is insufficient to successfully build -- Key: MAPREDUCE-845 URL: https://issues.apache.org/jira/browse/MAPREDUCE-845 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Environment: building on RHEL5 with both javadoc and findbugs in the same line Reporter: Lee Tucker Assignee: Lee Tucker Priority: Minor Fix For: 0.21.0 Attachments: MAPRED-845.patch Original Estimate: 0.03h Remaining Estimate: 0.03h When attempting the build with the hardcoded value of 512M for findbugs heap size, the build fails with: [findbugs] Java Result: -1 [xslt] Processing /grid/0/gs/gridre/SpringMapRedLevel2/build/test/findbugs/hadoop-findbugs-report.xml to /grid/0/gs/gridre/SpringMapRedLevel2/build/test/findbugs/hadoop-findbugs-report.html [xslt] Loading stylesheet /homes/hadoopqa/tools/findbugs/latest/src/xsl/default.xsl [xslt] : Error! Premature end of file. [xslt] : Error! com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Premature end of file. [xslt] Failed to process /grid/0/gs/gridre/SpringMapRedLevel2/build/test/findbugs/hadoop-findbugs-report.xml BUILD FAILED -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-805) Deadlock in Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742485#action_12742485 ] Hudson commented on MAPREDUCE-805: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) Deadlock in Jobtracker -- Key: MAPREDUCE-805 URL: https://issues.apache.org/jira/browse/MAPREDUCE-805 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Michael Tamm Fix For: 0.20.1 Attachments: MAPREDUCE-805-v1.1.patch, MAPREDUCE-805-v1.11-branch-0.20.patch, MAPREDUCE-805-v1.11.patch, MAPREDUCE-805-v1.12-branch-0.20.patch, MAPREDUCE-805-v1.12.patch, MAPREDUCE-805-v1.2.patch, MAPREDUCE-805-v1.3.patch, MAPREDUCE-805-v1.6.patch, MAPREDUCE-805-v1.7.patch We are running a hadoop cluster (version 0.20.0) and have detected the following deadlock on our jobtracker:
{code}
IPC Server handler 51 on 9001:
  at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
  - waiting to lock 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
  - locked 0x7f2b5f026000 (a org.apache.hadoop.mapred.JobTracker)
  at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

pool-1-thread-2:
  at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
  - waiting to lock 0x7f2b5f026000 (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
  - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
  - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
  - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
  - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
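The two stacks above form a classic lock-ordering cycle: the IPC handler holds the JobTracker monitor while waiting for the JobInProgress monitor, and the initialization thread holds them in the opposite order. One standard remedy is a single global acquisition order; the sketch below (hypothetical names, not the committed fix) shows both code paths taking the tracker lock first so no cycle can form:

```java
// Sketch only: a consistent lock order prevents the deadlock pattern
// shown in the trace above. Both paths acquire trackerLock, then jobLock.
class LockOrderSketch {
    private final Object trackerLock = new Object(); // stands in for JobTracker
    private final Object jobLock = new Object();     // stands in for JobInProgress

    void getJobCounters() {
        synchronized (trackerLock) {
            synchronized (jobLock) { /* read counters */ }
        }
    }

    void finalizeJob() {
        synchronized (trackerLock) {
            synchronized (jobLock) { /* garbage-collect the job */ }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderSketch s = new LockOrderSketch();
        // Two threads hammering both paths concurrently cannot deadlock,
        // because neither ever holds jobLock while waiting for trackerLock.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) s.getJobCounters(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) s.finalizeJob(); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("no deadlock");
    }
}
```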
[jira] Commented: (MAPREDUCE-372) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
[ https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742482#action_12742482 ] Hudson commented on MAPREDUCE-372: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api. --- Key: MAPREDUCE-372 URL: https://issues.apache.org/jira/browse/MAPREDUCE-372 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.21.0 Attachments: patch-372-1.txt, patch-372.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-840) DBInputFormat leaves open transaction
[ https://issues.apache.org/jira/browse/MAPREDUCE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742489#action_12742489 ] Hudson commented on MAPREDUCE-840: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) . DBInputFormat leaves open transaction. Contributed by Aaron Kimball. DBInputFormat leaves open transaction - Key: MAPREDUCE-840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-840 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Aaron Kimball Assignee: Aaron Kimball Priority: Minor Fix For: 0.21.0 Attachments: MAPREDUCE-840.patch DBInputFormat.getSplits() does not call connection.commit() after the COUNT query. This can leave an open transaction against the database which interferes with other connections to the same table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-838) Task succeeds even when committer.commitTask fails with IOException
[ https://issues.apache.org/jira/browse/MAPREDUCE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742487#action_12742487 ] Hudson commented on MAPREDUCE-838: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) Task succeeds even when committer.commitTask fails with IOException --- Key: MAPREDUCE-838 URL: https://issues.apache.org/jira/browse/MAPREDUCE-838 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.1 Reporter: Koji Noguchi Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.20.1 Attachments: patch-838-0.20.txt, patch-838-1-0.20.txt, patch-838-1.txt, patch-838.txt In MAPREDUCE-837, the job succeeded with empty output even though all the tasks were throwing IOException at committer.commitTask:
{noformat}
2009-08-07 17:51:47,458 INFO org.apache.hadoop.mapred.TaskRunner: Task attempt_200907301448_8771_r_00_0 is allowed to commit now
2009-08-07 17:51:47,466 WARN org.apache.hadoop.mapred.TaskRunner: Failure committing: java.io.IOException: Can not get the relative path: \
  base = hdfs://mynamenode:8020/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0 \
  child = hdfs://mynamenode/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0/_index
        at org.apache.hadoop.mapred.FileOutputCommitter.getFinalPath(FileOutputCommitter.java:150)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:106)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:126)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:86)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:171)
        at org.apache.hadoop.mapred.Task.commit(Task.java:768)
        at org.apache.hadoop.mapred.Task.done(Task.java:692)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-08-07 17:51:47,468 WARN org.apache.hadoop.mapred.TaskRunner: Failure asking whether task can commit: java.io.IOException: \
  Can not get the relative path: base = hdfs://mynamenode:8020/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0 \
  child = hdfs://mynamenode/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0/_index
        at org.apache.hadoop.mapred.FileOutputCommitter.getFinalPath(FileOutputCommitter.java:150)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:106)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:126)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:86)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:171)
        at org.apache.hadoop.mapred.Task.commit(Task.java:768)
        at org.apache.hadoop.mapred.Task.done(Task.java:692)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-08-07 17:51:47,469 INFO org.apache.hadoop.mapred.TaskRunner: Task attempt_200907301448_8771_r_00_0 is allowed to commit now
2009-08-07 17:51:47,472 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_200907301448_8771_r_00_0' done.
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
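The MAPREDUCE-838 failure mode above can be sketched minimally. This is not Hadoop's actual Task code: `Committer` here is a hypothetical stand-in interface, and `runAttempt` only illustrates the invariant the fix should guarantee, namely that an IOException from commitTask marks the attempt FAILED rather than letting done() report success.

```java
import java.io.IOException;

public class CommitFailureSketch {
    // Hypothetical stand-in for the committer the task calls at the end.
    interface Committer {
        void commitTask() throws IOException;
    }

    // The invariant the fix should enforce: a failed commit means a
    // failed attempt, never a SUCCEEDED task with empty output.
    static String runAttempt(Committer committer) {
        try {
            committer.commitTask();
            return "SUCCEEDED";
        } catch (IOException e) {
            return "FAILED";
        }
    }

    public static void main(String[] args) {
        // Mimics the exception seen in the log above.
        Committer broken = () -> {
            throw new IOException("Can not get the relative path");
        };
        System.out.println(runAttempt(broken)); // prints FAILED
    }
}
```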
[jira] Commented: (MAPREDUCE-848) TestCapacityScheduler is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742491#action_12742491 ] Hudson commented on MAPREDUCE-848: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) . Fixes a problem to do with TestCapacityScheduler failing. Contributed by Amar Kamat. TestCapacityScheduler is failing Key: MAPREDUCE-848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-848 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Affects Versions: 0.21.0 Reporter: Devaraj Das Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPREDUCE-848-v1.0.patch Looks like the commit of HADOOP-805 broke the CapacityScheduler testcase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-789) Oracle support for Sqoop
[ https://issues.apache.org/jira/browse/MAPREDUCE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742493#action_12742493 ] Hudson commented on MAPREDUCE-789: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) . Oracle support for Sqoop. Contributed by Aaron Kimball. Oracle support for Sqoop Key: MAPREDUCE-789 URL: https://issues.apache.org/jira/browse/MAPREDUCE-789 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-789.patch A separate ConnManager is needed for Oracle to support its slightly different syntax and configuration -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-779) Add node health failures into JobTrackerStatistics
[ https://issues.apache.org/jira/browse/MAPREDUCE-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742492#action_12742492 ] Hudson commented on MAPREDUCE-779: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) Add node health failures into JobTrackerStatistics -- Key: MAPREDUCE-779 URL: https://issues.apache.org/jira/browse/MAPREDUCE-779 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Sreekanth Ramakrishnan Assignee: Sreekanth Ramakrishnan Fix For: 0.21.0 Attachments: mapreduce-779-1.patch, mapreduce-779-2.patch, mapreduce-779-3.patch, mapreduce-779-4.patch Add the node health failure counts into {{JobTrackerStatistics}}. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-808) Buffer objects incorrectly serialized to typed bytes
[ https://issues.apache.org/jira/browse/MAPREDUCE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742488#action_12742488 ] Hudson commented on MAPREDUCE-808: -- Integrated in Hadoop-Mapreduce-trunk #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/]) Buffer objects incorrectly serialized to typed bytes Key: MAPREDUCE-808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-808 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Klaas Bosteels Assignee: Klaas Bosteels Fix For: 0.21.0 Attachments: MAPREDUCE-808.patch {{TypedBytesOutput.write()}} should do something like
{code}
Buffer buf = (Buffer) obj;
writeBytes(buf.get(), 0, buf.getCount());
{code}
instead of
{code}
writeBytes(((Buffer) obj).get());
{code}
since the bytes returned by {{Buffer.get()}} are only valid between 0 and getCount() - 1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
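The bug above can be demonstrated in isolation. `Buffer` below is a hypothetical minimal stand-in for Hadoop's record Buffer class: a backing array whose capacity may exceed the count of valid bytes, so serializing the whole array leaks stale trailing bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BufferSliceDemo {
    // Minimal stand-in: backing array may be larger than the valid prefix.
    static class Buffer {
        private final byte[] bytes;
        private final int count;
        Buffer(byte[] bytes, int count) { this.bytes = bytes; this.count = count; }
        byte[] get() { return bytes; }
        int getCount() { return count; }
    }

    // Writes only the valid prefix [0, getCount()), mirroring the fix:
    // write(buf.get(), 0, buf.getCount()) rather than write(buf.get()).
    static byte[] writeValidBytes(Buffer buf) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.write(buf.get(), 0, buf.getCount());
        out.flush();
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Backing array of capacity 8, but only 3 valid bytes.
        Buffer buf = new Buffer(new byte[] {1, 2, 3, 0, 0, 0, 0, 0}, 3);
        System.out.println(writeValidBytes(buf).length); // prints 3, not 8
    }
}
```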
[jira] Created: (MAPREDUCE-852) ExampleDriver is incorrectly set as a Main-Class in tools in build.xml
ExampleDriver is incorrectly set as a Main-Class in tools in build.xml -- Key: MAPREDUCE-852 URL: https://issues.apache.org/jira/browse/MAPREDUCE-852 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Tsz Wo (Nicholas), SZE In build.xml,
{code}
<target name="examples" depends="jar, compile-examples" description="Make the Hadoop examples jar.">
...
<target name="tools-jar" depends="jar, compile-tools" description="Make the Hadoop tools jar.">
  <jar jarfile="${build.dir}/${tools.final.name}.jar" basedir="${build.tools}">
    <manifest>
      <attribute name="Main-Class" value="org/apache/hadoop/examples/ExampleDriver"/>
    </manifest>
  </jar>
</target>
{code}
- ExampleDriver should not be a Main-Class of tools
- Should we rename the target name from tools-jar to tools, so that the name would be consistent with the examples target?
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-852) ExampleDriver is incorrectly set as a Main-Class in tools in build.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-852: - Attachment: m852_20090812.patch m852_20090812.patch: renamed tools-jar to tools and removed ExampleDriver from tools. ExampleDriver is incorrectly set as a Main-Class in tools in build.xml -- Key: MAPREDUCE-852 URL: https://issues.apache.org/jira/browse/MAPREDUCE-852 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Tsz Wo (Nicholas), SZE Attachments: m852_20090812.patch In build.xml,
{code}
<target name="examples" depends="jar, compile-examples" description="Make the Hadoop examples jar.">
...
<target name="tools-jar" depends="jar, compile-tools" description="Make the Hadoop tools jar.">
  <jar jarfile="${build.dir}/${tools.final.name}.jar" basedir="${build.tools}">
    <manifest>
      <attribute name="Main-Class" value="org/apache/hadoop/examples/ExampleDriver"/>
    </manifest>
  </jar>
</target>
{code}
- ExampleDriver should not be a Main-Class of tools
- Should we rename the target name from tools-jar to tools, so that the name would be consistent with the examples target?
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
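A sketch of what the corrected target might look like after the m852_20090812.patch changes described above (target renamed, manifest entry dropped); the exact attribute layout is inferred from the quoted snippet, not from the patch itself:

```xml
<!-- "tools-jar" renamed to "tools" for consistency with "examples",
     and the incorrect ExampleDriver Main-Class manifest entry removed -->
<target name="tools" depends="jar, compile-tools"
        description="Make the Hadoop tools jar.">
  <jar jarfile="${build.dir}/${tools.final.name}.jar"
       basedir="${build.tools}"/>
</target>
```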
[jira] Commented: (MAPREDUCE-825) JobClient completion poll interval of 5s causes slow tests in local mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742639#action_12742639 ] Todd Lipcon commented on MAPREDUCE-825: --- Patch looks good to me. +1 JobClient completion poll interval of 5s causes slow tests in local mode Key: MAPREDUCE-825 URL: https://issues.apache.org/jira/browse/MAPREDUCE-825 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Aaron Kimball Assignee: Aaron Kimball Priority: Minor Attachments: completion-poll-interval.patch, MAPREDUCE-825.2.patch The JobClient.NetworkedJob.waitForCompletion() method polls for job completion every 5 seconds. When running a set of short tests in pseudo-distributed mode, this is unnecessarily slow and causes lots of wasted time. When bandwidth is not scarce, setting the poll interval to 100 ms results in a 4x speedup in some tests. This interval should be parametrized to allow users to control the interval for testing purposes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
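The parametrization proposed in MAPREDUCE-825 can be sketched as a polling loop with a configurable interval. This is a hypothetical stand-in for the loop inside JobClient.NetworkedJob.waitForCompletion(); the config key name is assumed, not taken from the patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class CompletionPollSketch {
    // Assumed key name; the patch makes the interval configurable somehow.
    static final String POLL_INTERVAL_KEY = "jobclient.completion.poll.interval";

    // Polls isComplete until true, sleeping pollIntervalMs between checks
    // instead of a hard-coded 5000 ms; returns the number of polls made.
    static int pollForCompletion(BooleanSupplier isComplete, long pollIntervalMs)
            throws InterruptedException {
        int polls = 0;
        while (true) {
            polls++;
            if (isComplete.getAsBoolean()) {
                return polls;
            }
            TimeUnit.MILLISECONDS.sleep(pollIntervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A job that reports complete on the third check: at 10 ms the wait
        // costs ~20 ms, where a 5000 ms interval would cost ~10 s.
        final int[] checks = {0};
        int polls = pollForCompletion(() -> ++checks[0] >= 3, 10);
        System.out.println(polls); // prints 3
    }
}
```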
[jira] Commented: (MAPREDUCE-478) separate jvm param for mapper and reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742681#action_12742681 ] Sreekanth Ramakrishnan commented on MAPREDUCE-478: -- Also, on second thought, in my opinion [HADOOP-6105|http://issues.apache.org/jira/browse/HADOOP-6105] actually helps the issue mentioned here: it provides an automatic facility to split an old key into two new keys. separate jvm param for mapper and reducer - Key: MAPREDUCE-478 URL: https://issues.apache.org/jira/browse/MAPREDUCE-478 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Koji Noguchi Assignee: Arun C Murthy Priority: Minor Fix For: 0.21.0 Attachments: HADOOP-5684_0_20090420.patch, MAPREDUCE-478_0_20090804.patch, MAPREDUCE-478_0_20090804_yhadoop20.patch, MAPREDUCE-478_1_20090806.patch, MAPREDUCE-478_1_20090806_yhadoop20.patch Memory footprint of mapper and reducer can differ. It would be nice if we can pass different jvm param (mapred.child.java.opts) for mappers and reducers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
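A sketch of how the split described in MAPREDUCE-478 could look in a job configuration. The key names below follow the "split the old key into two new keys" scheme mentioned in the comment and are assumptions, not confirmed names from the committed patch:

```xml
<!-- assumed per-task-type variants of mapred.child.java.opts -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

A job whose reducers aggregate large in-memory buffers could then get a bigger heap without also inflating every map JVM.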
[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742712#action_12742712 ] Owen O'Malley commented on MAPREDUCE-157: - I'm confused about what the goal of using Avro here would be. Let's review the goals:
1. Get an easily parseable text format.
2. Not require excessive amounts of time for logging.
2a. Not require excessive object allocations.
It seems like to use Avro, we'd need to create the Avro objects and then write them out. I'd rather just use a JsonWriter to write the events out to the stream. Of course reading is the reverse. It would be like writing XML files by generating the necessary DOM objects. You can do it (and in fact Configuration is written that way. *sigh*), but it costs a lot of time. Not having seen the Avro text format, I can't evaluate how much overhead it adds. None of the features of Avro seem compelling in this case, and it could easily lead to unfortunate choices. Furthermore, I don't know if there are any guarantees about the Avro text format's stability. We need stability in this format. Job History log file format is not friendly for external tools. --- Key: MAPREDUCE-157 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Assignee: Jothi Padmanabhan Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742714#action_12742714 ] Philip Zeyliger commented on MAPREDUCE-157: --- Avro would force you into a schema, and I think having a schema is the only way to get stability in the format. Yes, there's probably overhead, but if we're using Avro for other things (i.e., all RPCs), we may as well fix those overheads when we get to them. (It may also be a net win to store the data in binary Avro format, and write an avrocat to deserialize into text before pushing to tools like awk, but I do understand the desire for a text format.) All that said, you have specific needs in mind here, and I'm mostly waxing poetical, so I'll certainly defer. -- Philip Job History log file format is not friendly for external tools. --- Key: MAPREDUCE-157 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Assignee: Jothi Padmanabhan Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
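The text-format property the MAPREDUCE-157 discussion above turns on, namely one event per line with newlines escaped inside strings so grep/sed/awk work, can be sketched directly. This is a hand-rolled illustration, not Hadoop's history writer or any particular JSON library; the field names are made up.

```java
public class JsonEventDemo {
    // Escapes the characters that would break a one-event-per-line format.
    static String escape(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\n': b.append("\\n"); break;
                case '\r': b.append("\\r"); break;
                case '"':  b.append("\\\""); break;
                case '\\': b.append("\\\\"); break;
                default:   b.append(c);
            }
        }
        return b.toString();
    }

    // Emits one history event as a single JSON line (hypothetical fields).
    static String eventLine(String type, String taskId, String diagnostics) {
        return "{\"type\":\"" + escape(type) + "\",\"taskid\":\"" + escape(taskId)
             + "\",\"error\":\"" + escape(diagnostics) + "\"}";
    }

    public static void main(String[] args) {
        // A diagnostic string with an embedded newline (a stack trace,
        // say) still lands on a single line of the log.
        System.out.println(eventLine("TASK_FAILED", "task_0001_m_000000",
                "java.io.IOException\n\tat Foo.bar(Foo.java:1)"));
    }
}
```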
[jira] Created: (MAPREDUCE-854) JobInProgress.initTasks() should not throw KillInterruptedException
JobInProgress.initTasks() should not throw KillInterruptedException Key: MAPREDUCE-854 URL: https://issues.apache.org/jira/browse/MAPREDUCE-854 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Amar Kamat Assignee: Amar Kamat JobInProgress.initTasks() throws KillInterruptedException if it is killed during init. This is a bad programming practice. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-855) Testcases faking TaskTrackerManager might result into NPE
Testcases faking TaskTrackerManager might result into NPE -- Key: MAPREDUCE-855 URL: https://issues.apache.org/jira/browse/MAPREDUCE-855 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amar Kamat Assignee: Amar Kamat JobInProgress uses JobTracker.getClock(), assuming that the JobTracker is initialized before the JobInProgress is created. This might not be true, as a testcase might fake TaskTrackerManager. In such cases, JobInProgress might run into an NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
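The MAPREDUCE-855 failure mode can be sketched with hypothetical minimal stand-ins for JobTracker and Clock: a testcase that fakes TaskTrackerManager never constructs a JobTracker, so the getClock() lookup dereferences null. A defensive fallback clock is one shape a fix could take; it is not the committed patch.

```java
public class ClockGuardSketch {
    // Minimal stand-in for Hadoop's Clock abstraction.
    static class Clock {
        long getTime() { return System.currentTimeMillis(); }
    }

    // Minimal stand-in for the part of JobTracker that JobInProgress uses.
    static class JobTracker {
        private final Clock clock = new Clock();
        Clock getClock() { return clock; }
    }

    // Falls back to a default clock when no JobTracker exists, which is
    // exactly the situation a faked TaskTrackerManager creates in tests.
    static Clock clockFor(JobTracker tracker) {
        return tracker == null ? new Clock() : tracker.getClock();
    }

    public static void main(String[] args) {
        // Without the guard, the null case would throw NullPointerException.
        System.out.println(clockFor(null).getTime() > 0);             // prints true
        System.out.println(clockFor(new JobTracker()).getTime() > 0); // prints true
    }
}
```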
[jira] Created: (MAPREDUCE-853) Support a hierarchy of queues in the Map/Reduce framework
Support a hierarchy of queues in the Map/Reduce framework - Key: MAPREDUCE-853 URL: https://issues.apache.org/jira/browse/MAPREDUCE-853 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Reporter: Hemanth Yamijala Fix For: 0.21.0 In MAPREDUCE-824, we proposed introducing a hierarchy of queues in the capacity scheduler. Currently, the M/R framework provides the notion of job queues and handles some functionality related to queues in a scheduler-agnostic manner. This functionality includes: - Managing the list of ACLs for queues - Managing the run state of queues - running or stopped - Displaying scheduling information about queues in the jobtracker web UI and job client CLI - Displaying list of jobs in a queue in the jobtracker web UI and job client CLI - Providing APIs for list queues and queue information in JobClient. Since it would be beneficial to extend this functionality to hierarchical queues, this JIRA is proposing introducing the concept into the map/reduce framework as well. We could treat this as an umbrella JIRA and file additional tasks for each of the changes involved, sticking to the high level approach in this JIRA. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-828) Provide a mechanism to pause the jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742726#action_12742726 ] eric baldeschwieler commented on MAPREDUCE-828: --- Jobs submitted to the JT will be queued up. However, if the job client fails to write the job files to the DFS (the step before job submission), those jobs will be lost. Presumably the client can detect this and fail? What does the client do if it tries to submit a job to a paused JT? Why would the JT not process heartbeats normally (without scheduling new tasks)? The new state seems hackish. Provide a mechanism to pause the jobtracker --- Key: MAPREDUCE-828 URL: https://issues.apache.org/jira/browse/MAPREDUCE-828 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Reporter: Hemanth Yamijala We've seen scenarios where we have needed to stop the namenode for a maintenance activity. In such scenarios, if the jobtracker (JT) continues to run, jobs would fail due to initialization or task failures (due to DFS). During such scenarios, we could restart the JT with job recovery enabled, but restart has proved to be a very intrusive activity, particularly if the JT is not at fault itself and does not require a restart. The ask is for an admin-controlled feature to pause the JT, which would take it to a state somewhat analogous to the safe mode of DFS. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.