[jira] Updated: (MAPREDUCE-943) TestNodeRefresh timesout occasionally
[ https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-943: -- Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-873 TestNodeRefresh timesout occasionally - Key: MAPREDUCE-943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker Reporter: Amareshwari Sriramadasu Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPRED-943-v1.0.patch TestNodeRefresh timesout occasionally. One of the hudson patch build with timeout @http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-943) TestNodeRefresh timesout occasionally
[ https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das resolved MAPREDUCE-943. --- Resolution: Fixed I just committed this. Thanks, Amar! TestNodeRefresh timesout occasionally - Key: MAPREDUCE-943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker Reporter: Amareshwari Sriramadasu Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPRED-943-v1.0.patch TestNodeRefresh timesout occasionally. One of the hudson patch build with timeout @http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-957) Set mapred.job.name for a pipes job
Set mapred.job.name for a pipes job --- Key: MAPREDUCE-957 URL: https://issues.apache.org/jira/browse/MAPREDUCE-957 Project: Hadoop Map/Reduce Issue Type: Wish Components: pipes Affects Versions: 0.20.1 Reporter: Ramya R Priority: Minor Currently mapred.job.name is not set for a pipes job. It will be useful if this value is set when a pipes job is submitted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-943) TestNodeRefresh timesout occasionally
[ https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752019#action_12752019 ] Devaraj Das commented on MAPREDUCE-943: --- Should have added that I also agree that the testcase which times out is no longer needed. TestNodeRefresh timesout occasionally - Key: MAPREDUCE-943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker Reporter: Amareshwari Sriramadasu Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPRED-943-v1.0.patch TestNodeRefresh timesout occasionally. One of the hudson patch build with timeout @http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rahul k singh updated MAPREDUCE-861: Attachment: MAPREDUCE-861-4.patch Incorporated all the comments except 1. In DeprecatedHierarchyBuilder we are still not checking if ACLs are disabled before parsing them. Note though that this is being done for the QueueHierarchyBuilder. Many testcases, especially in TestQueueManager, are written with the assumption that a Map<String, AccessControlList> is always created for the Queue object, especially when mapred.acls.enabled is set to true using conf.set. There are many NullPointerExceptions if we don't generate this empty object. Hence I am not accommodating this comment, as it would be a significant change to the testcases, and for deprecated code at that; having this empty Map<String, AccessControlList> doesn't affect the overall behaviour at all. Modify queue configuration format and parsing to support a hierarchy of queues. --- Key: MAPREDUCE-861 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Hemanth Yamijala Assignee: rahul k singh Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the configuration related to queues. The current format for defining a queue and its properties is as follows: mapred.queue.queue-name.property-name. For e.g. mapred.queue.queue-name.acl-submit-job. The reason for using this verbose format was to be able to reuse the Configuration parser in Hadoop. However, administrators currently using the queue configuration have already indicated a very strong desire for a more manageable format. Since, this becomes more unwieldy with hierarchical queues, the time may be good to introduce a new format for representing queue configuration.
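To make the flat format described above concrete, here is a minimal sketch in Python (not Hadoop's Java, and not taken from any of the attached patches) that groups keys of the form mapred.queue.queue-name.property-name by queue. The function name, the sample queue names and the acl-administer-jobs entry are illustrative assumptions; only the key pattern and acl-submit-job come from the issue description.

```python
from collections import defaultdict

PREFIX = "mapred.queue."


def group_queue_properties(conf):
    """Group flat 'mapred.queue.<queue-name>.<property-name>' keys by queue."""
    queues = defaultdict(dict)
    for key, value in conf.items():
        if not key.startswith(PREFIX):
            continue  # not a queue property
        # Split from the right so the last component is the property name.
        queue_name, _, prop = key[len(PREFIX):].rpartition(".")
        if queue_name and prop:
            queues[queue_name][prop] = value
    return dict(queues)


# Hypothetical configuration used only for illustration.
conf = {
    "mapred.queue.default.acl-submit-job": "*",
    "mapred.queue.research.acl-submit-job": "alice,bob",
    "mapred.queue.research.acl-administer-jobs": "carol",
    "mapred.acls.enabled": "true",
}
print(group_queue_properties(conf))
```

Every property repeats the full queue name, and nothing in a key expresses a parent/child relation between queues, which is presumably why the format is called unwieldy for hierarchies.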
[jira] Commented: (MAPREDUCE-943) TestNodeRefresh timesout occasionally
[ https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752020#action_12752020 ] Hudson commented on MAPREDUCE-943: -- Integrated in Hadoop-Mapreduce-trunk-Commit #18 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/18/]) . Removes a testcase in TestNodeRefresh that doesn't make sense in the new Job recovery model. Contributed by Amar Kamat. TestNodeRefresh timesout occasionally - Key: MAPREDUCE-943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker Reporter: Amareshwari Sriramadasu Assignee: Amar Kamat Fix For: 0.21.0 Attachments: MAPRED-943-v1.0.patch TestNodeRefresh timesout occasionally. One of the hudson patch build with timeout @http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752021#action_12752021 ] rahul k singh commented on MAPREDUCE-861: - There was an error in the above patch; attaching a new one. Modify queue configuration format and parsing to support a hierarchy of queues. --- Key: MAPREDUCE-861 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Hemanth Yamijala Assignee: rahul k singh Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the configuration related to queues. The current format for defining a queue and its properties is as follows: mapred.queue.queue-name.property-name. For e.g. mapred.queue.queue-name.acl-submit-job. The reason for using this verbose format was to be able to reuse the Configuration parser in Hadoop. However, administrators currently using the queue configuration have already indicated a very strong desire for a more manageable format. Since, this becomes more unwieldy with hierarchical queues, the time may be good to introduce a new format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rahul k singh updated MAPREDUCE-861: Attachment: MAPREDUCE-861-5.patch Modify queue configuration format and parsing to support a hierarchy of queues. --- Key: MAPREDUCE-861 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Hemanth Yamijala Assignee: rahul k singh Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the configuration related to queues. The current format for defining a queue and its properties is as follows: mapred.queue.queue-name.property-name. For e.g. mapred.queue.queue-name.acl-submit-job. The reason for using this verbose format was to be able to reuse the Configuration parser in Hadoop. However, administrators currently using the queue configuration have already indicated a very strong desire for a more manageable format. Since, this becomes more unwieldy with hierarchical queues, the time may be good to introduce a new format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rahul k singh updated MAPREDUCE-861: Status: Patch Available (was: Open) Modify queue configuration format and parsing to support a hierarchy of queues. --- Key: MAPREDUCE-861 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Hemanth Yamijala Assignee: rahul k singh Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the configuration related to queues. The current format for defining a queue and its properties is as follows: mapred.queue.queue-name.property-name. For e.g. mapred.queue.queue-name.acl-submit-job. The reason for using this verbose format was to be able to reuse the Configuration parser in Hadoop. However, administrators currently using the queue configuration have already indicated a very strong desire for a more manageable format. Since, this becomes more unwieldy with hierarchical queues, the time may be good to introduce a new format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control
[ https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752023#action_12752023 ] Hemanth Yamijala commented on MAPREDUCE-856: I verified the changes. My only comment is that, in the changes related to synchronization of user localization, we are repeating the per-user work every time job localization happens. A suggestion is to keep the synchronization on the user name, but make the mapped value a state variable that indicates the status of localization, and check it before beginning to localize. Localized files from DistributedCache should have right access-control -- Key: MAPREDUCE-856 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Reporter: Arun C Murthy Assignee: Vinod K V Fix For: 0.21.0 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, MAPREDUCE-856-20090904.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
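The suggestion in this comment (synchronize per user name, but remember whether that user has already been localized) might look roughly like the sketch below. It is written in Python for brevity rather than in the TaskTracker's Java, and the class name, the callback and the state layout are all hypothetical, not taken from the patch.

```python
import threading


class UserLocalizer:
    """Runs the per-user localization work at most once per user."""

    def __init__(self, localize_user):
        self._localize_user = localize_user  # hypothetical callback doing the per-user work
        self._guard = threading.Lock()       # protects the map itself
        self._states = {}                    # user name -> {"lock": Lock, "done": bool}

    def ensure_localized(self, user):
        with self._guard:
            state = self._states.setdefault(
                user, {"lock": threading.Lock(), "done": False})
        with state["lock"]:                  # serialize per user, not globally
            if not state["done"]:            # check the status before localizing
                self._localize_user(user)
                state["done"] = True


calls = []
localizer = UserLocalizer(calls.append)
for user in ["alice", "alice", "bob", "alice"]:
    localizer.ensure_localized(user)         # e.g. once per job localization
print(calls)
```

Jobs from different users do not block each other, and a second job from the same user only pays for the status check.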
[jira] Resolved: (MAPREDUCE-860) Modify Queue APIs to support a hierarchy of queues
[ https://issues.apache.org/jira/browse/MAPREDUCE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rahul k singh resolved MAPREDUCE-860. - Resolution: Duplicate Modify Queue APIs to support a hierarchy of queues -- Key: MAPREDUCE-860 URL: https://issues.apache.org/jira/browse/MAPREDUCE-860 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker Reporter: Hemanth Yamijala Assignee: rahul k singh MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the APIs related to queues. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-860) Modify Queue APIs to support a hierarchy of queues
[ https://issues.apache.org/jira/browse/MAPREDUCE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752025#action_12752025 ] rahul k singh commented on MAPREDUCE-860: - This issue is being resolved as part of MAPREDUCE-861. Hence closing this as duplicate. Modify Queue APIs to support a hierarchy of queues -- Key: MAPREDUCE-860 URL: https://issues.apache.org/jira/browse/MAPREDUCE-860 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker Reporter: Hemanth Yamijala Assignee: rahul k singh MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the APIs related to queues. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jothi Padmanabhan updated MAPREDUCE-157: Status: Patch Available (was: Open) Job History log file format is not friendly for external tools. --- Key: MAPREDUCE-157 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 0.20.1 Reporter: Owen O'Malley Assignee: Jothi Padmanabhan Fix For: 0.21.0 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
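To illustrate why unescaped newlines hurt line-oriented tools, here is a small Python sketch of a reversible escaping scheme that keeps every value on one physical line. This is only an illustration of the problem statement; it is an assumption of this note and not the format used by the attached patches.

```python
def escape_value(value):
    """Keep a value on a single line so grep/sed/awk see one record per line."""
    # Escape the backslash first so the mapping stays reversible.
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_value(escaped):
    """Inverse of escape_value."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(ch)
    return "".join(out)


original = "first line\nsecond line with a literal \\n in it"
escaped = escape_value(original)
print(escaped)
```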
[jira] Updated: (MAPREDUCE-924) TestPipes must not directly invoke 'main' of pipes as an exit from main could cause the testcase to crash.
[ https://issues.apache.org/jira/browse/MAPREDUCE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-924: --- Description: TestPipes invokes the main method of the program running pipes. In MAPREDUCE-421, a change was made to the Pipes command runner to invoke System.exit after completion. This itself is a valid change because the pipes command runner is in itself a user facing program. But when combined with a testcase, it causes the testcase to crash rather than providing feedback on whether the test ran correctly or not. The testcase should be modified to use Tool instead of running main directly. was: TestPipes crashes on trunk due to MAPREDUCE-421. Testcase should be modified to use Tool insteadof running main directly. Summary: TestPipes must not directly invoke 'main' of pipes as an exit from main could cause the testcase to crash. (was: TestPipes crashes on trunk) TestPipes must not directly invoke 'main' of pipes as an exit from main could cause the testcase to crash. -- Key: MAPREDUCE-924 URL: https://issues.apache.org/jira/browse/MAPREDUCE-924 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.20.1 Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.20.1 Attachments: patch-924-0.20.txt, patch-924.txt TestPipes invokes the main method of the program running pipes. In MAPREDUCE-421, a change was made to the Pipes command runner to invoke System.exit after completion. This itself is a valid change because the pipes command runner is in itself a user facing program. But when combined with a testcase, it causes the testcase to crash rather than providing feedback on whether the test ran correctly or not. The testcase should be modified to use Tool instead of running main directly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
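The pattern behind this fix can be sketched independently of Hadoop: keep the work in a function that returns an exit code, and let only the user-facing entry point terminate the process. The Python below is an analogue of that split, not Hadoop's Tool interface itself; the function names and the usage message are made up for illustration.

```python
import sys


def run(args):
    """Do the work and return an exit code instead of exiting."""
    if not args:
        print("usage: prog <input>", file=sys.stderr)
        return 2
    # ... the real job submission would happen here ...
    return 0


def main():
    # Only the user-facing entry point terminates the process.
    sys.exit(run(sys.argv[1:]))
```

A test can then call run(...) and assert on the returned code, whereas calling main() would end the test process before any result is reported.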
[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752048#action_12752048 ] Ravi Gummadi commented on MAPREDUCE-956: We could call the phases the Shuffle phase and the Reduce phase. But we need to investigate how to update progress in the shuffle phase, because updating it based only on the copying of map outputs would not be correct: some merges can still take time after all map outputs have been copied to this reduce node (even though some merges happen while map outputs are still being copied). Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce) -- Key: MAPREDUCE-956 URL: https://issues.apache.org/jira/browse/MAPREDUCE-956 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.21.0 Reporter: Jothi Padmanabhan For the progress calculations and displaying on the UI, shuffle, in its current form, is decomposed into three phases (copy/sort/reduce). Actually, the sort phase is no longer applicable. I think we should just reduce the number of phases to two and assign 50% weightage to each of copy and reduce phases. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
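One way to picture the two proposals is the arithmetic below: the task splits its progress 50/50 between shuffle and reduce, and the shuffle phase reserves part of its own progress for merges that finish after copying. This is a Python sketch only; the function names and the copy_weight value of 0.8 are assumptions made for illustration and do not come from the issue.

```python
def shuffle_progress(copied, total_maps, merge_fraction, copy_weight=0.8):
    """Shuffle progress that is not complete until trailing merges finish.

    copy_weight is a hypothetical split between copying and merging.
    """
    copy_fraction = copied / total_maps if total_maps else 1.0
    return copy_weight * copy_fraction + (1 - copy_weight) * merge_fraction


def task_progress(shuffle_fraction, reduce_fraction):
    """Two phases, each given 50% of the overall progress."""
    return 0.5 * shuffle_fraction + 0.5 * reduce_fraction


# All map outputs copied, but the final merges have not started yet.
print(task_progress(shuffle_progress(10, 10, 0.0), 0.0))
```

With this split, a reduce task that has copied everything but is still merging reports less than 50%, which addresses the concern raised in the comment.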
[jira] Updated: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control
[ https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-856: Status: Patch Available (was: Open) Localized files from DistributedCache should have right access-control -- Key: MAPREDUCE-856 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Reporter: Arun C Murthy Assignee: Vinod K V Fix For: 0.21.0 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, MAPREDUCE-856-20090904.txt, MAPREDUCE-856-20090907.1.txt, MAPREDUCE-856-20090907.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control
[ https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-856: Attachment: MAPREDUCE-856-20090907.1.txt Updated patch fixing the test failures reported by Hudson. Localized files from DistributedCache should have right access-control -- Key: MAPREDUCE-856 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: tasktracker Reporter: Arun C Murthy Assignee: Vinod K V Fix For: 0.21.0 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, MAPREDUCE-856-20090904.txt, MAPREDUCE-856-20090907.1.txt, MAPREDUCE-856-20090907.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-841) Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf objects
[ https://issues.apache.org/jira/browse/MAPREDUCE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752074#action_12752074 ] Devaraj Das commented on MAPREDUCE-841: --- BTW for the splits part, MAPREDUCE-181 (http://tinyurl.com/legzp9) is introducing some changes. Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf objects - Key: MAPREDUCE-841 URL: https://issues.apache.org/jira/browse/MAPREDUCE-841 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.20.1 Reporter: Hong Tang Fix For: 0.21.0 JobTracker only needs to examine a subset of information contained by InputSplit or JobConf objects. But currently JobTracker loads the complete user-defined InputSplit and JobConf objects in memory. This design would leave JobTracker susceptible to memory exhaustion particularly in cases when some bugs in user code which could result in very large input splits or job conf objects (e.g. PIG-901). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-876) Sqoop import of large tables can time out
[ https://issues.apache.org/jira/browse/MAPREDUCE-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-876: Resolution: Fixed Fix Version/s: 0.21.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 I've just committed this. Thanks Aaron! Sqoop import of large tables can time out - Key: MAPREDUCE-876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-876 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-876.2.patch, MAPREDUCE-876.patch Related to MAPREDUCE-875, Sqoop should use a background thread to ensure that progress is being reported while a database does external work for the MapReduce task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
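The idea in the issue description (a background thread that keeps reporting progress while the database works) can be sketched as follows. This is a generic Python illustration, not Sqoop's implementation; the class name and the report callback are hypothetical stand-ins for whatever the task uses to signal liveness.

```python
import threading
import time


class ProgressHeartbeat:
    """Calls report() periodically until the surrounding block finishes."""

    def __init__(self, report, interval_s=1.0):
        self._report = report              # hypothetical progress callback
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        # wait() returns True once stop is set, which ends the loop promptly.
        while not self._stop.wait(self._interval_s):
            self._report()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False                       # do not swallow exceptions


ticks = []
with ProgressHeartbeat(lambda: ticks.append(time.time()), interval_s=0.01):
    time.sleep(0.1)                        # stands in for a long database operation
print(len(ticks), "progress reports sent")
```

Using an Event for both the sleep and the stop signal means the thread exits as soon as the work completes instead of finishing a full sleep interval.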
[jira] Updated: (MAPREDUCE-918) Test hsqldb server should be memory-only.
[ https://issues.apache.org/jira/browse/MAPREDUCE-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-918: Resolution: Fixed Fix Version/s: 0.21.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 I've just committed this. Thanks Aaron! Test hsqldb server should be memory-only. - Key: MAPREDUCE-918 URL: https://issues.apache.org/jira/browse/MAPREDUCE-918 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-918.patch Sqoop launches a standalone hsqldb server for unit tests, but it currently writes its database to disk and uses a connect string of {{//localhost}}. If multiple test instances are running concurrently, one test server may serve to the other instance of the unit tests, causing race conditions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-944) Extend FairShare scheduler to fair-share memory usage in the cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752105#action_12752105 ] Vinod K V commented on MAPREDUCE-944: - I see in the attached patch that only one concrete implementation of LoadManager, CapBasedLoadManager, is provided, and it doesn't take any resource usage into account. I guess you are planning a proper implementation of fair-sharing memory usage in another JIRA. Some points are still not dealt with in this JIRA. I bring them up to find out whether you have already thought about them. - Job configuration: how users specify resource usage. Some memory-related configuration properties were added to the framework while working on memory monitoring on TTs as well as memory-usage-based scheduling in CapacityTaskScheduler. You may want to reuse some/all of them. - Capturing the scheduling decisions involved when we are not able to find a task from a Schedulable because of lack of resources on a given TaskTracker. Regarding the latter, the current patch just returns null, which is similar to the decision CapacityTaskScheduler used to take in previous versions, i.e. block the TT till it can be given a task from the job at the head of the queue/pool. Some time back, we investigated how this approach works with FairScheduler and realized some important implications. For example, because the order of jobs might change significantly in consecutive iterations of FairScheduler, just returning null may not work at all. Eventually we may end up waiting for a long time if a significant number of jobs ask for a high amount of resources. Thoughts?
Extend FairShare scheduler to fair-share memory usage in the cluster Key: MAPREDUCE-944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-944 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: dhruba borthakur Attachments: LoadManager.txt The FairShare Scheduler has an extensible LoadManager API to regulate allocating new tasks on a particular TaskTracker. In similar lines, it would be nice if the FairShare Scheduler can have a pluggable policy to regulate new tasks from a particular job. This will allow one to skip scheduling tasks of a job that is eating a large percentage of memory in the cluster, i.e. fair-share of memory resources among jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-181) Secure job submission
[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752117#action_12752117 ] Devaraj Das commented on MAPREDUCE-181: --- For now, let's keep it simple - don't implement the points to do with maintaining/cleaning-up jobID-userName mappings. This should be looked at, in a bigger picture, once we have the authentication implemented. Also, rather than time-based expiry I think it would be better to have limits on number of queued jobs per user and the max queued jobs overall. Secure job submission -- Key: MAPREDUCE-181 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat Assignee: Amar Kamat Attachments: hadoop-3578-branch-20-example-2.patch, hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch Currently the jobclient accesses the {{mapred.system.dir}} to add job details. Hence the {{mapred.system.dir}} has the permissions of {{rwx-wx-wx}}. This could be a security loophole where the job files might get overwritten/tampered after the job submission. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
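The alternative to time-based expiry proposed in this comment, a cap on queued jobs per user plus a cap on queued jobs overall, amounts to a simple admission check. The Python sketch below shows one possible shape; the class and method names and the limit values are hypothetical and not part of any attached patch.

```python
from collections import Counter


class QueuedJobLimiter:
    """Admits a job only if both the per-user and the overall limit allow it."""

    def __init__(self, max_per_user, max_total):
        self.max_per_user = max_per_user
        self.max_total = max_total
        self._per_user = Counter()         # user name -> currently queued jobs

    def try_enqueue(self, user):
        if sum(self._per_user.values()) >= self.max_total:
            return False                   # overall limit reached
        if self._per_user[user] >= self.max_per_user:
            return False                   # this user's limit reached
        self._per_user[user] += 1
        return True

    def dequeue(self, user):
        """Call when a queued job starts running or is removed."""
        if self._per_user[user] > 0:
            self._per_user[user] -= 1


limiter = QueuedJobLimiter(max_per_user=2, max_total=3)
print([limiter.try_enqueue(u) for u in ["alice", "alice", "alice", "bob", "carol"]])
```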
[jira] Commented: (MAPREDUCE-898) Change DistributedCache to use new api.
[ https://issues.apache.org/jira/browse/MAPREDUCE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752122#action_12752122 ] Hudson commented on MAPREDUCE-898: -- Integrated in Hadoop-Mapreduce-trunk-Commit #19 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/19/]) . Changes DistributedCache to use the new API. Contributed by Amareshwari Sriramadasu. Change DistributedCache to use new api. --- Key: MAPREDUCE-898 URL: https://issues.apache.org/jira/browse/MAPREDUCE-898 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.21.0 Attachments: patch-898-1.txt, patch-898-2.txt, patch-898-3.txt, patch-898-4.txt, patch-898.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-918) Test hsqldb server should be memory-only.
[ https://issues.apache.org/jira/browse/MAPREDUCE-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752123#action_12752123 ] Hudson commented on MAPREDUCE-918: -- Integrated in Hadoop-Mapreduce-trunk-Commit #19 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/19/]) . Test hsqldb server should be memory-only. Contributed by Aaron Kimball. Test hsqldb server should be memory-only. - Key: MAPREDUCE-918 URL: https://issues.apache.org/jira/browse/MAPREDUCE-918 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.21.0 Attachments: MAPREDUCE-918.patch Sqoop launches a standalone hsqldb server for unit tests, but it currently writes its database to disk and uses a connect string of {{//localhost}}. If multiple test instances are running concurrently, one test server may serve to the other instance of the unit tests, causing race conditions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-764) TypedBytesInput's readRaw() does not preserve custom type codes
[ https://issues.apache.org/jira/browse/MAPREDUCE-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752182#action_12752182 ] Hudson commented on MAPREDUCE-764: -- Integrated in Hadoop-Mapreduce-trunk-Commit #20 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/20/]) . TypedBytesInput's readRaw() does not preserve custom type codes. Contributed by Klaas Bosteels. TypedBytesInput's readRaw() does not preserve custom type codes --- Key: MAPREDUCE-764 URL: https://issues.apache.org/jira/browse/MAPREDUCE-764 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Klaas Bosteels Assignee: Klaas Bosteels Priority: Blocker Fix For: 0.21.0 Attachments: MAPREDUCE-764.patch, MAPREDUCE-764.patch The typed bytes format supports byte sequences of the form {{<custom type code> <length> <bytes>}}. When reading such a sequence via {{TypedBytesInput}}'s {{readRaw()}} method, however, the returned sequence currently is {{0 <length> <bytes>}} (0 is the type code for a bytes array), which leads to bugs such as the one described [here|http://dumbo.assembla.com/spaces/dumbo/tickets/54]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
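The bug described here can be illustrated with a small reader that re-emits the type code it actually read instead of hard-coding 0. The Python sketch below assumes a one-byte type code followed by a four-byte big-endian length and the payload; that byte layout and the read_raw name are assumptions of this sketch, and it is not the Java code from the patch.

```python
import io
import struct

BYTES_TYPE_CODE = 0  # type code for a plain bytes array, per the description


def read_raw(stream):
    """Return one raw '<type code> <length> <bytes>' record, type code intact."""
    header = stream.read(5)
    if len(header) < 5:
        return None                        # end of input
    code, length = struct.unpack(">Bi", header)
    payload = stream.read(length)
    # Re-emit the code that was read rather than BYTES_TYPE_CODE.
    return struct.pack(">Bi", code, length) + payload


record = struct.pack(">Bi", 100, 3) + b"abc"   # 100 is an arbitrary custom code
raw = read_raw(io.BytesIO(record))
print(raw[0], raw[5:])
```

A reader that always wrote BYTES_TYPE_CODE in the header would return a record starting with 0 here, which is the behaviour the issue reports.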
[jira] Updated: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jothi Padmanabhan updated MAPREDUCE-157: Status: Open (was: Patch Available) Job History log file format is not friendly for external tools. --- Key: MAPREDUCE-157 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 0.20.1 Reporter: Owen O'Malley Assignee: Jothi Padmanabhan Fix For: 0.21.0 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
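To illustrate the newline problem described above, a minimal sketch of escaping values so that each history record stays on one physical line. This is not the format used by any of the attached patches; the class and method names are hypothetical, and the choice of backslash escapes is an assumption made for the example.

```java
/** Hypothetical illustration only; not the job history writer. */
final class HistoryEscapeSketch {
  /** Escapes backslash, newline and carriage return so the value fits on one line. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Reverses escape(). */
  static String unescape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 1 < s.length()) {
        char next = s.charAt(++i);
        switch (next) {
          case 'n': sb.append('\n'); break;
          case 'r': sb.append('\r'); break;
          default: sb.append(next); // covers the escaped backslash
        }
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```

With values escaped this way, line-oriented tools such as grep, sed, and awk see exactly one record per line.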
[jira] Updated: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jothi Padmanabhan updated MAPREDUCE-157: Attachment: mapred-157-7Sep-v1.patch Now, sqoop's ivy.xml needs to be updated too! Job History log file format is not friendly for external tools. --- Key: MAPREDUCE-157 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 0.20.1 Reporter: Owen O'Malley Assignee: Jothi Padmanabhan Fix For: 0.21.0 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep-v1.patch, mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-936) Allow a load difference in fairshare scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752230#action_12752230 ] Hudson commented on MAPREDUCE-936: -- Integrated in Hadoop-Mapreduce-trunk #75 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/]) Allow a load difference in fairshare scheduler -- Key: MAPREDUCE-936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Affects Versions: 0.20.1, 0.21.0, 0.22.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.21.0 Attachments: MAPREDUCE-936.1.patch, MAPREDUCE-936.2.patch The problem we are facing: It takes a long time for all tasks of a job to get scheduled on the cluster, even if the cluster is almost empty. There are two reasons that together lead to this situation: 1. The load factor makes sure each TT runs the same number of tasks. (This is the part that this patch tries to change). 2. The scheduler tries to schedule map tasks locally (first node-local, then rack-local). There is a wait time (mapred.fairscheduler.localitywait.node and mapred.fairscheduler.localitywait.rack, both are around 10 sec in our conf), and accumulated wait time (JobInfo.localityWait). The accumulated wait time is reset to 0 whenever a non-local map task is scheduled. That means it takes N * wait_time to schedule N non-local map tasks. Because of 1, a lot of TT will not be able to take more tasks, even if they have free slots. As a result, a lot of the map tasks cannot be scheduled locally. Because of 2, it's really hard to schedule a non-local task. As a result, sometimes we are seeing that it takes more than 2 minutes to schedule all the mappers of a job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
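The N * wait_time estimate above can be put into numbers with a tiny sketch. The class and method names are hypothetical, and the task count in the usage note is illustrative rather than taken from a real job; only the roughly 10 second wait comes from the description.

```java
/** Hypothetical illustration of the N * wait_time estimate; not scheduler code. */
final class LocalityWaitSketch {
  /**
   * Time needed to place nonLocalTasks non-local maps when the accumulated
   * locality wait is reset to 0 after each one is scheduled.
   */
  static long totalWaitMillis(int nonLocalTasks, long waitMillisPerTask) {
    return (long) nonLocalTasks * waitMillisPerTask;
  }
}
```

For example, totalWaitMillis(13, 10_000L) gives 130000 ms, so 13 non-local maps at a 10 second wait already exceed the 2 minutes mentioned above.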
[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
[ https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752231#action_12752231 ] Hudson commented on MAPREDUCE-370: -- Integrated in Hadoop-Mapreduce-trunk #75 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/]) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api. --- Key: MAPREDUCE-370 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.21.0 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370-3.txt, patch-370-4.txt, patch-370-5.txt, patch-370.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-372) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
[ https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752232#action_12752232 ] Hudson commented on MAPREDUCE-372: -- Integrated in Hadoop-Mapreduce-trunk #75 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/]) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api. --- Key: MAPREDUCE-372 URL: https://issues.apache.org/jira/browse/MAPREDUCE-372 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.21.0 Attachments: patch-372-1.txt, patch-372.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-903) Adding AVRO jar to eclipse classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752233#action_12752233 ] Hudson commented on MAPREDUCE-903: -- Integrated in Hadoop-Mapreduce-trunk #75 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/]) Adding AVRO jar to eclipse classpath Key: MAPREDUCE-903 URL: https://issues.apache.org/jira/browse/MAPREDUCE-903 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Philip Zeyliger Assignee: Philip Zeyliger Fix For: 0.21.0 Attachments: MAPREDUCE-903.patch Avro is missing from the eclipse classpath, which caused Eclipse to whine. Easy fix. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code
[ https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752234#action_12752234 ] Hudson commented on MAPREDUCE-318: -- Integrated in Hadoop-Mapreduce-trunk #75 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/]) Refactor reduce shuffle code Key: MAPREDUCE-318 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.21.0 Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch The reduce shuffle code has become very complex and entangled. I think we should move it out of ReduceTask and into a separate package (org.apache.hadoop.mapred.task.reduce). Details to follow. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-945) Test programs support only default queue.
[ https://issues.apache.org/jira/browse/MAPREDUCE-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752252#action_12752252 ] Hadoop QA commented on MAPREDUCE-945: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418797/mapreduce-945-2.patch against trunk revision 812209. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/console This message is automatically generated. Test programs support only default queue. - Key: MAPREDUCE-945 URL: https://issues.apache.org/jira/browse/MAPREDUCE-945 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Suman Sehgal Attachments: mapreduce-945-1.patch, mapreduce-945-2.patch None of the test programs seems to support queues. They look for the default queue only, even when some other queue is specified for the run.
[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752275#action_12752275 ] Hadoop QA commented on MAPREDUCE-861: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418777/MAPREDUCE-861-5.patch against trunk revision 812002. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 40 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 4 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/console This message is automatically generated. Modify queue configuration format and parsing to support a hierarchy of queues. 
--- Key: MAPREDUCE-861 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Hemanth Yamijala Assignee: rahul k singh Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the configuration related to queues. The current format for defining a queue and its properties is as follows: mapred.queue.queue-name.property-name. For e.g. mapred.queue.queue-name.acl-submit-job. The reason for using this verbose format was to be able to reuse the Configuration parser in Hadoop. However, administrators currently using the queue configuration have already indicated a very strong desire for a more manageable format. Since, this becomes more unwieldy with hierarchical queues, the time may be good to introduce a new format for representing queue configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.
[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752278#action_12752278 ] Hadoop QA commented on MAPREDUCE-157: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418824/mapred-157-7Sep-v1.patch against trunk revision 812209. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/console This message is automatically generated. Job History log file format is not friendly for external tools. 
--- Key: MAPREDUCE-157 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 0.20.1 Reporter: Owen O'Malley Assignee: Jothi Padmanabhan Fix For: 0.21.0 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep-v1.patch, mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch Currently, parsing the job history logs with external tools is very difficult because of the format. The most critical problem is that newlines aren't escaped in the strings. That makes using tools like grep, sed, and awk very tricky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-959) JobConf::setWorkingDirectory requires that the default FileSystem is reachable
JobConf::setWorkingDirectory requires that the default FileSystem is reachable -- Key: MAPREDUCE-959 URL: https://issues.apache.org/jira/browse/MAPREDUCE-959 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, test Reporter: Chris Douglas Priority: Minor If mapred.working.dir is not set, JobConf::setWorkingDirectory will attempt to obtain the default working directory for the default FileSystem. In trunk at least, if the default fs is HDFS and not reachable, set will fail: {noformat} java.net.UnknownHostException: unknown host: notahost java.lang.RuntimeException: java.net.UnknownHostException: unknown host: notahost at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:541) at org.apache.hadoop.mapred.JobConf.setWorkingDirectory(JobConf.java:522) at org.apache.hadoop.conf.TestJobConf.testSetWorkingDir(TestJobConf.java:67) Caused by: java.net.UnknownHostException: unknown host: notahost at org.apache.hadoop.ipc.Client$Connection.init(Client.java:216) at org.apache.hadoop.ipc.Client.getConnection(Client.java:876) at org.apache.hadoop.ipc.Client.call(Client.java:746) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:223) at $Proxy4.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:366) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:169) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:276) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:235) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:83) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1430) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1458) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1446) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:190) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:98) at 
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:537) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader Key: MAPREDUCE-960 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Chris Douglas Assignee: Chris Douglas KeyValueLineRecordReader effects the copy from the line to the key/value by creating separate arrays: {noformat} int keyLen = pos; byte[] keyBytes = new byte[keyLen]; System.arraycopy(line, 0, keyBytes, 0, keyLen); int valLen = lineLen - keyLen - 1; byte[] valBytes = new byte[valLen]; System.arraycopy(line, pos + 1, valBytes, 0, valLen); key.set(keyBytes); value.set(valBytes); {noformat} Since set triggers another copy and Text has a set taking {{byte[], off, len}}, the intermediate copy can be avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
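A minimal sketch of the proposed change, written so that it compiles without Hadoop on the classpath. SimpleText is a hypothetical stand-in that only mimics the {{byte[], off, len}} overload of set mentioned above, and splitKeyValue is an illustrative name; neither is the code in the eventual patch. The handling of a missing separator is an assumption, since the excerpt above does not show it.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for Text; only what the sketch needs. */
final class SimpleText {
  private byte[] bytes = new byte[0];
  private int length;

  /** Copies len bytes starting at off into this object's own buffer. */
  void set(byte[] src, int off, int len) {
    if (bytes.length < len) {
      bytes = new byte[len];
    }
    System.arraycopy(src, off, bytes, 0, len);
    length = len;
  }

  @Override
  public String toString() {
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }
}

/** Hypothetical illustration only; not KeyValueLineRecordReader itself. */
final class KeyValueSplitSketch {
  /**
   * Splits line[0..lineLen) at the separator index pos straight into key and
   * value, with no intermediate keyBytes/valBytes arrays.
   */
  static void splitKeyValue(byte[] line, int lineLen, int pos,
                            SimpleText key, SimpleText value) {
    if (pos == -1) {
      // Assumption: no separator means the whole line is the key.
      key.set(line, 0, lineLen);
      value.set(line, 0, 0);
    } else {
      key.set(line, 0, pos);
      value.set(line, pos + 1, lineLen - pos - 1);
    }
  }
}
```

Each half of the line is then copied once, inside set, instead of once into a temporary array and again inside set.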
[jira] Updated: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-960: Attachment: M960-0.patch Removed intermediate buffer and {{KeyValueLineRecordReader::getKeyClass}} accidentally copied from mapred in MAPREDUCE-655 Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader Key: MAPREDUCE-960 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Chris Douglas Assignee: Chris Douglas Attachments: M960-0.patch KeyValueLineRecordReader effects the copy from the line to the key/value by creating separate arrays: {noformat} int keyLen = pos; byte[] keyBytes = new byte[keyLen]; System.arraycopy(line, 0, keyBytes, 0, keyLen); int valLen = lineLen - keyLen - 1; byte[] valBytes = new byte[valLen]; System.arraycopy(line, pos + 1, valBytes, 0, valLen); key.set(keyBytes); value.set(valBytes); {noformat} Since set triggers another copy and Text has a set taking {{byte[], off, len}}, the intermediate copy can be avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-960: Status: Patch Available (was: Open) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader Key: MAPREDUCE-960 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Chris Douglas Assignee: Chris Douglas Attachments: M960-0.patch KeyValueLineRecordReader effects the copy from the line to the key/value by creating separate arrays: {noformat} int keyLen = pos; byte[] keyBytes = new byte[keyLen]; System.arraycopy(line, 0, keyBytes, 0, keyLen); int valLen = lineLen - keyLen - 1; byte[] valBytes = new byte[valLen]; System.arraycopy(line, pos + 1, valBytes, 0, valLen); key.set(keyBytes); value.set(valBytes); {noformat} Since set triggers another copy and Text has a set taking {{byte[], off, len}}, the intermediate copy can be avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-830: Attachment: M830-3.patch * Fixed mapreduce.lib.input.LineRecordReader (I missed the filePosition updates in the last patch) * Added a unit test for the mapreduce code * Patched KeyValueLineRecordReader::isSplittable in mapred and mapreduce Providing BZip2 splitting support for Text data --- Key: MAPREDUCE-830 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: Abdul Qadeer Assignee: Abdul Qadeer Fix For: 0.21.0 Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing support to handle BZip2 compressed data such that the input compressed file is split at arbitrary points. This JIRA uses that functionality in LineRecordReader. The benefit of this work is that, if user provides compressed BZip2 Text data, it will be split by Hadoop and hence will be processed by multiple mappers. So BZip2 compressed data will be able to fully utilize the cluster power. Currently BZip2 compressed Text file goes to one mapper and is not split. So the enhancement in this JIRA provides splitting support and a considerable performance gains. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752304#action_12752304 ] Chris Douglas commented on MAPREDUCE-830: - (also includes a workaround for MAPREDUCE-959, which was getting irritating, and updates the unit tests to JUnit4 semantics) Providing BZip2 splitting support for Text data --- Key: MAPREDUCE-830 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: Abdul Qadeer Assignee: Abdul Qadeer Fix For: 0.21.0 Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing support to handle BZip2 compressed data such that the input compressed file is split at arbitrary points. This JIRA uses that functionality in LineRecordReader. The benefit of this work is that, if user provides compressed BZip2 Text data, it will be split by Hadoop and hence will be processed by multiple mappers. So BZip2 compressed data will be able to fully utilize the cluster power. Currently BZip2 compressed Text file goes to one mapper and is not split. So the enhancement in this JIRA provides splitting support and a considerable performance gains. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-960: Affects Version/s: 0.21.0 Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader Key: MAPREDUCE-960 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: Chris Douglas Assignee: Chris Douglas Attachments: M960-0.patch KeyValueLineRecordReader effects the copy from the line to the key/value by creating separate arrays: {noformat} int keyLen = pos; byte[] keyBytes = new byte[keyLen]; System.arraycopy(line, 0, keyBytes, 0, keyLen); int valLen = lineLen - keyLen - 1; byte[] valBytes = new byte[valLen]; System.arraycopy(line, pos + 1, valBytes, 0, valLen); key.set(keyBytes); value.set(valBytes); {noformat} Since set triggers another copy and Text has a set taking {{byte[], off, len}}, the intermediate copy can be avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752322#action_12752322 ] Hadoop QA commented on MAPREDUCE-960: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418866/M960-0.patch against trunk revision 812287. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/43/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/43/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/43/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/43/console This message is automatically generated. 
Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader Key: MAPREDUCE-960 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: Chris Douglas Assignee: Chris Douglas Attachments: M960-0.patch KeyValueLineRecordReader effects the copy from the line to the key/value by creating separate arrays: {noformat} int keyLen = pos; byte[] keyBytes = new byte[keyLen]; System.arraycopy(line, 0, keyBytes, 0, keyLen); int valLen = lineLen - keyLen - 1; byte[] valBytes = new byte[valLen]; System.arraycopy(line, pos + 1, valBytes, 0, valLen); key.set(keyBytes); value.set(valBytes); {noformat} Since set triggers another copy and Text has a set taking {{byte[], off, len}}, the intermediate copy can be avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-28) TestQueueManager takes too long and times out some times
[ https://issues.apache.org/jira/browse/MAPREDUCE-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752327#action_12752327 ] Sreekanth Ramakrishnan commented on MAPREDUCE-28: - After discussion with Rahul and looking at the test cases written for MAPREDUCE-861, the path forward is to test only the semantic meaning of the configured ACLs in {{TestQueueManager}}; the state and ACL refresh is already covered by the test case introduced in MAPREDUCE-861. TestQueueManager takes too long and times out some times Key: MAPREDUCE-28 URL: https://issues.apache.org/jira/browse/MAPREDUCE-28 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amareshwari Sriramadasu TestQueueManager takes a long time to run and sometimes times out. See the failure at http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3875/testReport/. Looking at the console output, the test was timed out before it finished. On my machine, the test takes about 5 minutes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-944) Extend FairShare scheduler to fair-share memory usage in the cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated MAPREDUCE-944: --- Attachment: LoadManager2.txt Incorporated Matie's review comments. Vinod: The goal that we have in mind is slightly different from what the capacity scheduler has done (pl correct me if I am wrong). Unlike the capacity scheduler, there is no assumption that the user knows (upfront, before submitting job) how much memory/CPU/network that job will need. A realtime stream of resource usage will be fed into the new LoadManager that can then dynamically decide whether to run another task on that machine or not. Your feedback and experience vis-a-vis that Capacity scheduler is very valuable here... let's continue the conversation via MAPREDUCE-961 Extend FairShare scheduler to fair-share memory usage in the cluster Key: MAPREDUCE-944 URL: https://issues.apache.org/jira/browse/MAPREDUCE-944 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/fair-share Reporter: dhruba borthakur Attachments: LoadManager.txt, LoadManager2.txt The FairShare Scheduler has an extensible LoadManager API to regulate allocating new tasks on a particular TaskTracker. In similar lines, it would be nice if the FairShare Scheduler can have a pluggable policy to regulate new tasks from a particular job. This will allow one to skip scheduling tasks of a job that is eating a large percentage of memory in the cluster, i.e. fair-share of memory resources among jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated MAPREDUCE-961: --- Tags: fb ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s) --- Key: MAPREDUCE-961 URL: https://issues.apache.org/jira/browse/MAPREDUCE-961 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/fair-share Reporter: dhruba borthakur Assignee: dhruba borthakur Design and develop a ResourceAwareLoadManager for the FairShare scheduler that dynamically decides how many maps/reduces to run on a particular machine based on the CPU/memory/disk IO/network usage on that machine. The amount of resources currently used on each task tracker is fed into the ResourceAwareLoadManager in real time by an entity that is external to Hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-944) Extend FairShare scheduler to fair-share memory usage in the cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated MAPREDUCE-944:
---
Fix Version/s: 0.21.0
Hadoop Flags: [Reviewed]
Status: Patch Available (was: Open)

Extend FairShare scheduler to fair-share memory usage in the cluster
Key: MAPREDUCE-944
URL: https://issues.apache.org/jira/browse/MAPREDUCE-944
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/fair-share
Reporter: dhruba borthakur
Fix For: 0.21.0
Attachments: LoadManager.txt, LoadManager2.txt

The FairShare Scheduler has an extensible LoadManager API to regulate allocating new tasks on a particular TaskTracker. Along similar lines, it would be nice if the FairShare Scheduler had a pluggable policy to regulate new tasks from a particular job. This would allow one to skip scheduling tasks of a job that is consuming a large percentage of the memory in the cluster, i.e. to fair-share memory resources among jobs.
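A per-job policy of the kind requested above might look roughly like the following sketch. It is not taken from the attached LoadManager patches; the class name, the method names, and the idea of a single fixed maximum share are assumptions chosen to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-job policy sketch; not Hadoop's API. */
final class MemoryFairSharePolicySketch {
    // Largest fraction of cluster memory a single job may hold (assumed knob).
    private final double maxShare;
    private final Map<String, Long> usedBytesByJob = new HashMap<>();

    MemoryFairSharePolicySketch(double maxShare) {
        this.maxShare = maxShare;
    }

    /** Records the memory currently used by all tasks of a job. */
    void recordUsage(String jobId, long usedBytes) {
        usedBytesByJob.put(jobId, usedBytes);
    }

    /** Returns true if the scheduler may launch another task of this job. */
    boolean canLaunchTask(String jobId, long clusterMemoryBytes) {
        long used = usedBytesByJob.getOrDefault(jobId, 0L);
        return used <= maxShare * clusterMemoryBytes;
    }
}
```

A scheduler using such a policy would skip a job whose tasks already hold more than the allowed share and move on to the next job in its ordering, which is one way to read "fair-share of memory resources among jobs".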
[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.
[ https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752351#action_12752351 ]

Hemanth Yamijala commented on MAPREDUCE-861:

This is getting really close, sans the issues with test-patch.
- I still think QueueManager needs more Java documentation, particularly explaining some aspects of hierarchical queues.
- Also, the mapred-queues.xml can be better presented.
- Not 100% sure about this, but will LOG.fatal cause an exception to be thrown? If yes, can you check whether that's the intended usage where you are using it?
- ACLs are stored inconsistently between the deprecated and hierarchical configuration. Specifically in the hierarchical configuration, they should be stored with the same key as in the hierarchical case, as the QueueManager treats them equally.

Modify queue configuration format and parsing to support a hierarchy of queues.
---
Key: MAPREDUCE-861
URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh
Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch

MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce framework. This JIRA is for defining changes to the configuration related to queues. The current format for defining a queue and its properties is as follows: mapred.queue.queue-name.property-name. For example, mapred.queue.queue-name.acl-submit-job. The reason for using this verbose format was to be able to reuse the Configuration parser in Hadoop. However, administrators currently using the queue configuration have already indicated a very strong desire for a more manageable format. Since this becomes even more unwieldy with hierarchical queues, this may be a good time to introduce a new format for representing queue configuration.
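To make the flat key format quoted in the issue description concrete, here is a small sketch that splits a key of the form mapred.queue.queue-name.property-name into its two parts. It is not the QueueManager's parsing code; the class name and the rule of splitting at the last dot are assumptions for illustration.

```java
/** Hypothetical helper; not part of Hadoop's QueueManager. */
final class QueuePropertyKey {
    static final String PREFIX = "mapred.queue.";

    final String queueName;
    final String propertyName;

    private QueuePropertyKey(String queueName, String propertyName) {
        this.queueName = queueName;
        this.propertyName = propertyName;
    }

    /** Splits "mapred.queue.<queue-name>.<property-name>" at the last dot. */
    static QueuePropertyKey parse(String key) {
        if (!key.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a queue property: " + key);
        }
        String rest = key.substring(PREFIX.length());
        int dot = rest.lastIndexOf('.');
        if (dot <= 0 || dot == rest.length() - 1) {
            throw new IllegalArgumentException("Missing queue or property name: " + key);
        }
        return new QueuePropertyKey(rest.substring(0, dot), rest.substring(dot + 1));
    }
}
```

Under this assumed rule, mapred.queue.default.acl-submit-job yields the queue "default" and the property "acl-submit-job". Encoding a hierarchy in the same flat key would push the whole queue path into the name segment, which gives a feel for why the description calls the format unwieldy for hierarchical queues.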