[jira] Commented: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897159#action_12897159 ]

Vinod K V commented on MAPREDUCE-1780:

Some comments:
- JobSubmittedEvent is used to persist job-acls to JobHistory, but the acls are incorrectly written through the toString() method. Please add a test, or modify the existing test in TestJobHistory, to verify this bug.
- Minor: Not directly related to the patch, but it can be fixed here. In QueueManager.dumpConfiguration(), we don't need aclsSubmitJobValue to be a StringBuilder. We can drop the getAclsInfo() method itself.

AccessControlList.toString() is used for serialization of ACL in JobStatus.java

Key: MAPREDUCE-1780
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1780
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
Attachments: 1780.patch

HADOOP-6715 was created to fix AccessControlList.toString() for the case of WILDCARD. JobStatus.write() and readFields() assume that toString() returns the serialized String of the AccessControlList object, which is not true. Once HADOOP-6715 is fixed in COMMON, JobStatus.write() and JobStatus.readFields() should be fixed accordingly.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
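The problem described in this issue can be illustrated with a minimal sketch. SketchAcl below is a hypothetical stand-in, not Hadoop's AccessControlList; the "users groups" string layout, the "*" wildcard handling, and the wording returned by toString() are assumptions made for illustration only. It shows why a human-readable toString() may not survive a write/read round trip, while a dedicated serialized form does.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an ACL class; NOT Hadoop's AccessControlList.
final class SketchAcl {
    static final String WILDCARD = "*";

    private final boolean allAllowed;
    private final Set<String> users = new LinkedHashSet<>();
    private final Set<String> groups = new LinkedHashSet<>();

    // Parses "user1,user2 group1,group2", or "*" meaning everyone (assumed layout).
    SketchAcl(String aclString) {
        allAllowed = aclString.trim().equals(WILDCARD);
        if (!allAllowed) {
            String[] parts = aclString.split(" ", 2);
            addNames(users, parts[0]);
            if (parts.length > 1) {
                addNames(groups, parts[1]);
            }
        }
    }

    private static void addNames(Set<String> target, String commaSeparated) {
        for (String name : commaSeparated.split(",")) {
            if (!name.trim().isEmpty()) {
                target.add(name.trim());
            }
        }
    }

    boolean isUserAllowed(String user) {
        return allAllowed || users.contains(user);
    }

    // Parseable form: what write()/readFields() style code should exchange.
    String serialize() {
        if (allAllowed) {
            return WILDCARD;
        }
        return String.join(",", users) + " " + String.join(",", groups);
    }

    // Human-readable form: fine for logs, not meant to be parsed back.
    @Override
    public String toString() {
        if (allAllowed) {
            return "All users are allowed";
        }
        return "Users " + users + " and members of the groups " + groups + " are allowed";
    }
}
```

In this sketch, feeding serialize() back into the constructor preserves the wildcard, whereas feeding toString() back in yields an ACL that allows a user literally named "All".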
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897178#action_12897178 ]

Luke Lu commented on MAPREDUCE-1881:

I have no issue with the statusUpdate method. I got where you're coming from :) But I question whether many users will want to do the same thing. I'm curious whether many useful instrumentation classes are being written. Adding features (especially redundant ones), IMO, doesn't necessarily make Hadoop better, but rather bloated and harder to maintain. You know, perfection is attained not when no more can be added, but when no more can be removed.

Another thing about the patch is that if the instrumentation class is specified as an empty string, it silently defaults to the composite class with an empty list (essentially a noop instrumentation). This changes the existing behavior, where an exception would be thrown.

Improve TaskTrackerInstrumentation

Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch

The TaskTrackerInstrumentation class provides a useful way to capture key events at the TaskTracker for use in various reporting tools, but it is currently rather limited, because only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives minimal information about tasks (only their IDs). I propose enhancing the functionality through two changes:
# Support a comma-separated list of TaskTrackerInstrumentation classes rather than just a single one in the JobConf, and report events to all of them.
# Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive a reference to a whole Task object rather than just its TaskAttemptID.
It might also be useful to make the latter receive the task's final state, i.e. failed, killed, or successful.

I'm just posting this here to get a sense of whether this is a good idea. If people think it's okay, I will make a patch against trunk that implements these changes.
[jira] Updated: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-1780:

Attachment: 1780.v1.patch

Attaching patch incorporating review comments. Validation of the job acls that are logged to the history file is now added to TestJobHistory. This was somehow missing from the trunk patch for MAPREDUCE-1493, although it was present in the Y! dist patch.

AccessControlList.toString() is used for serialization of ACL in JobStatus.java

Key: MAPREDUCE-1780
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1780
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
Attachments: 1780.patch, 1780.v1.patch

HADOOP-6715 was created to fix AccessControlList.toString() for the case of WILDCARD. JobStatus.write() and readFields() assume that toString() returns the serialized String of the AccessControlList object, which is not true. Once HADOOP-6715 is fixed in COMMON, JobStatus.write() and JobStatus.readFields() should be fixed accordingly.
[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897188#action_12897188 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1920:

Tests that timed out till now:
TestAdminOperationsProtocolWithServiceAuthorization
TestClusterMRNotification
TestDebugScript
TestEmptyJob
TestIsolationRunner
TestJobCleanup
TestJobHistory
TestJobHistoryParsing
TestJobInProgress
TestJobInProgressListener
TestJobKillAndFail
TestJobQueueClient
TestJvmReuse
TestKillSubProcesses
TestMRWithDistributedCache
TestMapredHeartbeat
TestMiniMRBringup

Tests that failed:
TestJobTrackerStart
TestKillCompletedJob

My local ant test run is still in progress, so more tests may be added to the above lists.

Shall we fix MiniMRCluster to set a persist dir in the local file system if the fileSystem passed is local, instead of fixing these individual tests? Or shall we disable the completed job store for the unit tests by adding conf in src/test/mapred-site.xml (similar to disabling retire jobs), as TestJobStatusPersistency anyway tests the functionality of completedJobStore?

Job.getCounters() returns null when using a cluster

Key: MAPREDUCE-1920
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Assignee: Tom White
Priority: Critical
Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, MAPREDUCE-1920.patch

Calling Job.getCounters() after the job has completed (successfully) returns null.
[jira] Updated: (MAPREDUCE-1979) Output directory already exists error in gridmix when gridmix.output.directory is not defined
[ https://issues.apache.org/jira/browse/MAPREDUCE-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-1979:

Attachment: 1979.v1.patch

With the earlier patch, TestGridmixSubmission was failing. Attaching a new patch with the correct fix. Also added a testcase.

Output directory already exists error in gridmix when gridmix.output.directory is not defined

Key: MAPREDUCE-1979
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1979
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/gridmix
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
Attachments: 1979.patch, 1979.v1.patch

An "Output directory already exists" error is seen in gridmix when gridmix.output.directory is not defined. In that case, gridmix uses inputDir/gridmix/ as the output path for the gridmix run. Because gridmix creates outputPath (in this case, inputDir/gridmix/) at the beginning, the output path of the generate-data mapreduce job (i.e. inputDir) already exists, which causes an error from mapreduce.

There is no need to create this outputPath in any case (whether the user specifies the path using gridmix.output.directory or gridmix itself falls back to inputDir/gridmix/), because output paths of mapreduce jobs are created automatically (like mkdir -p).
[jira] Commented: (MAPREDUCE-1959) Should use long name for token renewer on the client side
[ https://issues.apache.org/jira/browse/MAPREDUCE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897193#action_12897193 ]

Hadoop QA commented on MAPREDUCE-1959:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12451719/m1959-02.patch
against trunk revision 983815.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/console

This message is automatically generated.

Should use long name for token renewer on the client side

Key: MAPREDUCE-1959
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1959
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission, security
Reporter: Kan Zhang
Assignee: Kan Zhang
Attachments: m1959-01.patch, m1959-02.patch

When getting a delegation token from a NN, a client needs to specify the renewer for the token. For use on a MapRed cluster, JT should be specified as the renewer. However, in the current code, the client maps JT's long name (Kerberos principal name) to cluster-internal short name and then sets the short name as the renewer. This is undesirable for 2 reasons.
1) It's unnecessary since NN (or JT) converts client-supplied renewer from long to short name anyway.
2) In principle, the mapping from long to short name should be done on the server. This is consistent with the authentication case, where the client uses the same long name to authenticate to multiple servers and servers map client's long name to their own internal short names. It facilitates using the same job client to get delegation tokens from multiple NN's, which may have different mapping rules for JT.
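The second reason above (mapping on the server side) can be sketched as follows. This is a minimal illustration under stated assumptions: SketchTokenServer, its override table, and the default "take the primary component" rule are hypothetical and are not Hadoop's actual name-mapping rules or delegation token API. The client passes the same long name everywhere, and each server applies its own mapping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server that maps a renewer's long (Kerberos-style) name itself.
final class SketchTokenServer {
    // Per-server mapping rules: long name -> short name (assumed representation).
    private final Map<String, String> overrides;

    SketchTokenServer(Map<String, String> overrides) {
        this.overrides = new HashMap<>(overrides);
    }

    // Assumed default rule: keep the primary part of "primary/instance@REALM".
    String toShortName(String longName) {
        String override = overrides.get(longName);
        if (override != null) {
            return override;
        }
        int end = longName.length();
        int slash = longName.indexOf('/');
        int at = longName.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return longName.substring(0, end);
    }

    // The client always sends the long name; the server decides the short name.
    String issueTokenFor(String renewerLongName) {
        return "token(renewer=" + toShortName(renewerLongName) + ")";
    }
}
```

With this arrangement, one client can request tokens from two servers with different rules without knowing either rule: both receive the same long name and each stores its own short name as the renewer.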
[jira] Commented: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897200#action_12897200 ]

Vinod K V commented on MAPREDUCE-1780:

Looks good, +1 for the patch.

AccessControlList.toString() is used for serialization of ACL in JobStatus.java

Key: MAPREDUCE-1780
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1780
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
Attachments: 1780.patch, 1780.v1.patch

HADOOP-6715 was created to fix AccessControlList.toString() for the case of WILDCARD. JobStatus.write() and readFields() assume that toString() returns the serialized String of the AccessControlList object, which is not true. Once HADOOP-6715 is fixed in COMMON, JobStatus.write() and JobStatus.readFields() should be fixed accordingly.
[jira] Created: (MAPREDUCE-2003) It should be able to specify different jvm settings for map and reduce child process (via mapred.child.map.java.opts and mapred.child.reduce.java.opts options)
It should be able to specify different jvm settings for map and reduce child process (via mapred.child.map.java.opts and mapred.child.reduce.java.opts options)

Key: MAPREDUCE-2003
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2003
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: tasktracker
Reporter: Vladimir Klimontovich
Fix For: 0.20.3, 0.21.0, 0.22.0

Sometimes the mapper child process requires different JVM settings than the reducer, for example when the mapper requires much more memory than the reducer. Currently it's only possible to set options for both using mapred.child.java.opts.

Proposed solution: mapred.child.java.opts could be overridden by mapred.child.map.java.opts or mapred.child.reduce.java.opts. Thus, we're adding more flexibility while staying compatible with old configurations.

The same should be done for mapred.child.env.
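The proposed lookup order can be sketched roughly as follows. This is only an illustration of the fallback described in the issue, not the patch itself: the three property names come from the issue text, while ChildOptsResolver, the plain Map used in place of a job configuration, and the defaultOpts parameter are assumptions introduced here.

```java
import java.util.Map;

// Hypothetical helper; a plain Map stands in for the job configuration.
final class ChildOptsResolver {
    static final String COMMON = "mapred.child.java.opts";
    static final String MAP = "mapred.child.map.java.opts";
    static final String REDUCE = "mapred.child.reduce.java.opts";

    private ChildOptsResolver() {
    }

    // Task-specific options win; otherwise fall back to the shared property,
    // so configurations that only set mapred.child.java.opts keep working.
    static String resolve(Map<String, String> conf, boolean isMapTask, String defaultOpts) {
        String specific = conf.get(isMapTask ? MAP : REDUCE);
        if (specific != null) {
            return specific;
        }
        String common = conf.get(COMMON);
        return common != null ? common : defaultOpts;
    }
}
```

For example, with mapred.child.java.opts set to -Xmx512m and mapred.child.map.java.opts set to -Xmx2048m, map tasks would get -Xmx2048m and reduce tasks would still get -Xmx512m.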
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897328#action_12897328 ]

Arun C Murthy commented on MAPREDUCE-220:

Scott, sorry for coming in late.

I have a nit: we seem to create a new ProcfsBasedProcessTree each time - wouldn't it be easier to re-use the object? Create it once and re-use it each time?

Collecting cpu and memory usage for MapReduce tasks

Key: MAPREDUCE-220
URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
Fix For: 0.22.0
Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt

It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time.
[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897330#action_12897330 ]

Tom White commented on MAPREDUCE-1920:

> Or shall we disable completed job store for the unit tests by adding conf in src/test/mapred-site.xml (similar to disabling retire jobs) as TestJobStatusPersistency anyways tests the functionality of completedJobStore?

I think this is a much better way of doing it. Thanks for the suggestion. I'll prepare a patch.

Job.getCounters() returns null when using a cluster

Key: MAPREDUCE-1920
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Assignee: Tom White
Priority: Critical
Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, MAPREDUCE-1920.patch

Calling Job.getCounters() after the job has completed (successfully) returns null.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897334#action_12897334 ]

Philip Zeyliger commented on MAPREDUCE-1881:

I'll chime in that I'm using the instrumentation classes and find them a useful way to listen to some events that are otherwise hard to get at.

Improve TaskTrackerInstrumentation

Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch

The TaskTrackerInstrumentation class provides a useful way to capture key events at the TaskTracker for use in various reporting tools, but it is currently rather limited, because only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives minimal information about tasks (only their IDs). I propose enhancing the functionality through two changes:
# Support a comma-separated list of TaskTrackerInstrumentation classes rather than just a single one in the JobConf, and report events to all of them.
# Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive a reference to a whole Task object rather than just its TaskAttemptID.
It might also be useful to make the latter receive the task's final state, i.e. failed, killed, or successful.

I'm just posting this here to get a sense of whether this is a good idea. If people think it's okay, I will make a patch against trunk that implements these changes.
[jira] Created: (MAPREDUCE-2004) IP address vs host name in updating Counter.DATA_LOCAL_MAPS
IP address vs host name in updating Counter.DATA_LOCAL_MAPS

Key: MAPREDUCE-2004
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2004
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 0.20.2
Reporter: Rares Vernica
Priority: Minor

Hello,

I set mapred.task.cache.levels to 1 so that I have only data-local-map tasks. Still, by looking at the data-local-maps counter, it seems not all map tasks are local. I checked each map task to see where it ran and what split had been assigned to it, and all the maps were actually processing only local data. (BTW, replication was set to 1.)

I looked into the JobClient to see what information is there for each split. For each file, the first n-1 splits have an IP address as location, while the n-th split has a host name as location. The reason for this is that there is a different code path for deciding the location of the first n-1 splits versus the n-th split.

The maps that processed the splits where the location was a host name were counted as data-local-maps, while the others were not.

So, regardless of whether the JobClient gives IP addresses or host names for splits, the job works fine. The problem is that the data-local-maps counter does not take this into consideration.

Cheers,
Rares
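The mismatch reported here can be sketched as a comparison problem. The code below is not the JobTracker's logic; LocalityCheck, its in-memory address table (standing in for real name resolution), and the example addresses are all hypothetical. It only illustrates how a plain string comparison misses a local split whose location is given as an IP address, and how normalizing both sides first avoids that.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical locality check; the map stands in for real name resolution.
final class LocalityCheck {
    private final Map<String, String> ipToHost;

    LocalityCheck(Map<String, String> ipToHost) {
        this.ipToHost = new HashMap<>(ipToHost);
    }

    // Maps a known IP address to its host name; leaves other values as they are.
    String normalize(String location) {
        return ipToHost.getOrDefault(location, location).toLowerCase(Locale.ROOT);
    }

    // Plain string comparison: misses a match when one side is an IP address.
    boolean naiveIsLocal(List<String> splitLocations, String trackerHost) {
        return splitLocations.contains(trackerHost);
    }

    // Compares normalized forms, so IP and host name spellings agree.
    boolean isLocal(List<String> splitLocations, String trackerHost) {
        String tracker = normalize(trackerHost);
        for (String location : splitLocations) {
            if (normalize(location).equals(tracker)) {
                return true;
            }
        }
        return false;
    }
}
```

A counter updated from the naive check would undercount local maps for the splits reported by IP address, which matches the symptom described in the report.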
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897348#action_12897348 ]

Arun C Murthy commented on MAPREDUCE-1881:

I'm trying to understand the proposal... please help me.

Currently you can define multiple 'sinks' for the same data via CompositeContext. Thus you can define multiple listeners and each will get the same data. Is that sufficient for this use case?

Improve TaskTrackerInstrumentation

Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch

The TaskTrackerInstrumentation class provides a useful way to capture key events at the TaskTracker for use in various reporting tools, but it is currently rather limited, because only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives minimal information about tasks (only their IDs). I propose enhancing the functionality through two changes:
# Support a comma-separated list of TaskTrackerInstrumentation classes rather than just a single one in the JobConf, and report events to all of them.
# Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive a reference to a whole Task object rather than just its TaskAttemptID.
It might also be useful to make the latter receive the task's final state, i.e. failed, killed, or successful.

I'm just posting this here to get a sense of whether this is a good idea. If people think it's okay, I will make a patch against trunk that implements these changes.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897367#action_12897367 ]

Luke Lu commented on MAPREDUCE-1881:

The instrumentation class is related to but not dependent on metrics frameworks. Some of the events are actually not collected in the regular metrics, so there is an expert-level config property, mapreduce.tasktracker.instrumentation, to specify a subclass of TaskTrackerInstrumentation, which contains all the overridable callbacks. The default value for the property is the TaskTrackerMetricsInst class, which currently implements the Updater interface to collect tasktracker metrics in the mapred metrics context. Similarly, for metrics v2, TaskTrackerMetricsSource would be the default.

Matei and others want to use the overridable instrumentation property to hook in other listeners, for things that're not strictly metrics related, like statusUpdate, which is useful for his project which does two-level scheduling :) He can achieve this with the addition of the statusUpdate method in TaskTrackerInstrumentation.

To make adding more instrumentation classes (while preserving the existing instrumentation like metrics) slightly easier (IMO, a user-defined composite class is just as easy), he wants to make the property a list of classes so that the events are fired for each instance of the specified classes.

The latter part of the patch would add a composite instrumentation class that dispatches all the events to all the instances of the specified instrumentation classes. Currently the patch lacks unit tests for the composite class. I can see problems down the road maintaining the class, like making sure it doesn't block in one of the classes that can potentially do RPCs etc., and that it properly handles exceptions in the delegate objects.

Improve TaskTrackerInstrumentation

Key: MAPREDUCE-1881
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch

The TaskTrackerInstrumentation class provides a useful way to capture key events at the TaskTracker for use in various reporting tools, but it is currently rather limited, because only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives minimal information about tasks (only their IDs). I propose enhancing the functionality through two changes:
# Support a comma-separated list of TaskTrackerInstrumentation classes rather than just a single one in the JobConf, and report events to all of them.
# Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive a reference to a whole Task object rather than just its TaskAttemptID.
It might also be useful to make the latter receive the task's final state, i.e. failed, killed, or successful.

I'm just posting this here to get a sense of whether this is a good idea. If people think it's okay, I will make a patch against trunk that implements these changes.
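A composite dispatcher of the kind discussed in this thread might look roughly like the sketch below. It is not the code from the attached patches: SketchTaskListener and CompositeTaskListener are hypothetical names, the callbacks take a plain String instead of a Task or TaskAttemptID, and the choice to reject an empty class list (rather than silently becoming a no-op) is an assumption that follows the concern raised earlier in the thread. It isolates exceptions thrown by one delegate so the others still receive the event; it does not address the blocking concern.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical listener interface; a String stands in for the task identifier.
interface SketchTaskListener {
    void reportTaskLaunch(String taskAttemptId);

    void reportTaskEnd(String taskAttemptId);
}

// Fans each event out to every delegate, isolating failures per delegate.
final class CompositeTaskListener implements SketchTaskListener {
    private final List<SketchTaskListener> delegates;
    private final List<String> failures = new ArrayList<>();

    CompositeTaskListener(List<SketchTaskListener> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    // Splits a comma-separated property value; rejects an empty list
    // instead of silently turning into a no-op (assumed policy).
    static List<String> parseClassNames(String property) {
        List<String> names = new ArrayList<>();
        for (String name : property.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        if (names.isEmpty()) {
            throw new IllegalArgumentException("no instrumentation class configured");
        }
        return names;
    }

    @Override
    public void reportTaskLaunch(String taskAttemptId) {
        dispatch(delegate -> delegate.reportTaskLaunch(taskAttemptId));
    }

    @Override
    public void reportTaskEnd(String taskAttemptId) {
        dispatch(delegate -> delegate.reportTaskEnd(taskAttemptId));
    }

    private void dispatch(Consumer<SketchTaskListener> event) {
        for (SketchTaskListener delegate : delegates) {
            try {
                event.accept(delegate);
            } catch (RuntimeException e) {
                // Record the failure and keep going with the remaining delegates.
                failures.add(delegate.getClass().getName() + ": " + e.getMessage());
            }
        }
    }

    List<String> failures() {
        return Collections.unmodifiableList(failures);
    }
}
```

A delegate that performs slow RPCs would still stall the loop in this sketch; handing events to a queue per delegate would be one way to decouple them, at the cost of more machinery to maintain.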
[jira] Commented: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897380#action_12897380 ]

Hong Tang commented on MAPREDUCE-1253:

I reviewed Anirban's patch earlier and forgot to comment with +1.

Making Mumak work with Capacity-Scheduler

Key: MAPREDUCE-1253
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1253
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Anirban Dasgupta
Assignee: Anirban Dasgupta
Attachments: MAPREDUCE-1253-20100406.patch, MAPREDUCE-1253-20100726-2.patch, MAPREDUCE-1253-20100804.patch
Original Estimate: 672h
Remaining Estimate: 672h

In order to make the capacity-scheduler work in the mumak simulation environment, we have to replace the job-initialization threads of the capacity scheduler with classes that perform event-based initialization. We propose to use aspectj to disable the threads of the JobInitializationPoller class used by the Capacity Scheduler, and then perform the corresponding initialization tasks through a simulation job-initialization class that receives periodic wake-up calls from the simulator engine.
[jira] Updated: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-1253:

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

+1 for the patch. I just committed this. ant test for mumak passes.

Making Mumak work with Capacity-Scheduler

Key: MAPREDUCE-1253
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1253
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Anirban Dasgupta
Assignee: Anirban Dasgupta
Attachments: MAPREDUCE-1253-20100406.patch, MAPREDUCE-1253-20100726-2.patch, MAPREDUCE-1253-20100804.patch
Original Estimate: 672h
Remaining Estimate: 672h

In order to make the capacity-scheduler work in the mumak simulation environment, we have to replace the job-initialization threads of the capacity scheduler with classes that perform event-based initialization. We propose to use aspectj to disable the threads of the JobInitializationPoller class used by the Capacity Scheduler, and then perform the corresponding initialization tasks through a simulation job-initialization class that receives periodic wake-up calls from the simulator engine.
[jira] Updated: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-1253:

Fix Version/s: 0.22.0

Making Mumak work with Capacity-Scheduler

Key: MAPREDUCE-1253
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1253
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Anirban Dasgupta
Assignee: Anirban Dasgupta
Fix For: 0.22.0
Attachments: MAPREDUCE-1253-20100406.patch, MAPREDUCE-1253-20100726-2.patch, MAPREDUCE-1253-20100804.patch
Original Estimate: 672h
Remaining Estimate: 672h

In order to make the capacity-scheduler work in the mumak simulation environment, we have to replace the job-initialization threads of the capacity scheduler with classes that perform event-based initialization. We propose to use aspectj to disable the threads of the JobInitializationPoller class used by the Capacity Scheduler, and then perform the corresponding initialization tasks through a simulation job-initialization class that receives periodic wake-up calls from the simulator engine.
[jira] Assigned: (MAPREDUCE-2005) TestDelegationTokenRenewal fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik reassigned MAPREDUCE-2005:

Assignee: Boris Shkolnik

TestDelegationTokenRenewal fails

Key: MAPREDUCE-2005
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2005
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Attachments: MAPREDUCE-2005-YH20.patch

Looks like the problem is in host resolution. The test is using localhost:0, but in DelegationTokenRenewal we use getCannonicalName() for localhost, and on some machines it is not localhost.

Fix: change the test to use getCannonicalName too.
[jira] Updated: (MAPREDUCE-2005) TestDelegationTokenRenewal fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-2005:

Attachment: MAPREDUCE-2005-YH20.patch

For previous version, not for commit. I've also updated some comments and debug lines.

TestDelegationTokenRenewal fails

Key: MAPREDUCE-2005
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2005
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
Attachments: MAPREDUCE-2005-YH20.patch

Looks like the problem is in host resolution. The test is using localhost:0, but in DelegationTokenRenewal we use getCannonicalName() for localhost, and on some machines it is not localhost.

Fix: change the test to use getCannonicalName too.
[jira] Commented: (MAPREDUCE-152) getMapOutput() keeps failing too many times before the tasktracker fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897395#action_12897395 ]

Krishna Ramachandran commented on MAPREDUCE-152:

It has been more than 2 years. Is this still an issue?

getMapOutput() keeps failing too many times before the tasktracker fails

Key: MAPREDUCE-152
URL: https://issues.apache.org/jira/browse/MAPREDUCE-152
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Yiping Han
Priority: Critical

We are running a big job on our cluster. There are about 400 reducers. Around 361 reducers finished successfully, while the last batch of 39 reducers all failed at roughly the same time. After examining the log files, the following error info was found 858 times for a single tasktracker:

2008-04-21 02:42:45,368 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804101742_0001_m_032077_2,396) failed :
2008-04-21 02:42:49,468 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804101742_0001_m_032077_2,396) failed :
2008-04-21 02:43:03,717 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804101742_0001_m_032077_2,396) failed :

Shouldn't the task tracker fail early without trying so many times?
[jira] Commented: (MAPREDUCE-223) JobClient should work with -1/+1 version of JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897407#action_12897407 ] Krishna Ramachandran commented on MAPREDUCE-223: It has been sitting for over 2 years and I do not believe anything has changed. HDFS, I believe, provides a read-only interface for listing/retrieving data over HTTP. Is this still critical: to have a similar interface to the embedded JT HTTP server on top of what the web interface already provides (for accessing task or job logs)? JobClient should work with -1/+1 version of JobTracker -- Key: MAPREDUCE-223 URL: https://issues.apache.org/jira/browse/MAPREDUCE-223 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: all Reporter: Alejandro Abdelnur Priority: Critical Currently there is a version check on the RPC calls that enforces the same Hadoop version on the client and the server. To enable phased upgrades of systems using Hadoop and Hadoop itself the {{JobClient}} should be able to interact with a {{JobTracker}} of the previous and the next version of Hadoop (or with a range). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
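The -1/+1 rule requested in MAPREDUCE-223 can be pictured as a window check instead of a strict equality check. The following is only a sketch under assumptions: it treats versions as plain integers, and the class and method names are hypothetical, not part of Hadoop's RPC code.

```java
// Hypothetical sketch: accept a server whose version lies within a window
// around the client's version, instead of requiring an exact match.
final class VersionWindowSketch {

    // Returns true when the two versions differ by at most 'window' steps.
    // window = 0 reproduces the strict check described in the issue;
    // window = 1 is the -1/+1 rule it asks for.
    static boolean isCompatible(long clientVersion, long serverVersion, long window) {
        return Math.abs(clientVersion - serverVersion) <= window;
    }

    public static void main(String[] args) {
        System.out.println(isCompatible(20, 21, 1)); // adjacent versions
        System.out.println(isCompatible(20, 22, 1)); // two versions apart
    }
}
```

A window check only says which pairs may talk; it does not by itself make the wire format compatible across versions, which is the harder part the comments on this issue discuss.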
[jira] Updated: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1920: - Attachment: MAPREDUCE-1920.patch This patch (based on the first one) sets mapreduce.jobtracker.persist.jobstatus.active to false in the test mapred-site.xml. It passes all unit tests (I ran it on Linux). Here's the output of test-patch: {noformat} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] {noformat} Job.getCounters() returns null when using a cluster --- Key: MAPREDUCE-1920 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0 Reporter: Aaron Kimball Assignee: Tom White Priority: Critical Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, MAPREDUCE-1920.patch Calling Job.getCounters() after the job has completed (successfully) returns null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1980) TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-1980: -- Attachment: mapreduce-1980-v1.0.patch Attaching a patch that fixes the bug. test-patch and ant-tests passed on my box. TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks -- Key: MAPREDUCE-1980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1980 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat Assignee: Amar Kamat Attachments: mapreduce-1980-v1.0.patch TaskAttemptUnsuccessfulCompletionEvent is used to log unsuccessful map and reduce task attempts to JobHistory. Following is the implementation of getEventType() method of TaskAttemptUnsuccessfulCompletionEvent /** Get the event type */ public EventType getEventType() { return EventType.MAP_ATTEMPT_KILLED; } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
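The bug described in MAPREDUCE-1980 is a getEventType() that returns one hard-coded constant regardless of the task. A minimal sketch of the general shape of a fix follows. It is not the attached patch: the class, the two enums, and every constant other than MAP_ATTEMPT_KILLED are stand-ins invented here so the example compiles on its own.

```java
// Hypothetical stand-in for an unsuccessful-attempt event. The enums below
// are illustrative and are not Hadoop's own TaskType/EventType classes.
final class UnsuccessfulAttemptSketch {
    enum TaskType { MAP, REDUCE }

    enum EventType {
        MAP_ATTEMPT_FAILED, MAP_ATTEMPT_KILLED,
        REDUCE_ATTEMPT_FAILED, REDUCE_ATTEMPT_KILLED
    }

    private final TaskType taskType;
    private final boolean killed; // false means the attempt failed

    UnsuccessfulAttemptSketch(TaskType taskType, boolean killed) {
        this.taskType = taskType;
        this.killed = killed;
    }

    /** Get the event type, derived from the task type and outcome. */
    EventType getEventType() {
        if (taskType == TaskType.MAP) {
            return killed ? EventType.MAP_ATTEMPT_KILLED : EventType.MAP_ATTEMPT_FAILED;
        }
        return killed ? EventType.REDUCE_ATTEMPT_KILLED : EventType.REDUCE_ATTEMPT_FAILED;
    }
}
```

The point of the sketch is only that the returned type depends on state carried by the event, so a reduce attempt can no longer be logged under a map constant.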
[jira] Commented: (MAPREDUCE-1980) TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897427#action_12897427 ] Hong Tang commented on MAPREDUCE-1980: -- Patch looks good. +1. TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks -- Key: MAPREDUCE-1980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1980 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat Assignee: Amar Kamat Attachments: mapreduce-1980-v1.0.patch TaskAttemptUnsuccessfulCompletionEvent is used to log unsuccessful map and reduce task attempts to JobHistory. Following is the implementation of getEventType() method of TaskAttemptUnsuccessfulCompletionEvent /** Get the event type */ public EventType getEventType() { return EventType.MAP_ATTEMPT_KILLED; } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897442#action_12897442 ] Luke Lu commented on MAPREDUCE-1881: The jobtracker and tasktracker instrumentation was introduced in HADOOP-3772, which contains more background info. Improve TaskTrackerInstrumentation -- Key: MAPREDUCE-1881 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, mapreduce-1881.patch The TaskTrackerInstrumentation class provides a useful way to capture key events at the TaskTracker for use in various reporting tools, but it is currently rather limited, because only one TaskTrackerInstrumentation can be added to a given TaskTracker and this object receives minimal information about tasks (only their IDs). I propose enhancing the functionality through two changes: # Support a comma-separated list of TaskTrackerInstrumentation classes rather than just a single one in the JobConf, and report events to all of them. # Make the reportTaskLaunch and reportTaskEnd methods in TaskTrackerInstrumentation receive a reference to a whole Task object rather than just its TaskAttemptID. It might also be useful to make the latter receive the task's final state, i.e. failed, killed, or successful. I'm just posting this here to get a sense of whether this is a good idea. If people think it's okay, I will make a patch against trunk that implements these changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated MAPREDUCE-1981: - Attachment: mapredListFiles2.patch Now that HDFS-202 is in, mapredListFiles2.patch is the last piece of code that completes the improvement of getSplits performance. Could a warm heart give it a review? Thanks. Improve getSplits performance by using listFiles, the new FileSystem API Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: mapredListFiles.patch, mapredListFiles1.patch, mapredListFiles2.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated MAPREDUCE-1981: - Status: Patch Available (was: Open) Improve getSplits performance by using listFiles, the new FileSystem API Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: mapredListFiles.patch, mapredListFiles1.patch, mapredListFiles2.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897492#action_12897492 ] Scott Chen commented on MAPREDUCE-220: -- Thanks, Arun. I will update the patch soon. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-220: - Attachment: MAPREDUCE-220-20100811.txt Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-220: - Status: Open (was: Patch Available) Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-220: - Status: Patch Available (was: Open) Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897501#action_12897501 ] Scott Chen commented on MAPREDUCE-220: -- Update to address Arun's comment. Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897509#action_12897509 ] Eli Collins commented on MAPREDUCE-220: --- Caching the process tree this way works with JVM re-use? Collecting cpu and memory usage for MapReduce tasks --- Key: MAPREDUCE-220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task, tasktracker Reporter: Hong Tang Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1496) org.apache.hadoop.mapred.lib.FieldSelectionMapReduce removes empty fields from key/value end
[ https://issues.apache.org/jira/browse/MAPREDUCE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897515#action_12897515 ] Krishna Ramachandran commented on MAPREDUCE-1496: - can you provide more details? say for example your key fields have a at the end? like map.output.key.value.fields.spec, 6 ,5,1-3:0- instead of map.output.key.value.fields.spec, 6,5,1-3:0- org.apache.hadoop.mapred.lib.FieldSelectionMapReduce removes empty fields from key/value end Key: MAPREDUCE-1496 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1496 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Maxim Zizin Priority: Critical If input record's key and/or value has empty fields in the end then these fields will be cut off by org.apache.hadoop.mapred.lib.FieldSelectionMapReduce -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
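One common way trailing empty fields get lost in Java is the default behaviour of String.split, which discards trailing empty strings unless a negative limit is passed. Whether FieldSelectionMapReduce loses the fields this way is an assumption; the issue does not say. The class below is a hypothetical, self-contained demonstration of the difference.

```java
import java.util.Arrays;

// Hypothetical demo, not Hadoop code: shows how String.split treats
// trailing empty fields with and without a negative limit.
final class TrailingFieldsDemo {

    // Default split: trailing empty strings are removed from the result.
    static String[] splitDroppingTrailing(String record, String separatorRegex) {
        return record.split(separatorRegex);
    }

    // A negative limit keeps every field, including trailing empty ones.
    static String[] splitKeepingTrailing(String record, String separatorRegex) {
        return record.split(separatorRegex, -1);
    }

    public static void main(String[] args) {
        String record = "a\tb\t\t"; // four fields, the last two empty
        System.out.println(Arrays.toString(splitDroppingTrailing(record, "\t")));
        System.out.println(Arrays.toString(splitKeepingTrailing(record, "\t")));
    }
}
```

For the record above, the first call yields two fields and the second yields four, which matches the symptom reported in MAPREDUCE-1496.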
[jira] Commented: (MAPREDUCE-2002) MRUnit driver classes should provide ability to set a configuration object to be passed into the mapper/reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897523#action_12897523 ] Aaron Kimball commented on MAPREDUCE-2002: -- Does this duplicate MAPREDUCE-1569? MRUnit driver classes should provide ability to set a configuration object to be passed into the mapper/reducer --- Key: MAPREDUCE-2002 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2002 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/mrunit Affects Versions: 0.20.2 Reporter: David Rosenstrauch Priority: Minor Short description: Enhance the org.apache.hadoop.mrunit.mapreduce.MapDriver, ReduceDriver, and MapReduceDriver unit test driver classes to contain setConfiguration and withConfiguration methods for passing in user-supplied org.apache.hadoop.conf.Configuration objects, and have those configuration objects eventually get passed on to the Context objects that are passed in to the mapper/reducer setup methods. (Rather than passing in an empty Configuration object, as is being done now.) Long description: The MRUnit driver classes (i.e., MapDriver, ReduceDriver, and MapReduceDriver) ought to be enhanced to contain methods for setting a Configuration object to be used by the mapper/reducer being tested - i.e., setConfiguration() and withConfiguration(). The only way to effectively pass parameters into a mapper or reducer is by setting properties on a configuration object, which the mapper/reducer can then retrieve in their setup step, and use to customize its operation. As a result, specific mappers/reducers may require the presence of specific configuration properties/parameters in order to function correctly (or at all). (I am currently coding such a reducer right now.) Testing such a mapper/reducer thus requires that the unit testing framework used provide the ability to pass in user-supplied Configuration objects to them so that they can be tested with appropriate parameter values. 
However, MRUnit currently does not provide this ability. (All mappers/reducers are always passed an empty configuration object.) And there is not even currently any (easy) way for the end-user to fix this problem by creating a simple sub-class that supplies this functionality, as such subclasses would require a substantial reimplementation/override of several MRUnit framework classes. I believe this is something that is not too difficult to fix in the MRUnit framework code, however, and would greatly help the usability of MRUnit. Although I don't have time to code this enhancement right now, if needed/preferred I could squeeze out some time to code up a patch for this. If that's needed, please let me know. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
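The setConfiguration()/withConfiguration() pair requested in MAPREDUCE-2002 follows the usual setter plus fluent-setter pattern. The sketch below is not MRUnit code: the class is hypothetical, and java.util.Properties stands in for org.apache.hadoop.conf.Configuration so the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

// Hypothetical driver sketch. Properties is a stand-in for Hadoop's
// Configuration; a real driver would hand the configuration to the
// Context passed to the mapper/reducer setup methods.
final class ConfigurableDriverSketch {
    // Starts empty, mirroring the behaviour the issue complains about.
    private Properties configuration = new Properties();

    // Plain setter.
    void setConfiguration(Properties conf) {
        this.configuration = conf;
    }

    // Fluent variant: same effect, but returns the driver for chaining.
    ConfigurableDriverSketch withConfiguration(Properties conf) {
        setConfiguration(conf);
        return this;
    }

    Properties getConfiguration() {
        return configuration;
    }

    // Stands in for what a mapper/reducer setup step would read.
    String setupReads(String key, String defaultValue) {
        return configuration.getProperty(key, defaultValue);
    }
}
```

A test would then build a configuration, set the property the reducer needs (the key name is up to the test author), and pass it in with withConfiguration(...) before running the driver.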
[jira] Commented: (MAPREDUCE-1118) Capacity Scheduler scheduling information is hard to read / should be tabular format
[ https://issues.apache.org/jira/browse/MAPREDUCE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897532#action_12897532 ] Dick King commented on MAPREDUCE-1118: --
This comment is a review.
First, let me say that I didn't review {{sorttable.js}} . It would be bad to have subtly different versions of this code flying around.
{{CapacitySchedulerServlet.java}}
near end of {{doGet()}} :
*This is serious*: {{ByteArrayOutputStream.writeTo(OutputStream)}} throws. Please revise this call to something like
{noformat}
OutputStream servletOut = null;
try {
  servletOut = response.getOutputStream();
  baos.writeTo(servletOut);
} finally {
  if (servletOut != null) {
    servletOut.close();
  }
}
{noformat}
*This is semi-serious*: In {{showQueues}} , where queues are printed, the code
{noformat}
out.printf("<td><a href=\"jobqueue_details.jsp?queueName=%s\">%s</a></td>\n", name, name);
{noformat}
deposits the name right in the middle of hard-core HTML. If the queue names contain obnoxious characters such as a quote or an angle bracket we could have a bad day. These characters should be escaped with HTML escape sequences such as {{&lt;}} , etc. Don't forget to escape the ampersands :-) . I believe that only quote marks and angle brackets need to be escaped in the URL, but everything needs to be escaped in the rendered text.
*This is a nit*: In
{noformat}
out.printf("<td>%s</td>\n", queuesManager.getJobQueue(name)
    .getRunningJobs().size());
out.printf("<td>%s</td>\n", qsc.getNumOfWaitingJobs());
{noformat}
I can't condone dropping numeric data onto a {{%s}} . I realize that it works but it looks ugly to my eye.
*This is potentially serious*: I don't see where {{showQueues}} does the needed locking. You allude to this by defensively dumping into a {{ByteArrayOutputStream}} , but the code doesn't lock anything. I can see why it should. Can queues disappear or appear?
*This is a potential omission*: The block comment before the {{class}} declaration claims to implement an advanced mode, but I don't see any footprint of such a thing in the code. In any event, I'm not a big fan of magic URLs. The servlet should include a button to bring itself into advanced mode. If there are users that shouldn't be able to go into advanced mode, this should be handled in some other manner than hidden URLs.
I don't see the code to get into the scheduler manager servlet. Perhaps there should be a button in the job tracker administration page when the capacity scheduler is in use?
{{TaskSchedulingMgr}}
{{infoServer.setAttribute("scheduler", this);}}
*This is a nit*: I would prefer {{infoServer.setAttribute("scheduler.scheduler", this);}} . All of the servlets share an attribute namespace. However, this one isn't bad as such things go, since it's hard to imagine another servlet code author putting anything except the ambient scheduler into that attribute.
{{TestCapacitySchedulerServlet}}
*This is a minor nit*: I can't condone {{assertTrue(queueData.contains("50.0%"));}} . That's the moral equivalent of floating point equality. I do realize that 1/2 can be represented exactly in most float systems, but you might want to do something else, even if only allowing the value to be {{49.9}} which is okay because the servlet does print it out as a {{%.1f}} .
Capacity Scheduler scheduling information is hard to read / should be tabular format Key: MAPREDUCE-1118 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1118 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2 Reporter: Allen Wittenauer Assignee: Krishna Ramachandran Attachments: mapred-1118-1.patch, mapred-1118-2.patch, mapred-1118.20S.patch, mapred-1118.patch The scheduling information provided by the capacity scheduler is extremely hard to read on the job tracker web page.
Instead of just flat text, it should be presenting the information in a tabular format, similar to what the fair share scheduler provides. This makes it much easier to compare what different queues are doing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
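Two of the review points above, escaping queue names before they reach HTML and formatting counts with a numeric conversion, can be sketched as follows. This is a minimal illustration under assumptions, not the servlet's code: the class and method names are hypothetical, and the hand-written escaper covers only the five characters listed in its switch.

```java
// Hypothetical sketch of HTML-safe table cells for a queue listing.
final class QueueCellSketch {

    // Escapes the characters that are significant in HTML text and in
    // double- or single-quoted attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the linked queue-name cell with the name escaped in both places.
    static String queueCell(String name) {
        String safe = escapeHtml(name);
        return String.format(
            "<td><a href=\"jobqueue_details.jsp?queueName=%s\">%s</a></td>\n",
            safe, safe);
    }

    // Numeric data goes through %d rather than %s.
    static String countCell(int count) {
        return String.format("<td>%d</td>\n", count);
    }
}
```

A production servlet would probably also URL-encode the queueName query parameter before HTML-escaping it; that step is left out here to keep the sketch short.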
[jira] Commented: (MAPREDUCE-1598) Wrongly configured 'hadoop.job.history.user.location' can cause jobs to be pinned in JobTracker's memory forever
[ https://issues.apache.org/jira/browse/MAPREDUCE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897535#action_12897535 ] Dick King commented on MAPREDUCE-1598: --
This comment is a code review.
*This is a minor nit* :
{noformat}
throw new IOException("Mkdirs failed to create " + done.toString());
 }
-}
+ } else { // directory exists. Check permissions
+checkDirectoryPermissions(doneDirFs, done,
+"mapreduce.jobtracker.jobhistory.completed.location");
+ }
{noformat}
The last {{checkDirectoryPermissions(...)}} call will cruddy up the indentation. The patch otherwise looks right.
Wrongly configured 'hadoop.job.history.user.location' can cause jobs to be pinned in JobTracker's memory forever Key: MAPREDUCE-1598 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1598 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Amar Kamat Fix For: 0.20.3 Attachments: mapred-1598 Wrongly configured 'hadoop.job.history.user.location' can disable job-history. Jobs retire when JobHistory notifies the JobTracker after moving the history file to the done folder (i.e. mapreduce.jobtracker.jobhistory.completed.location). If the JobHistory gets disabled, JobTracker would not receive any notification and thus jobs will be pinned in JobTracker's memory forever. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-166) Remove distcp from hadoop core libraries, and publish documentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran resolved MAPREDUCE-166. Resolution: Won't Fix Based on Owen's comment am closing this Remove distcp from hadoop core libraries, and publish documentation --- Key: MAPREDUCE-166 URL: https://issues.apache.org/jira/browse/MAPREDUCE-166 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Marco Nicosia Priority: Critical Every time we want to ship a change in distcp, not only do we have to replace the entire version of map-reduce deployed to the clusters, we also have to update internal documentation to reflect those changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2002) MRUnit driver classes should provide ability to set a configuration object to be passed into the mapper/reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897547#action_12897547 ] David Rosenstrauch commented on MAPREDUCE-2002: --- Yes, it does look like a dupe. (I did do a search through Jira before filing this bug, but that bug didn't turn up in my search for some reason.) MRUnit driver classes should provide ability to set a configuration object to be passed into the mapper/reducer --- Key: MAPREDUCE-2002 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2002 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/mrunit Affects Versions: 0.20.2 Reporter: David Rosenstrauch Priority: Minor Short description: Enhance the org.apache.hadoop.mrunit.mapreduce.MapDriver, ReduceDriver, and MapReduceDriver unit test driver classes to contain setConfiguration and withConfiguration methods for passing in user-supplied org.apache.hadoop.conf.Configuration objects, and have those configuration objects eventually get passed on to the Context objects that are passed in to the mapper/reducer setup methods. (Rather than passing in an empty Configuration object, as is being done now.) Long description: The MRUnit driver classes (i.e., MapDriver, ReduceDriver, and MapReduceDriver) ought to be enhanced to contain methods for setting a Configuration object to be used by the mapper/reducer being tested - i.e., setConfiguration() and withConfiguration(). The only way to effectively pass parameters into a mapper or reducer is by setting properties on a configuration object, which the mapper/reducer can then retrieve in their setup step, and use to customize its operation. As a result, specific mappers/reducers may require the presence of specific configuration properties/parameters in order to function correctly (or at all). (I am currently coding such a reducer right now.) 
Testing such a mapper/reducer thus requires that the unit testing framework used provide the ability to pass in user-supplied Configuration objects to them so that they can be tested with appropriate parameter values. However, MRUnit currently does not provide this ability. (All mappers/reducers are always passed an empty configuration object.) And there is not even currently any (easy) way for the end-user to fix this problem by creating a simple sub-class that supplies this functionality, as such subclasses would require a substantial reimplementation/override of several MRUnit framework classes. I believe this is something that is not too difficult to fix in the MRUnit framework code, however, and would greatly help the usability of MRUnit. Although I don't have time to code this enhancement right now, if needed/preferred I could squeeze out some time to code up a patch for this. If that's needed, please let me know. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-223) JobClient should work with -1/+1 version of JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897558#action_12897558 ] Alejandro Abdelnur commented on MAPREDUCE-223: -- Wasn't the idea that Avro would help fix this? Yes, doing things over HTTP (assuming you take care not to break things at the payload level) works. Still, Hadoop does not support HTTP natively for client-side calls, so this is not an option without add-on protocol adapter systems fronting the JT and NN/DNs. In other words, a JT proxy and an HDFS proxy. FYI, Oozie is planning to provide JT proxy capabilities. JobClient should work with -1/+1 version of JobTracker -- Key: MAPREDUCE-223 URL: https://issues.apache.org/jira/browse/MAPREDUCE-223 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: all Reporter: Alejandro Abdelnur Priority: Critical Currently there is a version check on the RPC calls that enforces the same Hadoop version on the client and the server. To enable phased upgrades of systems using Hadoop and Hadoop itself the {{JobClient}} should be able to interact with a {{JobTracker}} of the previous and the next version of Hadoop (or with a range). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1980) TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897567#action_12897567 ] Amareshwari Sriramadasu commented on MAPREDUCE-1980: The same problem is present in TaskAttemptFinishedEvent also. setup and cleanup tasks are always logged as MAP_ATTEMPT_FINISHED. Can you fix that also? TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks -- Key: MAPREDUCE-1980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1980 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat Assignee: Amar Kamat Attachments: mapreduce-1980-v1.0.patch TaskAttemptUnsuccessfulCompletionEvent is used to log unsuccessful map and reduce task attempts to JobHistory. Following is the implementation of getEventType() method of TaskAttemptUnsuccessfulCompletionEvent /** Get the event type */ public EventType getEventType() { return EventType.MAP_ATTEMPT_KILLED; } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1920: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Fix Version/s: 0.21.0 Resolution: Fixed I just committed this to trunk and branch 0.21. Thanks Tom! Job.getCounters() returns null when using a cluster --- Key: MAPREDUCE-1920 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0 Reporter: Aaron Kimball Assignee: Tom White Priority: Critical Fix For: 0.21.0 Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, MAPREDUCE-1920.patch Calling Job.getCounters() after the job has completed (successfully) returns null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated MAPREDUCE-1856:
Status: Open (was: Patch Available)

Extract a subset of tests for smoke (DOA) validation
Key: MAPREDUCE-1856
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Attachments: MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch

Similar to HDFS-1199, but for MapReduce. Adds the ability to run up to 30 minutes of tests to 'smoke' the MapReduce build (i.e., find possible issues faster than the full test cycle does).
[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated MAPREDUCE-1856:
Status: Patch Available (was: Open)

The patch has not been picked up in 6 days. Resubmitting.

Extract a subset of tests for smoke (DOA) validation
Key: MAPREDUCE-1856
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Attachments: MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch

Similar to HDFS-1199, but for MapReduce. Adds the ability to run up to 30 minutes of tests to 'smoke' the MapReduce build (i.e., find possible issues faster than the full test cycle does).