[jira] Commented: (MAPREDUCE-1516) JobTracker should issue a delegation token only for kerberos authenticated client
[ https://issues.apache.org/jira/browse/MAPREDUCE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857226#action_12857226 ] Jitendra Nath Pandey commented on MAPREDUCE-1516: - I ran ant tests. All tests passed except TestMiniMRChildTask, which also fails without this patch. JobTracker should issue a delegation token only for kerberos authenticated client - Key: MAPREDUCE-1516 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1516 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: MR-1516.1.patch, MR-1516.2.patch, MR-1516.3.patch, MR-1516.4.patch, MR-1516.5.patch, MR-1516.6.patch Delegation tokens should be issued only if the client is kerberos authenticated. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
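The rule this issue enforces can be illustrated with a small self-contained sketch. The types below are hypothetical stand-ins, not the attached MR-1516 patches; the real check presumably inspects the caller's UserGroupInformation authentication method inside the JobTracker's getDelegationToken path.

```java
// Hypothetical stand-in for the caller's authentication method.
enum AuthMethod { SIMPLE, KERBEROS, TOKEN }

// Illustrative issuer: only a Kerberos-authenticated caller gets a token.
class DelegationTokenIssuer {
    // Returns an opaque token string, or throws if the caller did not
    // authenticate via Kerberos (e.g. SIMPLE auth, or an existing token).
    static String getDelegationToken(String user, AuthMethod method) {
        if (method != AuthMethod.KERBEROS) {
            throw new SecurityException(
                "Delegation token only issued to kerberos authenticated clients");
        }
        return "token-for-" + user; // placeholder for a real delegation token
    }
}
```

The point of the restriction: a delegation token is a credential derived from a strong authentication, so it must never be minted for a client that itself authenticated with a (possibly forwarded) token or with no real authentication at all.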
[jira] Updated: (MAPREDUCE-1690) Using BuddySystem to reduce the ReduceTask's mem usage in the step of shuffle
[ https://issues.apache.org/jira/browse/MAPREDUCE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] luoli updated MAPREDUCE-1690: - Attachment: allo_use_buddy_gc.JPG Using BuddySystem to reduce the ReduceTask's mem usage in the step of shuffle - Key: MAPREDUCE-1690 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1690 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task, tasktracker Affects Versions: 0.20.2, 0.20.3 Reporter: luoli Fix For: 0.20.2 Attachments: allo_use_buddy.JPG, allo_use_buddy_gc.JPG, allo_use_new.JPG, allo_use_new_gc.JPG, mapreduce-1690.v1.patch, mapreduce-1690.v1.patch, mapreduce-1690.v1.patch, mapreduce-1690.v2.patch When the reduce task launches, it starts several MapOutputCopier threads to download the output of finished maps; each thread is a running MapOutputCopier instance. Every time a thread tries to copy map output from a remote host to the local machine, the MapOutputCopier decides whether to shuffle the map output data in memory or to disk; this depends on the map output data size and the configuration of the ShuffleRamManager, which is loaded from the client hadoop-site.xml or JobConf. Either way, if the reduce task decides to shuffle the map output data in memory, the MapOutputCopier connects to the remote map host, reads the map output from the socket, and then copies the map output into an in-memory buffer. Every time, that in-memory buffer comes from byte[] shuffleData = new byte[mapOutputLength]; and this is where the problem begins.
In our cluster, some special jobs process a huge amount of original data, say 110TB, so their reduce tasks shuffle a lot of data, some to disk and some in memory. Even so, a lot of data is shuffled in memory, and every time the MapOutputCopier threads allocate fresh memory from the reduce heap. For a long-running job over huge data, this easily fills the reduce task's heap, drives the reduce task to OOM, and then exhausts the memory of the TaskTracker machine. Here is our solution: change the code path where MapOutputCopier threads shuffle map output in memory to use a BuddySystem, similar to the Linux kernel buddy system used to allocate and deallocate memory pages. When the reduce task launches, it initializes some memory for this BuddySystem, say 128MB. Every time the reduce wants to shuffle map output in memory, it requests a buffer from the buddySystem; if the buddySystem has enough memory, it uses it, and if not, the MapOutputCopier threads wait(), just as they do in the current Hadoop shuffle code. This reduces the reduce task's memory usage and greatly reduces TaskTracker memory shortages. In our cluster, this buddySystem made the situation of losing a batch of TaskTrackers to memory overuse while huge jobs were running disappear, and therefore made the cluster more stable.
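The proposal above can be sketched as a minimal buddy allocator over a fixed pool. This is an illustration of the technique only, not the attached mapreduce-1690 patches: the class name, the offset-based API, and the pool size are all assumptions. A real shuffle integration would hand out byte[] slices and make callers wait() when allocate fails.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Minimal buddy allocator sketch: a fixed pool carved into power-of-two
// blocks, so copier threads could reuse buffers instead of calling
// `new byte[mapOutputLength]` on every fetch.
class BuddyAllocator {
    private final int totalSize;  // pool size, must be a power of two
    // free lists: block size -> set of free block offsets of that size
    private final TreeMap<Integer, TreeSet<Integer>> free = new TreeMap<>();

    BuddyAllocator(int totalSize) {
        if (Integer.bitCount(totalSize) != 1)
            throw new IllegalArgumentException("pool size must be a power of two");
        this.totalSize = totalSize;
        free.computeIfAbsent(totalSize, k -> new TreeSet<>()).add(0);
    }

    // Returns the offset of a block of at least `size` bytes, or -1 if none
    // is available (a real MapOutputCopier thread would wait() here).
    int allocate(int size) {
        int want = Integer.highestOneBit(Math.max(1, size));
        if (want < size) want <<= 1;                 // round up to power of two
        Map.Entry<Integer, TreeSet<Integer>> e = free.ceilingEntry(want);
        while (e != null && e.getValue().isEmpty()) e = free.higherEntry(e.getKey());
        if (e == null) return -1;
        int blockSize = e.getKey();
        int offset = e.getValue().pollFirst();
        while (blockSize > want) {                   // split until it just fits
            blockSize >>= 1;
            free.computeIfAbsent(blockSize, k -> new TreeSet<>()).add(offset + blockSize);
        }
        return offset;
    }

    // Returns a block to the pool, merging it with its buddy when possible.
    void release(int offset, int size) {
        int blockSize = Integer.highestOneBit(Math.max(1, size));
        if (blockSize < size) blockSize <<= 1;
        while (blockSize < totalSize) {
            int buddy = offset ^ blockSize;          // buddy differs in one bit
            TreeSet<Integer> peers = free.computeIfAbsent(blockSize, k -> new TreeSet<>());
            if (!peers.remove(buddy)) break;         // buddy still in use: stop
            offset = Math.min(offset, buddy);
            blockSize <<= 1;
        }
        free.computeIfAbsent(blockSize, k -> new TreeSet<>()).add(offset);
    }
}
```

The key property for the shuffle use case is that releasing buffers merges neighbors back into large blocks, so a long-running job's memory footprint stays bounded by the pool size instead of growing with allocation churn.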
[jira] Updated: (MAPREDUCE-1690) Using BuddySystem to reduce the ReduceTask's mem usage in the step of shuffle
[ https://issues.apache.org/jira/browse/MAPREDUCE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] luoli updated MAPREDUCE-1690: - Attachment: allo_use_buddy.JPG allo_use_new.JPG allo_use_new_gc.JPG Here I upload images of the performance and memory allocation when using the buddy system versus just using new from the heap. Using BuddySystem to reduce the ReduceTask's mem usage in the step of shuffle - Key: MAPREDUCE-1690 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1690 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task, tasktracker Affects Versions: 0.20.2, 0.20.3 Reporter: luoli Fix For: 0.20.2 Attachments: allo_use_buddy.JPG, allo_use_buddy_gc.JPG, allo_use_new.JPG, allo_use_new_gc.JPG, mapreduce-1690.v1.patch, mapreduce-1690.v1.patch, mapreduce-1690.v1.patch, mapreduce-1690.v2.patch When the reduce task launches, it starts several MapOutputCopier threads to download the output of finished maps; each thread is a running MapOutputCopier instance. Every time a thread tries to copy map output from a remote host to the local machine, the MapOutputCopier decides whether to shuffle the map output data in memory or to disk; this depends on the map output data size and the configuration of the ShuffleRamManager, which is loaded from the client hadoop-site.xml or JobConf. Either way, if the reduce task decides to shuffle the map output data in memory, the MapOutputCopier connects to the remote map host, reads the map output from the socket, and then copies the map output into an in-memory buffer. Every time, that in-memory buffer comes from byte[] shuffleData = new byte[mapOutputLength]; and this is where the problem begins.
In our cluster, some special jobs process a huge amount of original data, say 110TB, so their reduce tasks shuffle a lot of data, some to disk and some in memory. Even so, a lot of data is shuffled in memory, and every time the MapOutputCopier threads allocate fresh memory from the reduce heap. For a long-running job over huge data, this easily fills the reduce task's heap, drives the reduce task to OOM, and then exhausts the memory of the TaskTracker machine. Here is our solution: change the code path where MapOutputCopier threads shuffle map output in memory to use a BuddySystem, similar to the Linux kernel buddy system used to allocate and deallocate memory pages. When the reduce task launches, it initializes some memory for this BuddySystem, say 128MB. Every time the reduce wants to shuffle map output in memory, it requests a buffer from the buddySystem; if the buddySystem has enough memory, it uses it, and if not, the MapOutputCopier threads wait(), just as they do in the current Hadoop shuffle code. This reduces the reduce task's memory usage and greatly reduces TaskTracker memory shortages. In our cluster, this buddySystem made the situation of losing a batch of TaskTrackers to memory overuse while huge jobs were running disappear, and therefore made the cluster more stable.
[jira] Created: (MAPREDUCE-1704) Parity files that are outdated or nonexistent should be immediately disregarded
Parity files that are outdated or nonexistent should be immediately disregarded --- Key: MAPREDUCE-1704 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1704 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 In the current implementation, old or nonexistent parity files are not immediately disregarded. Absence will trigger exceptions, but old files could lead to bad recoveries and maybe data corruption. This should be fixed.
[jira] Created: (MAPREDUCE-1705) Archiving and Purging of parity files should handle globbed policies
Archiving and Purging of parity files should handle globbed policies Key: MAPREDUCE-1705 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1705 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Fix For: 0.22.0 Archiving (har) and purging of parity files don't work in policies whose source is a globbed path.
[jira] Assigned: (MAPREDUCE-1705) Archiving and Purging of parity files should handle globbed policies
[ https://issues.apache.org/jira/browse/MAPREDUCE-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt reassigned MAPREDUCE-1705: -- Assignee: Rodrigo Schmidt Archiving and Purging of parity files should handle globbed policies Key: MAPREDUCE-1705 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1705 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Archiving (har) and purging of parity files don't work in policies whose source is a globbed path.
[jira] Created: (MAPREDUCE-1706) Log RAID recoveries on HDFS
Log RAID recoveries on HDFS --- Key: MAPREDUCE-1706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1706 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt It would be good to have a way to centralize all the recovery logs, since recovery can be executed by any hdfs client. The best place to store this information is HDFS itself.
[jira] Created: (MAPREDUCE-1707) TaskRunner can get NPE in getting ugi from TaskTracker
TaskRunner can get NPE in getting ugi from TaskTracker -- Key: MAPREDUCE-1707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1707 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Fix For: 0.22.0 The following code in TaskRunner can get an NPE in the scenario described below. {code} UserGroupInformation ugi = tracker.getRunningJob(t.getJobID()).getUGI(); {code} The scenario: the tracker gets a LaunchTaskAction; the task is localized and a TaskRunner is started. Then the tracker gets a KillJobAction, which issues a kill for the task. But the kill is a no-op because the task has not actually started, and the job is removed from runningJobs. If the TaskRunner then calls tracker.getRunningJob(t.getJobID()), the result will be null. Instead of TaskRunner making a back-call to the TaskTracker to get the ugi via tracker.getRunningJob(t.getJobID()).getUGI(), the ugi should be passed as a parameter in the constructor of TaskRunner.
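The race and the suggested fix can be shown with hypothetical stand-in types (these are not the real TaskTracker/TaskRunner classes): capturing the ugi at construction time, while the job is still registered, removes the later null-dereference.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the TaskTracker's runningJobs map (ugi modeled as a String).
class Tracker {
    final Map<String, String> runningJobs = new HashMap<>(); // jobId -> ugi
    String getRunningJob(String jobId) { return runningJobs.get(jobId); }
}

// Sketch of the proposed fix: the ugi is read once, in the constructor,
// instead of via a back-call to the tracker after the job may be gone.
class TaskRunnerSketch {
    private final String ugi; // captured while the job still exists
    TaskRunnerSketch(Tracker tracker, String jobId) {
        this.ugi = tracker.getRunningJob(jobId);
    }
    String getUgi() { return ugi; }
}
```

With the original back-call pattern, `tracker.getRunningJob(jobId).getUGI()` after a KillJobAction dereferences null; with the constructor parameter, the runner keeps a valid reference regardless of when the job is removed.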
[jira] Created: (MAPREDUCE-1708) Add a test for connect and read time outs during shuffle.
Add a test for connect and read time outs during shuffle. - Key: MAPREDUCE-1708 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1708 Project: Hadoop Map/Reduce Issue Type: Bug Components: task, test Reporter: Amareshwari Sriramadasu Write a test which injects connect and read time outs during shuffle and validates the fetch failures for corresponding maps.
[jira] Commented: (MAPREDUCE-1276) Shuffle connection logic needs correction
[ https://issues.apache.org/jira/browse/MAPREDUCE-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857263#action_12857263 ] Amareshwari Sriramadasu commented on MAPREDUCE-1276: Test output from hudson's console : [exec] [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12441703/patch-1276.txt [exec] against trunk revision 933441. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] -1 contrib tests. The patch failed contrib unit tests. [exec] [exec] Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/108/testReport/ [exec] Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/108/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/108/artifact/trunk/build/test/checkstyle-errors.html [exec] Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/108/console bq. -1 contrib tests Contrib test failures are because of a NoClassDefFoundError (MAPREDUCE-1275). bq. -1 tests included.
It is difficult to write a JUnit test simulating a read or connect timeout in shuffle, so I created MAPREDUCE-1708 to add a fault injection test. I manually tested the patch by adding a sleep in TaskTracker.MapOutputServlet.sendMapFile for one of the attempts, and verified that the attempt fails because of Too many fetch failures, re-executes on some other machine, and the job succeeds. Shuffle connection logic needs correction -- Key: MAPREDUCE-1276 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1276 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.21.0 Reporter: Jothi Padmanabhan Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.21.0 Attachments: patch-1276.txt While looking at the code with Amareshwari, we realized that {{Fetcher#copyFromHost}} marks the connection as successful when {{url.openConnection}} returns. This is wrong: the connection actually happens implicitly inside {{getInputStream}}; we need to split {{getInputStream}} into {{connect}} and {{getInputStream}} to handle the connect and read timeout strategies correctly.
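The connect/read split described above can be sketched against the plain JDK HttpURLConnection API. This is an illustration of the sequencing only, not the attached patch-1276.txt; the helper name and parameters are made up.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class ShuffleConnect {
    // url.openConnection() does no I/O: it only creates the connection
    // object, which is why treating its return as "connected" is wrong.
    static InputStream openMapOutput(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs); // bounds the TCP handshake
        conn.setReadTimeout(readTimeoutMs);       // bounds each read() on the stream
        conn.connect();                           // explicit connect: connect failures surface here
        return conn.getInputStream();             // subsequent reads obey the read timeout
    }
}
```

Splitting the explicit connect() from getInputStream() lets the fetcher attribute a failure to the right phase: a connect timeout means the host was unreachable, while a read timeout means the transfer stalled, and the two can carry different retry strategies.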
[jira] Updated: (MAPREDUCE-1673) Start and Stop scripts for the RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated MAPREDUCE-1673: --- Status: Open (was: Patch Available) Start and Stop scripts for the RaidNode --- Key: MAPREDUCE-1673 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1673 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1673.1.patch, MAPREDUCE-1673.2.patch, MAPREDUCE-1673.patch We should have scripts that start and stop the RaidNode automatically. Something like start-raidnode.sh and stop-raidnode.sh
[jira] Updated: (MAPREDUCE-1673) Start and Stop scripts for the RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated MAPREDUCE-1673: --- Status: Patch Available (was: Open) Start and Stop scripts for the RaidNode --- Key: MAPREDUCE-1673 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1673 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1673.1.patch, MAPREDUCE-1673.2.patch, MAPREDUCE-1673.patch We should have scripts that start and stop the RaidNode automatically. Something like start-raidnode.sh and stop-raidnode.sh
[jira] Commented: (MAPREDUCE-1695) capacity scheduler is not included in findbugs/javadoc targets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857280#action_12857280 ] Hemanth Yamijala commented on MAPREDUCE-1695: - Is it fair to assume that most projects would want to be included in findbugs target and javadoc-dev target by default ? Is there some way by which we can make this the default behavior then ? capacity scheduler is not included in findbugs/javadoc targets -- Key: MAPREDUCE-1695 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1695 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Reporter: Hong Tang Assignee: Hong Tang Attachments: MAPREDUCE-1695-2.patch, MAPREDUCE-1695-3.patch, MAPREDUCE-1695.patch, mr1695-hadoop-findbugs-report-1.html, mr1695-hadoop-findbugs-report-2.html Capacity Scheduler is not included in findbugs/javadoc targets.
[jira] Commented: (MAPREDUCE-1695) capacity scheduler is not included in findbugs/javadoc targets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857281#action_12857281 ] Hemanth Yamijala commented on MAPREDUCE-1695: - bq. Is there some way by which we can make this the default behavior then ? To clarify, I mean make it the default behavior without the project needing to do anything. capacity scheduler is not included in findbugs/javadoc targets -- Key: MAPREDUCE-1695 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1695 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Reporter: Hong Tang Assignee: Hong Tang Attachments: MAPREDUCE-1695-2.patch, MAPREDUCE-1695-3.patch, MAPREDUCE-1695.patch, mr1695-hadoop-findbugs-report-1.html, mr1695-hadoop-findbugs-report-2.html Capacity Scheduler is not included in findbugs/javadoc targets.
[jira] Updated: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Shi updated MAPREDUCE-1434: Fix Version/s: 0.20.3 Affects Version/s: 0.20.3 Environment: (was: 0.19.0) Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 0.20.3 Reporter: Xing Shi Fix For: 0.20.3 Usually we must first upload data to HDFS before we can analyze it with Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input to a job while it is running, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... : add the input to jobId. b) hadoop job -add-input done : tell the JobTracker that the input has been fully prepared. c) hadoop job -add-input status jobId : show how many inputs jobId has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; that is, it generates the splits and submits them to the JobTracker. 2. JobTracker: the JobTracker should support addInput and add the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client signals that the job's input is done. 3. Reducer: the reducer should also update the number of maps, so that it shuffles correctly. This is a rough idea, and I will update it.
[jira] Commented: (MAPREDUCE-1673) Start and Stop scripts for the RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857305#action_12857305 ] Hadoop QA commented on MAPREDUCE-1673: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441815/MAPREDUCE-1673.2.patch against trunk revision 933441. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/112/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/112/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/112/console This message is automatically generated. 
Start and Stop scripts for the RaidNode --- Key: MAPREDUCE-1673 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1673 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/raid Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1673.1.patch, MAPREDUCE-1673.2.patch, MAPREDUCE-1673.patch We should have scripts that start and stop the RaidNode automatically. Something like start-raidnode.sh and stop-raidnode.sh
[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1221: - Status: Open (was: Patch Available) I just started looking at this, some minor comments: We shouldn't be using the TT.fConf to read config values each time, please save it and re-use: {noformat} Index: src/java/org/apache/hadoop/mapred/TaskTracker.java === --- src/java/org/apache/hadoop/mapred/TaskTracker.java (revision 921667) +++ src/java/org/apache/hadoop/mapred/TaskTracker.java (working copy) +if (fConf.get(TTConfig.TT_RESERVED_PHYSCIALMEMORY_MB) == null + totalMemoryAllottedForTasks == JobConf.DISABLED_MEMORY_LIMIT) { Index: src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java === --- src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java (revision 921667) +++ src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java (working copy) +long reservedRssMemory = taskTracker.getJobConf(). +getLong(TTConfig.TT_RESERVED_PHYSCIALMEMORY_MB, +JobConf.DISABLED_MEMORY_LIMIT); {noformat} Kill tasks on a node if the free physical memory on that machine falls below a configured threshold --- Key: MAPREDUCE-1221 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold. On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. 
This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, killing based on virtual memory limits (HADOOP-5883) was designed to address this problem. This works well when most map-reduce jobs are Java jobs that have well-defined -Xmx parameters specifying the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc.), the total virtual memory usage of the process subtree varies greatly. In these cases, it is better to kill tasks using physical memory limits.
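The reviewer's "save it and re-use" point above can be sketched with a stand-in for the JobConf (a plain Map here; the config key string and class names are assumptions for illustration, not the real TTConfig constants): the limit is read once at construction and cached, instead of being re-read from the configuration on every memory check.

```java
import java.util.Map;

// Illustrative sketch: cache a configured limit once instead of querying
// the configuration object on every periodic memory check.
class TaskMemoryManagerSketch {
    static final long DISABLED_MEMORY_LIMIT = -1L; // mirrors JobConf.DISABLED_MEMORY_LIMIT

    private final long reservedRssMemory; // read once, reused thereafter

    TaskMemoryManagerSketch(Map<String, String> conf) {
        // Hypothetical key name, standing in for TTConfig.TT_RESERVED_PHYSCIALMEMORY_MB.
        String v = conf.get("tasktracker.reserved.physicalmemory.mb");
        this.reservedRssMemory = (v == null) ? DISABLED_MEMORY_LIMIT : Long.parseLong(v);
    }

    boolean physicalMemoryLimitEnabled() {
        return reservedRssMemory != DISABLED_MEMORY_LIMIT;
    }
}
```

Caching at construction both avoids repeated string lookups on a hot monitoring path and guarantees the manager sees one consistent value for the lifetime of the thread.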
[jira] Commented: (MAPREDUCE-1695) capacity scheduler is not included in findbugs/javadoc targets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857470#action_12857470 ] Hong Tang commented on MAPREDUCE-1695: -- bq. Is it fair to assume that most projects would want to be included in findbugs target and javadoc-dev target by default ? I'd say YES to this question. But I suggest we separate this to a different jira because it would affect all contrib projects (and possibly an inevitable discussion on contrib projects vs subprojects), and the solution may need some more deliberation. capacity scheduler is not included in findbugs/javadoc targets -- Key: MAPREDUCE-1695 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1695 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Reporter: Hong Tang Assignee: Hong Tang Attachments: MAPREDUCE-1695-2.patch, MAPREDUCE-1695-3.patch, MAPREDUCE-1695.patch, mr1695-hadoop-findbugs-report-1.html, mr1695-hadoop-findbugs-report-2.html Capacity Scheduler is not included in findbugs/javadoc targets.
[jira] Commented: (MAPREDUCE-1695) capacity scheduler is not included in findbugs/javadoc targets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857520#action_12857520 ] Hadoop QA commented on MAPREDUCE-1695: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441668/MAPREDUCE-1695-3.patch against trunk revision 933441. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/358/console This message is automatically generated. 
capacity scheduler is not included in findbugs/javadoc targets -- Key: MAPREDUCE-1695 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1695 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/capacity-sched Reporter: Hong Tang Assignee: Hong Tang Attachments: MAPREDUCE-1695-2.patch, MAPREDUCE-1695-3.patch, MAPREDUCE-1695.patch, mr1695-hadoop-findbugs-report-1.html, mr1695-hadoop-findbugs-report-2.html Capacity Scheduler is not included in findbugs/javadoc targets.
[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857532#action_12857532 ] Hadoop QA commented on MAPREDUCE-1221: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441860/MAPREDUCE-1221-v5.txt against trunk revision 933441. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/console This message is automatically generated. 
Kill tasks on a node if the free physical memory on that machine falls below a configured threshold --- Key: MAPREDUCE-1221 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.22.0 Reporter: dhruba borthakur Assignee: Scott Chen Fix For: 0.21.0 Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold. On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
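A physical-memory check of the kind MAPREDUCE-1221 describes could, on Linux, read a task's resident set size from /proc/&lt;pid&gt;/status and compare it to a configured limit. The sketch below is illustrative only; the class and method names are assumptions, not code from the attached patches.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PhysicalMemoryCheck {
    // Parses a "VmRSS:  123456 kB" line from /proc/<pid>/status and returns
    // the resident set size in bytes, or -1 if this is not a VmRSS line.
    static long parseVmRssBytes(String line) {
        if (!line.startsWith("VmRSS:")) return -1;
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[1]) * 1024L; // the kernel reports kB
    }

    // Reads /proc/<pid>/status and returns the task's RSS in bytes,
    // or -1 if no VmRSS line was found.
    static long rssOf(int pid) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new FileReader("/proc/" + pid + "/status"))) {
            String line;
            while ((line = r.readLine()) != null) {
                long rss = parseVmRssBytes(line);
                if (rss >= 0) return rss;
            }
        }
        return -1;
    }

    // A monitor thread would kill the task when its physical memory
    // exceeds the configured threshold (0 or negative disables the check).
    static boolean shouldKill(long rssBytes, long limitBytes) {
        return limitBytes > 0 && rssBytes > limitBytes;
    }
}
```

In practice the TaskTracker would sum the RSS over the task's whole process subtree, since the issue specifically calls out forked python/php children.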
[jira] Updated: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1538: -- Status: Open (was: Patch Available) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit - Key: MAPREDUCE-1538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.21.0 Attachments: MAPREDUCE-1538-v2.txt, MAPREDUCE-1538.patch TrackerDistributedCacheManager deletes cached files when their total size exceeds a configured limit, but there is no such limit on the number of subdirectories. The number of subdirectories may therefore grow until it exceeds the system limit, at which point the TT can no longer create directories in getLocalCache and the tasks fail. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
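One way to address the issue above is to cap the number of cache subdirectories the same way the size is capped, evicting the least recently used entries when the cap is hit. The sketch below shows the bookkeeping side of that idea; the class and method names are hypothetical and this is not the attached patch.

```java
import java.util.LinkedHashMap;

public class BoundedCacheDirs {
    private final int maxSubDirs;
    // Access-ordered map: iteration order is least recently used first.
    private final LinkedHashMap<String, Long> dirs =
        new LinkedHashMap<String, Long>(16, 0.75f, true);

    BoundedCacheDirs(int maxSubDirs) {
        this.maxSubDirs = maxSubDirs;
    }

    // Records an access to a cache directory. Returns the path of an
    // evicted directory (which the caller would delete on local disk),
    // or null if the cap was not exceeded.
    synchronized String touch(String path) {
        dirs.put(path, System.currentTimeMillis());
        if (dirs.size() > maxSubDirs) {
            String victim = dirs.keySet().iterator().next(); // LRU entry
            dirs.remove(victim);
            return victim;
        }
        return null;
    }

    synchronized int size() {
        return dirs.size();
    }
}
```

A real fix also has to avoid evicting directories that running tasks still reference, which is why the actual patch is more involved than this bookkeeping alone.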
[jira] Updated: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1538: -- Status: Patch Available (was: Open) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit - Key: MAPREDUCE-1538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.21.0 Attachments: MAPREDUCE-1538-v2.txt, MAPREDUCE-1538.patch TrackerDistributedCacheManager deletes cached files when their total size exceeds a configured limit, but there is no such limit on the number of subdirectories. The number of subdirectories may therefore grow until it exceeds the system limit, at which point the TT can no longer create directories in getLocalCache and the tasks fail. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1545) Add 'first-task-launched' to job-summary
[ https://issues.apache.org/jira/browse/MAPREDUCE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857577#action_12857577 ] Hudson commented on MAPREDUCE-1545: --- Integrated in Hadoop-Common-trunk-Commit #221 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/221/]) HADOOP-6657. Add a capitalization method to StringUtils for MAPREDUCE-1545 Add 'first-task-launched' to job-summary Key: MAPREDUCE-1545 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1545 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Arun C Murthy Assignee: Luke Lu Fix For: 0.22.0 Attachments: mr-1545-trunk-v1.patch, mr-1545-trunk-v2.patch, mr-1545-y20s-v1.patch, mr-1545-y20s-v2.patch, mr-1545-y20s-v3.patch It would be useful to track 'first-task-launched' time to job-summary for better reporting. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1614) TestDFSIO should allow to configure output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857582#action_12857582 ] Konstantin Boudnik commented on MAPREDUCE-1614: --- It turns out that the problem is pretty typical for the programs in hadoop-test.jar. E.g. NNBench doesn't allow configuring the location of its results log, which dictates the working directory from which the test can be run. TestDFSIO should allow to configure output directory Key: MAPREDUCE-1614 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1614 Project: Hadoop Map/Reduce Issue Type: Bug Components: benchmarks Affects Versions: 0.20.2 Reporter: Konstantin Boudnik TestDFSIO has a hardcoded location, /benchmarks, where its files are written and read. This poses a problem if HDFS '/' doesn't allow everyone to write into it. It'd be convenient to have a command line option to specify an alternative location on demand. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
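The command-line option MAPREDUCE-1614 asks for could look like the sketch below: scan the arguments for an override and fall back to the hardcoded default. The flag name -baseDir is an assumption for illustration; the option actually added to TestDFSIO may differ.

```java
public class BaseDirOption {
    // The hardcoded default location the issue complains about.
    static final String DEFAULT_BASE_DIR = "/benchmarks/TestDFSIO";

    // Returns the value following "-baseDir" in args, or the default
    // location when the flag is absent.
    static String baseDir(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-baseDir".equals(args[i])) {
                return args[i + 1];
            }
        }
        return DEFAULT_BASE_DIR;
    }
}
```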
[jira] Commented: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857587#action_12857587 ] Hadoop QA commented on MAPREDUCE-1538: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441877/MAPREDUCE-1538-v2.txt against trunk revision 933441. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/114/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/114/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/114/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/114/console This message is automatically generated. 
TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit - Key: MAPREDUCE-1538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.21.0 Attachments: MAPREDUCE-1538-v2.txt, MAPREDUCE-1538.patch TrackerDistributedCacheManager deletes cached files when their total size exceeds a configured limit, but there is no such limit on the number of subdirectories. The number of subdirectories may therefore grow until it exceeds the system limit, at which point the TT can no longer create directories in getLocalCache and the tasks fail. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857604#action_12857604 ] Scott Chen commented on MAPREDUCE-1538: --- Got 105 test failures with the message {code} Error Message Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit. Stacktrace junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit. {code} TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit - Key: MAPREDUCE-1538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.21.0 Attachments: MAPREDUCE-1538-v2.txt, MAPREDUCE-1538.patch TrackerDistributedCacheManager deletes cached files when their total size exceeds a configured limit, but there is no such limit on the number of subdirectories. The number of subdirectories may therefore grow until it exceeds the system limit, at which point the TT can no longer create directories in getLocalCache and the tasks fail. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1516) JobTracker should issue a delegation token only for kerberos authenticated client
[ https://issues.apache.org/jira/browse/MAPREDUCE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857625#action_12857625 ] Devaraj Das commented on MAPREDUCE-1516: Sigh .. I wish we hadn't duplicated the methods isAllowedDelegationTokenOp and getConnectionAuthenticationMethod in MR and HDFS, and had instead defined something in Common that provides what these methods do. Could you please take care of this? JobTracker should issue a delegation token only for kerberos authenticated client - Key: MAPREDUCE-1516 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1516 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: MR-1516.1.patch, MR-1516.2.patch, MR-1516.3.patch, MR-1516.4.patch, MR-1516.5.patch, MR-1516.6.patch Delegation tokens should be issued only if the client is kerberos authenticated. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1317) Reducing memory consumption of rumen objects
[ https://issues.apache.org/jira/browse/MAPREDUCE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857640#action_12857640 ] Hong Tang commented on MAPREDUCE-1317: -- Patch mapreduce-1317-20091223.patch applies cleanly to yahoop-hadoop-0.20.1xx branch. Reducing memory consumption of rumen objects Key: MAPREDUCE-1317 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1317 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0, 0.22.0 Reporter: Hong Tang Assignee: Hong Tang Fix For: 0.21.0 Attachments: mapreduce-1317-20091218.patch, mapreduce-1317-20091222-2.patch, mapreduce-1317-20091222.patch, mapreduce-1317-20091223.patch We have encountered OutOfMemoryErrors in mumak and gridmix when dealing with very large jobs. The purpose of this jira is to optimize the memory consumption of rumen-produced job objects. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xing Shi updated MAPREDUCE-1434: Attachment: dynamic_input-v1.patch We can dynamically add input to a job with the command: {noformat} hadoop job -D mapred.input.format.class=YourInputFormatClass -input-add jobid inputdir {noformat} and tell the master (JobTracker) that all input has been added with: {noformat} hadoop job -input-done jobid {noformat} Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 0.20.3 Reporter: Xing Shi Fix For: 0.20.3 Attachments: dynamic_input-v1.patch Normally we must first upload the data to HDFS before we can analyze it with Hadoop MapReduce. Sometimes the upload takes a long time, so if we could add input while a job is running, that time could be saved. WHAT? Client: a) hadoop job -add-input jobId inputFormat ... Add the input to jobId. b) hadoop job -add-input done Tell the JobTracker that the input preparation is finished. c) hadoop job -add-input status jobid Show how many inputs the jobid has. HOWTO? Mainly, I think we should do three things: 1. JobClient: JobClient should support adding input to a job; it generates the splits and submits them to the JobTracker. 2. JobTracker: the JobTracker should support addInput and append the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to keep a map task pending until the client declares the job's input done. 3. Reducer: the reducer should also update the number of maps so that the shuffle works correctly. This is the rough idea, and I will update it. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
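The JobTracker-side bookkeeping the MAPREDUCE-1434 proposal implies could be sketched as follows: splits can be appended while the job's input is open, and once the client runs -input-done the map count is final, so reducers know how many map outputs to shuffle. All names here are hypothetical; this is not code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicInputJob {
    private final List<String> splits = new ArrayList<String>();
    private boolean inputClosed = false;

    // Corresponds to "hadoop job -input-add <jobid> <inputdir>":
    // the JobClient generates splits and submits them here.
    synchronized void addSplits(List<String> newSplits) {
        if (inputClosed) {
            throw new IllegalStateException("input already declared done");
        }
        splits.addAll(newSplits);
    }

    // Corresponds to "hadoop job -input-done <jobid>": after this the
    // number of map tasks is final and reducers can rely on it.
    synchronized void markInputDone() {
        inputClosed = true;
    }

    synchronized boolean mayAddInput() {
        return !inputClosed;
    }

    synchronized int numMapTasks() {
        return splits.size();
    }
}
```

The hard parts the comment alludes to, such as keeping a scheduler slot pending for not-yet-submitted input and updating running reducers' map counts, sit on top of this state machine.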
[jira] Updated: (MAPREDUCE-1532) Delegation token is obtained as the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1532: --- Attachment: 1532.1.patch The attached patch is for trunk Delegation token is obtained as the superuser - Key: MAPREDUCE-1532 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1532 Project: Hadoop Map/Reduce Issue Type: Bug Components: job submission, security Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 1532-bp20.1.patch, 1532-bp20.2.patch, 1532-bp20.4.1.patch, 1532-bp20.4.2.patch, 1532-bp20.4.patch, 1532.1.patch When the UserGroupInformation.doAs is invoked for proxy users, the delegation token is incorrectly obtained as the real user. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-1692) Remove TestStreamedMerge from the streaming tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1692: --- Status: Patch Available (was: Open) Fix Version/s: 0.22.0 Remove TestStreamedMerge from the streaming tests - Key: MAPREDUCE-1692 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1692 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Sreekanth Ramakrishnan Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1692-1.patch, MAPREDUCE-1692-1.patch, patch-1692.txt Currently {{TestStreamedMerge}} is never run as part of the streaming test suite; the code paths it exercised were removed in HADOOP-1315, so it is better to remove the test case from the code base. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (MAPREDUCE-1692) Remove TestStreamedMerge from the streaming tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1692: --- Attachment: patch-1692.txt Patch incorporates Hemanth's suggestions. Remove TestStreamedMerge from the streaming tests - Key: MAPREDUCE-1692 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1692 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Sreekanth Ramakrishnan Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1692-1.patch, MAPREDUCE-1692-1.patch, patch-1692.txt Currently {{TestStreamedMerge}} is never run as part of the streaming test suite; the code paths it exercised were removed in HADOOP-1315, so it is better to remove the test case from the code base. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira