[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773764#action_12773764 ]

Matei Zaharia commented on MAPREDUCE-707:
-----------------------------------------

Thanks Alan, this looks good! +1 from me. I'll wait for the Hudson automated test and then commit it if there are no warnings.

> Provide a jobconf property for explicitly assigning a job to a pool
> -------------------------------------------------------------------
>
>          Key: MAPREDUCE-707
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
>      Project: Hadoop Map/Reduce
>   Issue Type: New Feature
>   Components: contrib/fair-share
>     Reporter: Matei Zaharia
>     Priority: Trivial
>  Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
> A common use case of the fair scheduler is to have one pool per user, but
> then to define some special pools for various production jobs, import jobs,
> etc. Therefore, it would be nice if jobs went by default to the pool of the
> user who submitted them, but there was a setting to explicitly place a job in
> another pool. Today, this can be achieved through a sort of trick in the
> JobConf:
> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that
> allows a job to be placed directly into a pool, avoiding the need for this
> trick.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated MAPREDUCE-707:
-----------------------------------

    Status: Open  (was: Patch Available)
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated MAPREDUCE-707:
-----------------------------------

    Status: Patch Available  (was: Open)
[jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
[ https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773759#action_12773759 ]

Hong Tang commented on MAPREDUCE-1183:
--------------------------------------

Why do we need to serialize mappers and reducers?

> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -----------------------------------------------------------------------------
>
>               Key: MAPREDUCE-1183
>               URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
>           Project: Hadoop Map/Reduce
>        Issue Type: Improvement
>        Components: client
>  Affects Versions: 0.21.0
>          Reporter: Arun C Murthy
>          Assignee: Arun C Murthy
>
> Currently the Map-Reduce framework uses Configuration to pass information
> about the various aspects of a job such as Mapper, Reducer, InputFormat,
> OutputFormat, OutputCommitter etc. and application developers use
> org.apache.hadoop.mapreduce.Job.set*Class apis to set them at job-submission
> time:
> {noformat}
> Job.setMapperClass(IdentityMapper.class);
> Job.setReducerClass(IdentityReducer.class);
> Job.setInputFormatClass(TextInputFormat.class);
> Job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with
> org.apache.hadoop.mapreduce.Job via actual objects which are then serialized
> by the framework:
> {noformat}
> Job.setMapper(new IdentityMapper());
> Job.setReducer(new IdentityReducer());
> Job.setInputFormat(new TextInputFormat("in"));
> Job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}
[jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
[ https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773756#action_12773756 ]

Arun C Murthy commented on MAPREDUCE-1183:
------------------------------------------

The current Configuration-based system has issues in a couple of use-cases:

# The primary drawback: difficulty in implementing a Composite{Input|Output}Format.
Pig is in the middle of a rewrite of its Load/Store interfaces (http://wiki.apache.org/pig/LoadStoreRedesignProposal), where they want to be able to take an arbitrary InputFormat or OutputFormat and wrap it for use within Pig. Similarly, a 'CompositeInputFormat' that works with multiple InputFormats (say, a map-side merge between data in multiple SequenceFiles and TFiles) leads to a situation where we push the {Input|Output}Format to deal with multiple copies of Configuration and manage them. This is necessary because using a single Configuration results in the same configuration key being overwritten by multiple instances of {Input|Output}Format (say, mapred.input.dir overwritten by SequenceFileInputFormat and TFileInputFormat).
# Annoyance: an application that needs a very small amount of state in the Mapper/Reducer (say, a small map of metadata) is forced to use DistributedCache. It is much more natural to have that state stored in the Mapper/Reducer and have it serialized from the client to the compute nodes.

Thus the proposal is to move to a model where an actual Mapper/Reducer/InputFormat/OutputFormat object is serialized by the framework. This eliminates the need to use Configuration for storing the requisite information; the object itself keeps the necessary state, e.g. FileInputFormat will have a member to keep a list of input paths to be processed.

The new api would look like:
{noformat}
Job job = new Job();
job.setMapper(new WordCountMapper());
job.setReducer(new WordCountReducer());
InputFormat in = new TextInputFormat("in");
in.addInputPath("in2");
OutputFormat out = new TextOutputFormat("out");
job.setInputFormat(in);
job.setOutputFormat(out);
job.waitForCompletion();
{noformat}

Thoughts?
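Editorial sketch of the key-collision problem and the object-state alternative described in the comment above. This is not Hadoop code: `PathListFormat` and `SerializedComponentsSketch` are hypothetical names, a plain `Map` stands in for Configuration, and Java serialization is only an assumed transport, since the thread does not say which serialization mechanism the framework would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an InputFormat that keeps its input paths as
// object state instead of writing them into a shared Configuration.
class PathListFormat implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> inputPaths = new ArrayList<>();

    PathListFormat(String firstPath) {
        inputPaths.add(firstPath);
    }

    void addInputPath(String path) {
        inputPaths.add(path);
    }

    List<String> getInputPaths() {
        return inputPaths;
    }
}

class SerializedComponentsSketch {
    // Shared-configuration model: two formats write the same key, so the
    // second value replaces the first.
    static Map<String, String> sharedConfiguration() {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.input.dir", "seqfiles"); // written by the first format
        conf.put("mapred.input.dir", "tfiles");   // overwritten by the second
        return conf;
    }

    // Round-trips an object through Java serialization, standing in for the
    // client-to-compute-node transfer (an assumed mechanism).
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // Only one of the two input directories survives in the shared map.
        System.out.println(sharedConfiguration().get("mapred.input.dir"));

        // Each object carries its own paths, so nothing is overwritten.
        PathListFormat seq = new PathListFormat("seqfiles");
        PathListFormat tfile = new PathListFormat("tfiles");
        tfile.addInputPath("tfiles2");
        System.out.println(roundTrip(seq).getInputPaths());
        System.out.println(roundTrip(tfile).getInputPaths());
    }
}
```

The point of the sketch is only that per-object state survives the transfer independently, whereas a single key in a shared map cannot hold both values.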
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Heirich updated MAPREDUCE-707:
----------------------------------

    Attachment: MAPREDUCE-707-2-apache.patch

corrected patch
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Heirich updated MAPREDUCE-707:
----------------------------------

    Attachment:     (was: MAPREDUCE-707-2-apache.patch)
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773751#action_12773751 ]

Alan Heirich commented on MAPREDUCE-707:
----------------------------------------

Oops - I thought TestFairScheduler would be run as part of "ant test". I guess not. It passes now.

Hudson reported that TestGridmixSubmission failed, but that passes in my workspace. I'm on Mac OS X and I saw some tests fail from a fresh build that should have passed.

Please see MAPREDUCE-707-2-apache.patch
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Heirich updated MAPREDUCE-707:
----------------------------------

    Attachment: MAPREDUCE-707-2-apache.patch

further revisions per comments.
[jira] Commented: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773746#action_12773746 ]

Hadoop QA commented on MAPREDUCE-1182:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12424074/M1182-0.patch
  against trunk revision 832362.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/console

This message is automatically generated.

> Reducers fail with OutOfMemoryError while copying Map outputs
> -------------------------------------------------------------
>
>          Key: MAPREDUCE-1182
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
>      Project: Hadoop Map/Reduce
>   Issue Type: Bug
>     Reporter: Chandra Prakash Bhagtani
>     Assignee: Chandra Prakash Bhagtani
>     Priority: Blocker
>      Fix For: 0.20.2
>  Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
> Reducers fail while copying Map outputs with following exception
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
> ,Error:
> Reducer's memory usage keeps on increasing and ultimately exceeds -Xmx value
> I even tried with -Xmx6.5g to each reducer but it's still failing
> While looking into the reducer logs, I found that reducers were doing
> shuffleInMemory every time, rather than doing shuffleOnDisk
[jira] Created: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
-----------------------------------------------------------------------------

              Key: MAPREDUCE-1183
              URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
          Project: Hadoop Map/Reduce
       Issue Type: Improvement
       Components: client
 Affects Versions: 0.21.0
         Reporter: Arun C Murthy
         Assignee: Arun C Murthy

Currently the Map-Reduce framework uses Configuration to pass information about the various aspects of a job such as Mapper, Reducer, InputFormat, OutputFormat, OutputCommitter etc. and application developers use org.apache.hadoop.mapreduce.Job.set*Class apis to set them at job-submission time:

{noformat}
Job.setMapperClass(IdentityMapper.class);
Job.setReducerClass(IdentityReducer.class);
Job.setInputFormatClass(TextInputFormat.class);
Job.setOutputFormatClass(TextOutputFormat.class);
...
{noformat}

The proposal is that we move to a model where end-users interact with org.apache.hadoop.mapreduce.Job via actual objects which are then serialized by the framework:

{noformat}
Job.setMapper(new IdentityMapper());
Job.setReducer(new IdentityReducer());
Job.setInputFormat(new TextInputFormat("in"));
Job.setOutputFormat(new TextOutputFormat("out"));
...
{noformat}
[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik updated MAPREDUCE-1026:
-------------------------------------

    Attachment: MAPREDUCE-1026-1.patch

bq. 1) The tasktracker needs to maintain a mapping from JobIDs to job-tokens
done

bq. 2) The call to localizeJobTokenFile should be done before the call to taskController.initializeJob(context) in the TaskTracker.localizeJob method. Could the localizeJobTokenFile be called within TaskTracker.localizeJobFiles

bq. 3) Minor: for the request/response HTTP headers, make the first character upper case
done

bq. 4) HMacUtil could override the equals method and put in logic for comparing two HMacUtil objects, instead of defining verifyHash.
We are not really comparing HMacUtil objects; they are just utilities. So I think verifyHash() is more logical.

bq. 5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be Writable (as opposed to having to define load/store methods)
Comp is used in the TreeMap constructor as the comparator.

Also added synchronization around the map of StoreKeys updates in TaskTracker.

> Shuffle should be secure
> ------------------------
>
>          Key: MAPREDUCE-1026
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>      Project: Hadoop Map/Reduce
>   Issue Type: Sub-task
>   Components: security
>     Reporter: Owen O'Malley
>     Assignee: Boris Shkolnik
>  Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
> Since the user's data is available via http from the TaskTrackers, we should
> require a job-specific secret to access it.
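Editorial sketch of what a utility-style verifyHash() check against a job-specific secret could look like. It is not the code in MAPREDUCE-1026-1.patch: the class name `ShuffleHashSketch`, the method signatures, the HmacSHA1 algorithm, and the idea of hashing the request URL are all assumptions made for illustration. Only the `javax.crypto.Mac` and `java.security.MessageDigest` calls are real JDK APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical utility class; the names and the algorithm are assumptions.
class ShuffleHashSketch {
    private static final String ALGORITHM = "HmacSHA1";

    // Computes a keyed hash of the message using the job-specific secret.
    static byte[] computeHash(byte[] message, byte[] jobSecret) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(jobSecret, ALGORITHM));
            return mac.doFinal(message);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    // Recomputes the hash and compares it with the one received from the
    // peer. MessageDigest.isEqual compares the two byte arrays by content.
    static boolean verifyHash(byte[] receivedHash, byte[] message, byte[] jobSecret) {
        return MessageDigest.isEqual(computeHash(message, jobSecret), receivedHash);
    }

    public static void main(String[] args) {
        byte[] secret = "job-specific-secret".getBytes(StandardCharsets.UTF_8);
        byte[] request = "/mapOutput?job=job_1&map=m_1&reduce=0".getBytes(StandardCharsets.UTF_8);

        byte[] hash = computeHash(request, secret);
        System.out.println(verifyHash(hash, request, secret));

        byte[] wrongSecret = "some-other-secret".getBytes(StandardCharsets.UTF_8);
        System.out.println(verifyHash(hash, request, wrongSecret));
    }
}
```

A static helper like this matches the remark in the comment that the objects being compared are hashes, not utility instances, which is why overriding equals() would be a poor fit.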
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773734#action_12773734 ]

Matei Zaharia commented on MAPREDUCE-707:
-----------------------------------------

By the way, here's a tip if you want to run the unit tests faster - you can run just the fair scheduler's unit test with ant -Dtestcase=TestFairScheduler test.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773733#action_12773733 ]

Matei Zaharia commented on MAPREDUCE-707:
-----------------------------------------

This looks pretty good, except that testPoolAssignment fails when I run the unit tests. I think the problem is with job4, where you set "mapred.fairscheduler.poolnameproperty" in the job's Configuration (jobConf2), not in the fair scheduler's configuration. You need to set the poolNameProperty when you create the fair scheduler object. That's what the code used to do with the POOL_PROPERTY string at the top, but you can't set the pool name property to mapred.fairscheduler.pool, because that wouldn't be testing anything. I'd suggest leaving the POOL_PROPERTY as "pool" and trying to set job4's pool through that. Also, for sanity, in job1 (where you set mapred.fairscheduler.pool directly), you should set the "pool" property to something other than poolA to make sure it isn't used.

Finally, two small nitpicks:
# In the test line with assertEquals(scheduler.getPoolManager().getPoolName(job2), "poolA"), you should switch the two parameters (put "poolA" first); the first parameter is always supposed to be the expected value.
# Regarding the comment on getPoolName - the pool name property used by default is "user.name", not "project". I think I forgot to fix that comment a while back.
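Editorial sketch of the pool lookup order that the review above implies: an explicit mapred.fairscheduler.pool value wins, otherwise the scheduler-level pool name property is consulted, which defaults to "user.name". This is not the fair scheduler's code. `PoolNameSketch` is a hypothetical class, a plain `Map` stands in for the job configuration, and returning null when nothing is set is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for the job's configuration.
class PoolNameSketch {
    static final String EXPLICIT_POOL_PROPERTY = "mapred.fairscheduler.pool";
    static final String DEFAULT_POOL_NAME_PROPERTY = "user.name";

    // poolNameProperty is the scheduler-side setting (the value of
    // mapred.fairscheduler.poolnameproperty); null means "use the default".
    static String getPoolName(Map<String, String> jobConf, String poolNameProperty) {
        String explicit = jobConf.get(EXPLICIT_POOL_PROPERTY);
        if (explicit != null && !explicit.isEmpty()) {
            return explicit; // explicit assignment takes precedence
        }
        String key = (poolNameProperty != null) ? poolNameProperty : DEFAULT_POOL_NAME_PROPERTY;
        return jobConf.get(key); // null if unset (assumed behaviour)
    }

    public static void main(String[] args) {
        // Mirrors the review's advice for job1: set the explicit pool and
        // give the "pool" property a different value to show it is ignored.
        Map<String, String> job1 = new HashMap<>();
        job1.put(EXPLICIT_POOL_PROPERTY, "poolA");
        job1.put("pool", "poolB");
        System.out.println(getPoolName(job1, "pool"));

        // Mirrors job4: the pool is chosen through the scheduler-level
        // pool name property, here "pool".
        Map<String, String> job4 = new HashMap<>();
        job4.put("pool", "poolC");
        System.out.println(getPoolName(job4, "pool"));
    }
}
```

Passing the pool name property as an argument, rather than reading it from the job's own map, reflects the review's point that it belongs to the scheduler's configuration and not to the job's.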
[jira] Updated: (MAPREDUCE-1159) Limit Job name on jobtracker.jsp to be 80 char long
[ https://issues.apache.org/jira/browse/MAPREDUCE-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1159:
------------------------------------

    Status: Open  (was: Patch Available)

Canceling patch while Nicholas's comments are addressed

> Limit Job name on jobtracker.jsp to be 80 char long
> ---------------------------------------------------
>
>               Key: MAPREDUCE-1159
>               URL: https://issues.apache.org/jira/browse/MAPREDUCE-1159
>           Project: Hadoop Map/Reduce
>        Issue Type: Improvement
>  Affects Versions: 0.22.0
>          Reporter: Zheng Shao
>          Assignee: Zheng Shao
>          Priority: Trivial
>       Attachments: MAPREDUCE-1159.trunk.patch
>
> Sometimes a user submits a job with a very long job name. That made
> jobtracker.jsp very hard to read.
> We should limit the size of the job name. User can see the full name when
> they click on the job.
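Editorial sketch of the kind of display-side truncation the issue asks for. It is not taken from MAPREDUCE-1159.trunk.patch: `JobNameSketch` and `abbreviate` are hypothetical names, and ending the shortened name with "..." is an assumption. The 80-character limit comes from the issue title.

```java
// Hypothetical helper for shortening a job name in a listing page.
class JobNameSketch {
    static final int MAX_DISPLAY_LENGTH = 80;
    private static final String ELLIPSIS = "..."; // assumed marker for a cut name

    // Returns the name unchanged if it fits, otherwise a prefix plus an
    // ellipsis whose total length is exactly MAX_DISPLAY_LENGTH.
    static String abbreviate(String jobName) {
        if (jobName == null || jobName.length() <= MAX_DISPLAY_LENGTH) {
            return jobName;
        }
        return jobName.substring(0, MAX_DISPLAY_LENGTH - ELLIPSIS.length()) + ELLIPSIS;
    }

    public static void main(String[] args) {
        StringBuilder longName = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            longName.append('x');
        }
        System.out.println(abbreviate("word count"));
        System.out.println(abbreviate(longName.toString()).length());
    }
}
```

Only the displayed string is shortened here; the stored job name would stay intact so that the full name remains visible on the job's own page, as the issue describes.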
[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1182:
------------------------------------

    Priority: Blocker  (was: Major)
[jira] Updated: (MAPREDUCE-1127) distcp should timeout later during S3-based transfers
[ https://issues.apache.org/jira/browse/MAPREDUCE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1127:
------------------------------------

    Status: Open  (was: Patch Available)

On reflection... the documentation change is in 0.21, so it may not be worth adding and removing this special case in 0.22. Would you object to calling the documentation change "sufficient" and exploring real fixes for S3 in 0.22? Depending on how that is resolved, this workaround may become necessary again.

> distcp should timeout later during S3-based transfers
> -----------------------------------------------------
>
>          Key: MAPREDUCE-1127
>          URL: https://issues.apache.org/jira/browse/MAPREDUCE-1127
>      Project: Hadoop Map/Reduce
>   Issue Type: Improvement
>   Components: distcp
>     Reporter: Aaron Kimball
>     Assignee: Aaron Kimball
>  Attachments: MAPREDUCE-1127.2.patch, MAPREDUCE-1127.patch
>
> Per MAPREDUCE-972, rename and other operations on distcp can take longer than
> the typical mapreduce task timeout. As an interim fix, this timeout should be
> increased when the distcp destination is S3.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773728#action_12773728 ] Hadoop QA commented on MAPREDUCE-707: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424051/MAPREDUCE-707-apache.patch against trunk revision 832362. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/console This message is automatically generated. 
> Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773709#action_12773709 ] Devaraj Das commented on MAPREDUCE-1026: My worry about the reduce task killing itself can be ignored. That is the right thing to happen, as Boris and I discussed offline. > Shuffle should be secure > > > Key: MAPREDUCE-1026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: security >Reporter: Owen O'Malley >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch > > > Since the user's data is available via http from the TaskTrackers, we should > require a job-specific secret to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1056) [Mumak] Add forrest documentation for mumak
[ https://issues.apache.org/jira/browse/MAPREDUCE-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1056: - Priority: Blocker (was: Major) > [Mumak] Add forrest documentation for mumak > --- > > Key: MAPREDUCE-1056 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1056 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Priority: Blocker > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1137) Mumak should have a unit test to ensure jetty UI is running properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1137: - Priority: Blocker (was: Major) > Mumak should have a unit test to ensure jetty UI is running properly > > > Key: MAPREDUCE-1137 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1137 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Hong Tang >Priority: Blocker > > Mumak should have a unit test that ensures the jetty UI is running properly. This > will help detect issues like MAPREDUCE-1104 sooner. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1173) Documenting MapReduce metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1173: - Priority: Blocker (was: Major) > Documenting MapReduce metrics > - > > Key: MAPREDUCE-1173 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1173 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Hong Tang >Priority: Blocker > > As part of HADOOP-6350, we should document the metrics for JobTracker and > TaskTracker as part of their interfaces. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1175) We should have a spec on JobHistory file format
[ https://issues.apache.org/jira/browse/MAPREDUCE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1175: - Priority: Blocker (was: Major) > We should have a spec on JobHistory file format > --- > > Key: MAPREDUCE-1175 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1175 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Hong Tang >Priority: Blocker > > Currently, the JobHistory schema is specified in o.a.h.m.jobhistory.Event.avpr, > and it takes some guesswork for me to understand the meaning of the various > records. Also, it would be nice to spec out the dependencies among the events. > This would make it more dependable for tools like rumen to parse job history logs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773701#action_12773701 ] Devaraj Das commented on MAPREDUCE-1026: Looked at the patch some more. A few more comments: 1) The tasktracker needs to maintain a mapping from JobIDs to job-tokens. 2) The call to localizeJobTokenFile should be done before the call to taskController.initializeJob(context) in the TaskTracker.localizeJob method. Could localizeJobTokenFile be called within TaskTracker.localizeJobFiles? 3) Minor: for the request/response HTTP headers, make the first character upper case. 4) HMacUtil could override the equals method and put in logic for comparing two HMacUtil objects, instead of defining verifyHash. 5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be Writable (as opposed to having to define load/store methods). For the case where a reduce task fails due to the TaskTracker(s) not being authentic, we probably need care. Two things might happen. The JobTracker might get enough notifications from other reduces in the system, and it might just decide to re-execute the map. The other situation is what is bothering me: the reduce task would kill itself after a certain threshold number of trials. This would be bad. IIRC it is not predictable which one would happen first. > Shuffle should be secure > > > Key: MAPREDUCE-1026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: security >Reporter: Owen O'Malley >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch > > > Since the user's data is available via http from the TaskTrackers, we should > require a job-specific secret to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
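For readers following the MAPREDUCE-1026 thread, here is a minimal sketch of what checking a request hash against a job-specific secret might look like. It is not taken from the attached patch: the class name ShuffleHashSketch, the choice of HmacSHA1, and the idea of hashing the request URL are all assumptions made for illustration. It uses only the standard javax.crypto.Mac and java.security.MessageDigest APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; NOT the HMacUtil class from the patch.
final class ShuffleHashSketch {
    // Assumed algorithm; the patch may use a different one.
    private static final String ALGORITHM = "HmacSHA1";

    // Computes an HMAC of the message (e.g. a shuffle request URL) under the job secret.
    static byte[] hash(byte[] jobSecret, String message) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(jobSecret, ALGORITHM));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC not available", e);
        }
    }

    // Recomputes the hash and compares it with the one the caller presented.
    // MessageDigest.isEqual is used rather than Arrays.equals for the byte comparison.
    static boolean verify(byte[] jobSecret, String message, byte[] claimedHash) {
        return MessageDigest.isEqual(hash(jobSecret, message), claimedHash);
    }

    public static void main(String[] args) {
        byte[] secret = "job-secret".getBytes(StandardCharsets.UTF_8);
        String url = "/mapOutput?job=job_1&map=m_1&reduce=0"; // made-up request
        byte[] h = hash(secret, url);
        System.out.println("valid request accepted: " + verify(secret, url, h));
        System.out.println("tampered request accepted: " + verify(secret, url + "1", h));
    }
}
```

Whether the comparison lives in an equals override or in a separate verify method, as discussed in comment 4, is a design choice this sketch does not settle.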
[jira] Commented: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
[ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773700#action_12773700 ] Hong Tang commented on MAPREDUCE-901: - Like metrics, we should also clearly document the framework counters. > Move Framework Counters into a TaskMetric structure > --- > > Key: MAPREDUCE-901 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-901 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 0.21.0 >Reporter: Owen O'Malley >Assignee: Arun C Murthy > Fix For: 0.21.0 > > Attachments: 901_1.patch, 901_1.patch, MAPREDUCE-901.patch, > MAPREDUCE-901.patch > > > I think we should move all of the Counters that the framework updates into a > single class called TaskMetrics. TaskMetrics would have specific fields for > each of the metrics like input records, input bytes, output records, etc. > It would both reduce the serialized size of the heartbeats (by shrinking the > Counters down to just the user's counters) and decrease the latency for > updates to the JobTracker (since Counters are sent at most 1/minute instead > of 1/heartbeat). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1182: - Attachment: M1182-0.patch (arranging patches for Hudson) > Reducers fail with OutOfMemoryError while copying Map outputs > - > > Key: MAPREDUCE-1182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chandra Prakash Bhagtani >Assignee: Chandra Prakash Bhagtani > Fix For: 0.20.2 > > Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch > > > Reducers fail while copying Map outputs with following exception > java.lang.OutOfMemoryError: Java heap space at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216) > ,Error: > Reducer's memory usage keeps on increasing and ultimately exceeds -Xmx value > I even tried with -Xmx6.5g to each reducer but it's still failing > While looking into the reducer logs, I found that reducers were doing > shuffleInMemory every time, rather than doing shuffleOnDisk -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1182: - Attachment: M1182-0.patch M1182-0v20.patch Patches changing shuffle arithmetic to use longs instead of ints. Retains the restriction on in-memory segments to maxint, though lifting that constraint can/should be explored in another issue. Including unit tests for this is impractical, but it will be tested manually. > Reducers fail with OutOfMemoryError while copying Map outputs > - > > Key: MAPREDUCE-1182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chandra Prakash Bhagtani > Fix For: 0.20.2 > > Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch > > > Reducers fail while copying Map outputs with following exception > java.lang.OutOfMemoryError: Java heap space at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216) > ,Error: > Reducer's memory usage keeps on increasing and ultimately exceeds -Xmx value > I even tried with -Xmx6.5g to each reducer but it's still failing > While looking into the reducer logs, I found that reducers were doing > shuffleInMemory every time, rather than doing shuffleOnDisk -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
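As background for the "longs instead of ints" change described above, the following sketch shows how int arithmetic can misbehave once a heap exceeds 2 GiB, such as the -Xmx6.5g setting mentioned in the report. It is only an illustration under stated assumptions: the class and method names and the 0.70 buffer fraction are hypothetical and are not taken from ReduceTask or from the attached patches.

```java
// Hypothetical illustration; names and constants are not from ReduceTask.
final class ShuffleArithmeticSketch {
    // Assumed fraction of the heap set aside for in-memory map outputs.
    static final double BUFFER_FRACTION = 0.70;

    // int-based budget: narrowing the heap size to int wraps for heaps over 2 GiB.
    static int budgetWithInts(long maxHeapBytes) {
        int heap = (int) maxHeapBytes; // keeps only the low 32 bits
        return (int) (heap * BUFFER_FRACTION);
    }

    // long-based budget: no narrowing, so large heaps are represented correctly.
    static long budgetWithLongs(long maxHeapBytes) {
        return (long) (maxHeapBytes * BUFFER_FRACTION);
    }

    // A single in-memory segment is still capped at Integer.MAX_VALUE bytes,
    // because a Java byte[] cannot be larger than that.
    static boolean fitsInMemory(long segmentBytes, long usedBytes, long budgetBytes) {
        return segmentBytes <= Integer.MAX_VALUE
            && usedBytes + segmentBytes <= budgetBytes;
    }

    public static void main(String[] args) {
        long heap = 6_979_321_856L; // 6.5 GiB
        System.out.println("int budget:  " + budgetWithInts(heap));
        System.out.println("long budget: " + budgetWithLongs(heap));
    }
}
```

With a 6.5 GiB heap the int-based budget comes out negative, while the long-based one stays above Integer.MAX_VALUE; the per-segment cap in fitsInMemory mirrors the "restriction on in-memory segments to maxint" that the patch retains.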
[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1182: - Attachment: (was: M1182-0.patch) > Reducers fail with OutOfMemoryError while copying Map outputs > - > > Key: MAPREDUCE-1182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chandra Prakash Bhagtani >Assignee: Chandra Prakash Bhagtani > Fix For: 0.20.2 > > Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch > > > Reducers fail while copying Map outputs with following exception > java.lang.OutOfMemoryError: Java heap space at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216) > ,Error: > Reducer's memory usage keeps on increasing and ultimately exceeds -Xmx value > I even tried with -Xmx6.5g to each reducer but it's still failing > While looking into the reducer logs, I found that reducers were doing > shuffleInMemory every time, rather than doing shuffleOnDisk -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1166) SerialUtils.cc: dynamic allocation of arrays based on runtime variable is not portable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas reassigned MAPREDUCE-1166: Assignee: Allen Wittenauer > SerialUtils.cc: dynamic allocation of arrays based on runtime variable is not > portable > -- > > Key: MAPREDUCE-1166 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1166 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Attachments: MAPREREDUCE-1166.patch > > > In SerialUtils.cc, the following code appears: > int len; > if (b < -120) { > negative = true; > len = -120 - b; > } else { > negative = false; > len = -112 - b; > } > uint8_t barr[len]; > as far as I'm aware, this is not legal in ANSI C and will be rejected by ANSI > compliant compilers. Instead, this should be malloc()'d based upon the size > of len and free()'d later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas reassigned MAPREDUCE-1182: Assignee: Chandra Prakash Bhagtani > Reducers fail with OutOfMemoryError while copying Map outputs > - > > Key: MAPREDUCE-1182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chandra Prakash Bhagtani >Assignee: Chandra Prakash Bhagtani > Fix For: 0.20.2 > > Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch > > > Reducers fail while copying Map outputs with following exception > java.lang.OutOfMemoryError: Java heap space at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216) > ,Error: > Reducer's memory usage keeps on increasing and ultimately exceeds -Xmx value > I even tried with -Xmx6.5g to each reducer but it's still failing > While looking into the reducer logs, I found that reducers were doing > shuffleInMemory every time, rather than doing shuffleOnDisk -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1182: - Status: Patch Available (was: Open) > Reducers fail with OutOfMemoryError while copying Map outputs > - > > Key: MAPREDUCE-1182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chandra Prakash Bhagtani >Assignee: Chandra Prakash Bhagtani > Fix For: 0.20.2 > > Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch > > > Reducers fail while copying Map outputs with following exception > java.lang.OutOfMemoryError: Java heap space at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285) > at > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216) > ,Error: > Reducer's memory usage keeps on increasing and ultimately exceeds -Xmx value > I even tried with -Xmx6.5g to each reducer but it's still failing > While looking into the reducer logs, I found that reducers were doing > shuffleInMemory every time, rather than doing shuffleOnDisk -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Heirich updated MAPREDUCE-707: --- Attachment: MAPREDUCE-707-1-apache.patch A patch after incorporating changes suggested in the review comments. > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773679#action_12773679 ] Alan Heirich commented on MAPREDUCE-707: Please review MAPREDUCE-707-1-apache.patch. Thanks. > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Heirich updated MAPREDUCE-707: --- Attachment: (was: MAPREDUCE-707-1-apache.patch) > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Heirich updated MAPREDUCE-707: --- Attachment: MAPREDUCE-707-1-apache.patch revised patch per review comments > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Heirich updated MAPREDUCE-707: --- Release Note: add mapred.fairscheduler.pool property to define which pool a job belongs to. Status: Patch Available (was: Open) > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773645#action_12773645 ] Matei Zaharia commented on MAPREDUCE-707: - Here are some comments on the patch: # Instead of using the string "mapred.fairscheduler.pool" in multiple places in PoolManager, make it a constant at the top of the file (something like EXPLICIT_POOL_PROPERTY). # Add a comment to PoolManager.getPoolName to explain the logic (first look for the explicit pool property, then for the property named by poolNameProperty, and finally default to DEFAULT_POOL_NAME). # Add a unit test for PoolManager.getPoolName that tries each of those cases (explicit property is set, no explicit property but poolNameProperty is used, or neither is used). Right now your existing unit test checks that setPool works but there's no test that submits a job with mapred.fairscheduler.pool directly. # Instead of assertEquals(0, scheduler.getPoolManager().getPoolName(job2).compareTo("poolA")) you can probably use a version of assertEquals that works on strings. # In the documentation, instead of saying "This property is ignored if mapred.fairscheduler.pool is specified." for the poolnameproperty, it would be better to say that the poolnameproperty is used only for jobs in which mapred.fairscheduler.pool is not explicitly set. > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. 
Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
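The lookup order requested in the review comments on MAPREDUCE-707 (explicit pool property first, then the property named by poolNameProperty, then DEFAULT_POOL_NAME) could be sketched roughly as follows. This is not the code from the patch: it uses a plain Map in place of a JobConf so that it runs without Hadoop, the class name PoolNameSketch is hypothetical, and the value "default" for DEFAULT_POOL_NAME is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup in PoolManager.getPoolName.
final class PoolNameSketch {
    // Constant suggested in the review, replacing repeated string literals.
    static final String EXPLICIT_POOL_PROPERTY = "mapred.fairscheduler.pool";
    // Assumed value; the real constant may differ.
    static final String DEFAULT_POOL_NAME = "default";

    // Resolution order:
    //   1. the explicit pool property, if set;
    //   2. the job property named by poolNameProperty, if set;
    //   3. DEFAULT_POOL_NAME.
    static String getPoolName(Map<String, String> jobConf, String poolNameProperty) {
        String explicit = jobConf.get(EXPLICIT_POOL_PROPERTY);
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        String named = jobConf.get(poolNameProperty);
        if (named != null && !named.trim().isEmpty()) {
            return named.trim();
        }
        return DEFAULT_POOL_NAME;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("pool.name", "alice");
        System.out.println(getPoolName(conf, "pool.name")); // no explicit pool set
        conf.put(EXPLICIT_POOL_PROPERTY, "poolA");
        System.out.println(getPoolName(conf, "pool.name")); // explicit pool set
    }
}
```

The three branches correspond to the three unit-test cases the review asks for: explicit property set, only the poolNameProperty-named property set, and neither set.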
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773631#action_12773631 ] Alan Heirich commented on MAPREDUCE-707: I would like to request a code review of MAPREDUCE-707-apache.patch, which is intended to resolve this JIRA. > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Heirich updated MAPREDUCE-707: --- Attachment: MAPREDUCE-707-apache.patch adds mapred.fairscheduler.pool property, use it to specify pool name > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > Attachments: MAPREDUCE-707-apache.patch > > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > <property> > <name>mapred.fairscheduler.poolnameproperty</name> > <value>pool.name</value> > </property> > <property> > <name>pool.name</name> > <value>${user.name}</value> > </property> > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773611#action_12773611 ] Devaraj Das commented on MAPREDUCE-1026: Kan, the RPC port on the TaskTracker is supposed to be bound only to localhost, so others outside the node in question shouldn't be able to do RPC. But let's keep that discussion to a separate JIRA. > Shuffle should be secure > > > Key: MAPREDUCE-1026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: security >Reporter: Owen O'Malley >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch > > > Since the user's data is available via http from the TaskTrackers, we should > require a job-specific secret to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773606#action_12773606 ] Hudson commented on MAPREDUCE-1153: --- Integrated in Hadoop-Mapreduce-trunk #133 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/]) > Metrics counting tasktrackers and blacklisted tasktrackers are not updated > when trackers are decommissioned. > > > Key: MAPREDUCE-1153 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Hemanth Yamijala >Assignee: Sharad Agarwal > Fix For: 0.22.0 > > Attachments: 1153.patch > > > MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of > actual, blacklisted and decommissioned tasktrackers. When a tracker is > decommissioned, the tasktracker count or the blacklisted tracker count is not > decremented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1038) Mumak's compile-aspects target weaves aspects even though there are no changes to the Mumak's sources
[ https://issues.apache.org/jira/browse/MAPREDUCE-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773602#action_12773602 ] Hudson commented on MAPREDUCE-1038: --- Integrated in Hadoop-Mapreduce-trunk #133 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/]) > Mumak's compile-aspects target weaves aspects even though there are no > changes to the Mumak's sources > - > > Key: MAPREDUCE-1038 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1038 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 0.21.0 >Reporter: Vinod K V >Assignee: Aaron Kimball > Fix For: 0.21.0 > > Attachments: M1038-1.patch, MAPREDUCE-1038.patch > > > This is particularly time-consuming and is the bottleneck even for a simple > ant build. In the case where no files have been updated in Mumak, there is no > reason to recompile sources along with the aspects. compile-aspects should > skip this step in these cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773605#action_12773605 ] Hudson commented on MAPREDUCE-971: -- Integrated in Hadoop-Mapreduce-trunk #133 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/]) > distcp does not always remove distcp.tmp.dir > > > Key: MAPREDUCE-971 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-971 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distcp >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Fix For: 0.21.0 > > Attachments: MAPREDUCE-971.patch > > > Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop
[ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773604#action_12773604 ] Hudson commented on MAPREDUCE-1036: --- Integrated in Hadoop-Mapreduce-trunk #133 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/]) > An API Specification for Sqoop > -- > > Key: MAPREDUCE-1036 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1036 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: contrib/sqoop >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-1036.patch, sqoop-reference.txt > > > Over the last several months, Sqoop has evolved to a state that is functional > and has room for extensions. Developing extensions requires a stable API and > documentation. I am attaching to this ticket a description of Sqoop's design > and internal APIs, which include some open questions. I would like to solicit > input on the design regarding these open questions and standardize the API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773599#action_12773599 ] Kan Zhang commented on MAPREDUCE-1026: -- > This way the key is known only to the local task Also, no need to persist this key as part of the job. This key is just a runtime artifact of the Task and TT. > Shuffle should be secure > > > Key: MAPREDUCE-1026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: security >Reporter: Owen O'Malley >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch > > > Since the user's data is available via http from the TaskTrackers, we should > require a job-specific secret to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773596#action_12773596 ] Kan Zhang commented on MAPREDUCE-1026: -- @Devaraj > Since the token will be used (later on in a separate jira) to bootstrap even > the task<->TT mutual authentication Are you talking about Task<->TT heartbeats over RPC? For this connection, I suggest we use a separate key (in the format of a Delegation token) that is generated by the TT and given to the Task just before it is launched. This way the key is known only to the local task, which helps prevent Tasks running on other machines from accidentally connecting to this TT. In terms of implementation, the TT can do this in the same way that the NN does, e.g., instantiate a DelegationTokenHandler for generating the Delegation token and couple it with RPC (no need to persist the MasterKey though). > Shuffle should be secure > > > Key: MAPREDUCE-1026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: security >Reporter: Owen O'Malley >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch > > > Since the user's data is available via http from the TaskTrackers, we should > require a job-specific secret to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
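The issue description asks that a job-specific secret be required to fetch map output over http. The thread does not say how the attached patches do this, so the following is only a sketch of one common approach: the fetcher sends an HMAC of the request URL computed with the job's secret, and the server recomputes and compares it. The class ShuffleUrlSigner, its method names, and the choice of HmacSHA1 are assumptions made for illustration, not the patch's design.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; names and algorithm are illustrative assumptions.
final class ShuffleUrlSigner {
    private static final String ALGORITHM = "HmacSHA1";

    // Computes a keyed hash of the request URL using the job's secret.
    static byte[] sign(byte[] jobSecret, String requestUrl) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(jobSecret, ALGORITHM));
            return mac.doFinal(requestUrl.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    // Server side: recompute the hash and compare it with the presented one.
    static boolean verify(byte[] jobSecret, String requestUrl, byte[] presented) {
        return MessageDigest.isEqual(sign(jobSecret, requestUrl), presented);
    }
}
```

A request signed with a different job's secret, or for a different URL, fails verification, so only holders of the job's secret can read its map output.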
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773595#action_12773595 ] Matei Zaharia commented on MAPREDUCE-707: - In other words, I want to treat PoolManager, PoolSchedulable, JobSchedulable, etc as data structures, and decide externally when updates need to happen and when they don't, so that all that control logic is in one or two places (the event handlers in the FairScheduler and the UI code in FairSchedulerServlet). > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > > mapred.fairscheduler.poolnameproperty > pool.name > > > pool.name > ${user.name} > > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773594#action_12773594 ] Matei Zaharia commented on MAPREDUCE-707: - The reason I haven't made the PoolManager methods call updateDemand is that FairScheduler.update() does other things as well, and doing updateDemand without doing a full update() could potentially break some of the algorithms. (I'm not sure that it does so right now, but it would have been a problem in earlier versions). Therefore, I wanted all the updates to always happen through FairScheduler.update(). I'd rather not make the PoolManager call update() all the time because it would be better if the PoolManager didn't have to be modified whenever the structure of FairScheduler changes. All of the other unit tests call update() too, so I think it's fine not to do it in setPool. > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > > mapred.fairscheduler.poolnameproperty > pool.name > > > pool.name > ${user.name} > > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773587#action_12773587 ] Alan Heirich commented on MAPREDUCE-707: I guess another way to put this is: if we need to call updateDemand() to keep the demand up to date, should setPool() call updateDemand() after changing the pool for a job? > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > > mapred.fairscheduler.poolnameproperty > pool.name > > > pool.name > ${user.name} > > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773586#action_12773586 ] Matei Zaharia commented on MAPREDUCE-707: - Hi Alan, Demands are only updated when the fair scheduler's update() function is called (which calls updateDemand in turn). All the code that calls setPool calls scheduler.update() afterwards. So you should do that in the unit test too. > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > > mapred.fairscheduler.poolnameproperty > pool.name > > > pool.name > ${user.name} > > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
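The point made in this comment, that setPool only changes data and demand is recomputed only in the scheduler's update(), can be illustrated with a small self-contained model. The classes below are simplified stand-ins written for this note, not the real FairScheduler or PoolManager; the field names and the per-job task count used as "demand" are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the pattern discussed in the thread.
final class FairSchedulerSketch {
    static final class Job {
        final String id;
        final int tasks; // stands in for the job's demand
        Job(String id, int tasks) { this.id = id; this.tasks = tasks; }
    }

    // Plain data structure: it records pool membership and nothing else.
    static final class PoolManager {
        private final Map<Job, String> poolOf = new HashMap<>();
        void addJob(Job job, String pool) { poolOf.put(job, pool); }
        void removeJob(Job job) { poolOf.remove(job); }
        // As in the thread, setPool is a removeJob followed by an addJob.
        void setPool(Job job, String pool) { removeJob(job); addJob(job, pool); }
    }

    final PoolManager poolManager = new PoolManager();
    private final Map<String, Integer> demand = new HashMap<>();

    // The only place where demand is recomputed.
    void update() {
        demand.clear();
        for (Map.Entry<Job, String> e : poolManager.poolOf.entrySet()) {
            demand.merge(e.getValue(), e.getKey().tasks, Integer::sum);
        }
    }

    // Returns the demand as of the last update(); it may be stale.
    int getDemand(String pool) { return demand.getOrDefault(pool, 0); }
}
```

In this model, a test that calls setPool and then reads getDemand without calling update() sees the old values, which matches the behaviour reported in the unit test discussed here.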
[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool
[ https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773579#action_12773579 ] Alan Heirich commented on MAPREDUCE-707: I'm finding that the demands of the mapSchedulable and reduceSchedulable objects are not updated when jobs are removed from and added to a pool. As a result, calls to PoolManager.setPool do not cause the pool demands to update (setPool calls removeJob() and addJob()). I've written a unit test that submits jobs to pools, tries to change their pool, and checks getDemand() to verify that the right thing happened. This test is failing because getDemand() shows no change in the demand. Is this the expected behavior of getDemand()? > Provide a jobconf property for explicitly assigning a job to a pool > --- > > Key: MAPREDUCE-707 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-707 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: Matei Zaharia >Priority: Trivial > > A common use case of the fair scheduler is to have one pool per user, but > then to define some special pools for various production jobs, import jobs, > etc. Therefore, it would be nice if jobs went by default to the pool of the > user who submitted them, but there was a setting to explicitly place a job in > another pool. Today, this can be achieved through a sort of trick in the > JobConf: > {code} > > mapred.fairscheduler.poolnameproperty > pool.name > > > pool.name > ${user.name} > > {code} > This JIRA proposes to add a property called mapred.fairscheduler.pool that > allows a job to be placed directly into a pool, avoiding the need for this > trick. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-787) -files, -archives should honor user given symlink path
[ https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773525#action_12773525 ] Hadoop QA commented on MAPREDUCE-787: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424021/patch-787-2.txt against trunk revision 832362. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/console This message is automatically generated. 
> -files, -archives should honor user given symlink path > -- > > Key: MAPREDUCE-787 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-787 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt > > > Currently, if the user gives an option such as > -files hdfs://host:fs_port/user/testfile.txt#testlink > The symlink name "testlink" is not honored. It always creates a symlink with > name testfile.txt in the cwd of the task. > If the user has given a symlink name, it should be honored. If no > symlink-name is given, then path.getName() can be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
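The rule stated in the description (use the name after "#" when one is given, otherwise the file name from the path) can be sketched with java.net.URI. The class CacheLinkNames and the method linkName are hypothetical names chosen for this example, and the numeric port in the usage note stands in for the fs_port placeholder above; this is not the code from the attached patches.

```java
import java.net.URI;

// Hypothetical helper illustrating the naming rule from the description.
final class CacheLinkNames {
    // Returns the URI fragment if present, else the last path component.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        if (path == null) {
            throw new IllegalArgumentException("URI has no path: " + uri);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

For example, hdfs://host:8020/user/testfile.txt#testlink yields testlink, while the same URI without the fragment yields testfile.txt.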
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773485#action_12773485 ] Hadoop QA commented on MAPREDUCE-1140: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424018/patch-1140.txt against trunk revision 832362. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/console This message is automatically generated. > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Attachments: patch-1140.txt > > -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-787) -files, -archives should honor user given symlink path
[ https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773452#action_12773452 ] Amareshwari Sriramadasu commented on MAPREDUCE-787: --- Ran ant docs on my machine. It passed successfully. > -files, -archives should honor user given symlink path > -- > > Key: MAPREDUCE-787 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-787 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt > > > Currently, if the user gives an option such as > -files hdfs://host:fs_port/user/testfile.txt#testlink > The symlink name "testlink" is not honored. It always creates a symlink with > name testfile.txt in the cwd of the task. > If the user has given a symlink name, it should be honored. If no > symlink-name is given, then path.getName() can be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-787) -files, -archives should honor user given symlink path
[ https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-787: -- Attachment: patch-787-2.txt Patch incorporates comments 2 and 3. bq. In TaskDistributedCacheManager.makeCacheFiles, I think we should compare URI's instead of paths This looks like an intrusive change. It would require changes to more public APIs. It can be done in a different jira. > -files, -archives should honor user given symlink path > -- > > Key: MAPREDUCE-787 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-787 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt > > > Currently, if the user gives an option such as > -files hdfs://host:fs_port/user/testfile.txt#testlink > The symlink name "testlink" is not honored. It always creates a symlink with > name testfile.txt in the cwd of the task. > If the user has given a symlink name, it should be honored. If no > symlink-name is given, then path.getName() can be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-787) -files, -archives should honor user given symlink path
[ https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-787: -- Status: Patch Available (was: Open) > -files, -archives should honor user given symlink path > -- > > Key: MAPREDUCE-787 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-787 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt > > > Currently, if the user gives an option such as > -files hdfs://host:fs_port/user/testfile.txt#testlink > The symlink name "testlink" is not honored. It always creates a symlink with > name testfile.txt in the cwd of the task. > If the user has given a symlink name, it should be honored. If no > symlink-name is given, then path.getName() can be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-787) -files, -archives should honor user given symlink path
[ https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-787: -- Status: Open (was: Patch Available) > -files, -archives should honor user given symlink path > -- > > Key: MAPREDUCE-787 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-787 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Attachments: patch-787-1.txt, patch-787.txt > > > Currently, if the user gives an option such as > -files hdfs://host:fs_port/user/testfile.txt#testlink > The symlink name "testlink" is not honored. It always creates a symlink with > name testfile.txt in the cwd of the task. > If the user has given a symlink name, it should be honored. If no > symlink-name is given, then path.getName() can be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1140: --- Status: Patch Available (was: Open) Simple patch fixing the bug. Added testcase. Testcase fails without the patch and passes with the patch. > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Attachments: patch-1140.txt > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1140: --- Attachment: patch-1140.txt > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Attachments: patch-1140.txt > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-787) -files, -archives should honor user given symlink path
[ https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773428#action_12773428 ] Jothi Padmanabhan commented on MAPREDUCE-787: - This looks good. Some minor points: # In {{TaskDistributedCacheManager.makeCacheFiles}}, I think we should compare URIs instead of paths # The documentation in mapred_tutorial can be a little more descriptive # {{fstream.close}} is missing for file f1 in {{TestCommandLineJobSubmission}} > -files, -archives should honor user given symlink path > -- > > Key: MAPREDUCE-787 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-787 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Attachments: patch-787-1.txt, patch-787.txt > > > Currently, if the user gives an option such as > -files hdfs://host:fs_port/user/testfile.txt#testlink > The symlink name "testlink" is not honored. It always creates a symlink with > name testfile.txt in the cwd of the task. > If the user has given a symlink name, it should be honored. If no > symlink-name is given, then path.getName() can be used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-217) Tasks to run on a different jvm version than the TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773424#action_12773424 ] Amar Kamat commented on MAPREDUCE-217: -- bin/hadoop-config.sh by default adds JAVA_HOME/lib/tools.jar to the tasktracker's classpath, which will be inherited by the child. Probably we should fix this to point to the configured java.home's tools.jar. > Tasks to run on a different jvm version than the TaskTracker > > > Key: MAPREDUCE-217 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-217 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Environment: linux >Reporter: Koji Noguchi >Assignee: Amar Kamat > Attachments: mapreduce-217-v1.0.patch, mapreduce-217-v1.1.patch > > > We use 32-bit jvm for TaskTrackers. > Sometimes our users want to call 64-bit JNI libraries from their tasks. > This requires tasks to be running on 64-bit jvm. > On Solaris, you can simply use -d32/-d64 to choose, but on Linux, it's in a > completely different package. > So far, tasks run on the same jvm version as the TaskTracker. > {noformat} > // use same jvm as parent > File jvm = new File(new File(System.getProperty("java.home"), "bin"), > "java"); > {noformat} > Is it possible to let users provide a java home path > or let them choose from a pre-selected list of paths? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
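The request in the description, letting a task use a java home other than the TaskTracker's, could look roughly like the sketch below. It keeps the existing behaviour (the parent's java.home) as the fallback. The class TaskJvmLocator and the configuredJavaHome parameter are hypothetical; how the value would be supplied (a jobconf property, a pre-selected list) is left open in the issue, and this is not the attached patch.

```java
import java.io.File;

// Hypothetical helper; the source of configuredJavaHome is an assumption.
final class TaskJvmLocator {
    // Uses the configured java home when given, otherwise the same jvm
    // as the parent process, as in the snippet quoted in the issue.
    static File resolveJvm(String configuredJavaHome) {
        String home = (configuredJavaHome == null || configuredJavaHome.trim().isEmpty())
                ? System.getProperty("java.home")
                : configuredJavaHome.trim();
        return new File(new File(home, "bin"), "java");
    }
}
```

A real implementation would presumably also check that the resolved binary exists and is allowed by the cluster administrator before launching a task with it.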