date:20091104

[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773764#action_12773764
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

Thanks Alan, this looks good! +1 from me. I'll wait for the Hudson automated 
test and then commit it if there are no warnings.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, 
> MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
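> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>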
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.
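
The lookup order this issue asks for can be sketched in plain Java. This is only an illustration, not the fair scheduler's code: plain Maps stand in for the scheduler and job configurations, resolvePool is a hypothetical helper, and the "default" fallback pool name is an assumption. The "user.name" default for the pool name property is taken from the review comments on this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: Maps stand in for Hadoop Configuration objects, and
// resolvePool is a hypothetical helper, not the fair scheduler's code.
class PoolNameSketch {
    static final String EXPLICIT_POOL = "mapred.fairscheduler.pool";
    static final String POOL_NAME_PROPERTY = "mapred.fairscheduler.poolnameproperty";

    // An explicitly requested pool wins. Otherwise the scheduler's
    // poolnameproperty names the jobconf key that holds the pool name.
    static String resolvePool(Map<String, String> schedulerConf,
                              Map<String, String> jobConf) {
        String explicit = jobConf.get(EXPLICIT_POOL);
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        String nameProperty = schedulerConf.getOrDefault(POOL_NAME_PROPERTY, "user.name");
        // "default" as the last-resort pool name is an assumption of this sketch.
        return jobConf.getOrDefault(nameProperty, "default");
    }

    public static void main(String[] args) {
        Map<String, String> schedulerConf = new HashMap<>();
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("user.name", "alice");
        System.out.println(resolvePool(schedulerConf, jobConf)); // alice

        jobConf.put(EXPLICIT_POOL, "production");
        System.out.println(resolvePool(schedulerConf, jobConf)); // production
    }
}
```

With this ordering, a job lands in its submitter's pool unless it names another pool itself, so the ${user.name} indirection is no longer needed.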

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated MAPREDUCE-707:


Status: Open  (was: Patch Available)

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, 
> MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated MAPREDUCE-707:


Status: Patch Available  (was: Open)

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, 
> MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al

2009-11-04 Thread Hong Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773759#action_12773759
 ] 

Hong Tang commented on MAPREDUCE-1183:
--

Why do we need to serialize mappers and reducers? 

> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -
>
> Key: MAPREDUCE-1183
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> Currently the Map-Reduce framework uses Configuration to pass information 
> about the various aspects of a job such as Mapper, Reducer, InputFormat, 
> OutputFormat, OutputCommitter etc. and application developers use 
> org.apache.hadoop.mapreduce.Job.set*Class apis to set them at job-submission 
> time:
> {noformat}
> Job.setMapperClass(IdentityMapper.class);
> Job.setReducerClass(IdentityReducer.class);
> Job.setInputFormatClass(TextInputFormat.class);
> Job.setOutputFormatClass(TextOutputFormat.class);
> ...
> {noformat}
> The proposal is that we move to a model where end-users interact with 
> org.apache.hadoop.mapreduce.Job via actual objects which are then serialized 
> by the framework:
> {noformat}
> Job.setMapper(new IdentityMapper());
> Job.setReducer(new IdentityReducer());
> Job.setInputFormat(new TextInputFormat("in"));
> Job.setOutputFormat(new TextOutputFormat("out"));
> ...
> {noformat}


[jira] Commented: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al

2009-11-04 Thread Arun C Murthy (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773756#action_12773756
 ] 

Arun C Murthy commented on MAPREDUCE-1183:
--

The current Configuration-based system has issues in a couple of use cases:
# The primary drawback: difficulty in implementing a Composite{Input|Output}Format.
Pig is in the middle of a rewrite of its Load/Store interfaces 
(http://wiki.apache.org/pig/LoadStoreRedesignProposal), where they want to be 
able to take an arbitrary InputFormat or OutputFormat and wrap it for use 
within Pig. Similarly, a 'CompositeInputFormat' that works with multiple 
InputFormats (say, a map-side merge between data in multiple SequenceFiles and 
TFiles) pushes the {Input|Output}Format into creating and managing multiple 
copies of Configuration. This is necessary because using a single 
Configuration results in the same configuration key being overwritten by 
multiple instances of {Input|Output}Format (say, mapred.input.dir overwritten 
by SequenceFileInputFormat and TFileInputFormat).
# Annoyance: an application that needs a very small amount of state in the 
Mapper/Reducer (say, a small map of metadata) is forced to use 
DistributedCache. It is much more natural to have that state stored in the 
Mapper/Reducer and have it serialized from the client to the compute nodes.
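
The key collision in the first point can be shown with a small Java sketch. It assumes a plain Map in place of Configuration, and both configure and PathHolder are hypothetical stand-ins rather than Hadoop classes; only the mapred.input.dir key comes from the comment above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a Map stands in for a shared Configuration.
class ConfigCollisionSketch {
    // Hypothetical stand-in for an InputFormat recording its input
    // directory under the one shared key.
    static void configure(Map<String, String> conf, String inputDir) {
        conf.put("mapred.input.dir", inputDir);
    }

    // Hypothetical stand-in for an InputFormat that keeps its paths as
    // object state instead of writing them to shared configuration.
    static class PathHolder {
        final List<String> inputPaths = new ArrayList<>();

        PathHolder(String firstPath) {
            inputPaths.add(firstPath);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        configure(conf, "/data/seq");   // first format writes the key
        configure(conf, "/data/tfile"); // second format silently overwrites it
        System.out.println(conf.get("mapred.input.dir")); // /data/tfile

        PathHolder seq = new PathHolder("/data/seq");
        PathHolder tfile = new PathHolder("/data/tfile");
        System.out.println(seq.inputPaths + " " + tfile.inputPaths);
    }
}
```

With one shared map the first directory is lost, which is why a composite format ends up juggling one Configuration copy per wrapped format; per-object state avoids the clash.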

Thus the proposal is to move to a model where an actual 
Mapper/Reducer/InputFormat/OutputFormat object is serialized by the framework. 
This eliminates the need to store the requisite information in Configuration; 
the object itself keeps the necessary state, e.g. FileInputFormat will have a 
member holding the list of input paths to be processed.

The new api would look like:
{noformat}
Job job = new Job();
job.setMapper(new WordCountMapper());
job.setReducer(new WordCountReducer());
InputFormat in = new TextInputFormat("in");
in.addInputPath("in2");
OutputFormat out = new TextOutputFormat("out");
job.setInputFormat(in);
job.setOutputFormat(out);
job.waitForCompletion();
{noformat}
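
As a rough illustration of what "serialized by the framework" could mean for a Mapper that carries a small map of metadata, here is a Java sketch. The proposal does not say which serialization mechanism would be used, so java.io.Serializable is an assumption, and LookupMapper, serialize and deserialize are hypothetical names, not part of any Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: java.io serialization is an assumed mechanism.
class SerializableMapperSketch {
    // Hypothetical mapper that carries its own small metadata map.
    static class LookupMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, String> metadata;

        LookupMapper(Map<String, String> metadata) {
            this.metadata = new HashMap<>(metadata);
        }

        String map(String key) {
            return metadata.getOrDefault(key, "unknown");
        }
    }

    // What the client side might do at job submission.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // What a compute node might do before running the task.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("us", "United States");
        byte[] wire = serialize(new LookupMapper(meta));
        LookupMapper remote = (LookupMapper) deserialize(wire);
        System.out.println(remote.map("us")); // United States
    }
}
```

The state travels with the object, so nothing has to be pushed through DistributedCache or encoded into Configuration keys.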



Thoughts?

> Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
> -
>
> Key: MAPREDUCE-1183
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: MAPREDUCE-707-2-apache.patch

corrected patch

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, 
> MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: (was: MAPREDUCE-707-2-apache.patch)

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773751#action_12773751
 ] 

Alan Heirich commented on MAPREDUCE-707:


Oops - I thought TestFairScheduler would be run as part of "ant test".  I guess 
not.  It passes now.

Hudson reported that TestGridmixSubmission failed, but that passes in my 
workspace.  I'm on Mac OS X and I saw some tests fail from a fresh build that 
should have passed.

Please see MAPREDUCE-707-2-apache.patch

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, 
> MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: MAPREDUCE-707-2-apache.patch

further revisions per comments.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, 
> MAPREDUCE-707-2-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Commented: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773746#action_12773746
 ] 

Hadoop QA commented on MAPREDUCE-1182:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424074/M1182-0.patch
  against trunk revision 832362.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/125/console

This message is automatically generated.

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
>Priority: Blocker
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>
> Reducers fail while copying map outputs with the following exception:
> java.lang.OutOfMemoryError: Java heap space
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
>  ,Error:
> The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
> I even tried giving each reducer -Xmx6.5g, but it still fails.
> Looking into the reducer logs, I found that the reducers were doing 
> shuffleInMemory every time rather than shuffleOnDisk.


[jira] Created: (MAPREDUCE-1183) Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al

2009-11-04 Thread Arun C Murthy (JIRA)

Serializable job components: Mapper, Reducer, InputFormat, OutputFormat et al
-

 Key: MAPREDUCE-1183
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1183
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy


Currently the Map-Reduce framework uses Configuration to pass information about 
the various aspects of a job such as Mapper, Reducer, InputFormat, 
OutputFormat, OutputCommitter etc. and application developers use 
org.apache.hadoop.mapreduce.Job.set*Class apis to set them at job-submission 
time:

{noformat}
Job.setMapperClass(IdentityMapper.class);
Job.setReducerClass(IdentityReducer.class);
Job.setInputFormatClass(TextInputFormat.class);
Job.setOutputFormatClass(TextOutputFormat.class);
...
{noformat}

The proposal is that we move to a model where end-users interact with 
org.apache.hadoop.mapreduce.Job via actual objects which are then serialized by 
the framework:
{noformat}
Job.setMapper(new IdentityMapper());
Job.setReducer(new IdentityReducer());
Job.setInputFormat(new TextInputFormat("in"));
Job.setOutputFormat(new TextOutputFormat("out"));
...
{noformat}


[jira] Updated: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Boris Shkolnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated MAPREDUCE-1026:
--

Attachment: MAPREDUCE-1026-1.patch

bq. 1) The tasktracker needs to maintain a mapping from JobIDs to job-tokens
done
bq. 2) The call to localizeJobTokenFile should be done before the call to 
taskController.initializeJob(context) in the TaskTracker.localizeJob method. 
Could the localizeJobTokenFile be called within TaskTracker.localizeJobFiles
bq. 3) Minor: for the request/response HTTP headers, make the first character 
upper case
done
bq. 4) HMacUtil could override the equals method and put in logic for comparing 
two HMacUtil objects, instead of defining verifyHash.
We are not really comparing HMacUtil objects; they are just utilities. So I 
think verifyHash() is more logical.
bq. 5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be 
Writable (as opposed to having to define load/store methods) 
Comp is used in the TreeMap constructor as the comparator.

Also added synchronization around the map of StoreKeys updates in TaskTracker.
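
For readers who have not opened the patch, a verifyHash-style utility might look roughly like the Java sketch below. This is not the code in MAPREDUCE-1026-1.patch: the HmacSHA1 algorithm, the message being signed, and the names ShuffleHashSketch and sign are all assumptions made for illustration. It only shows why a static verification helper fits better than overriding equals on a utility class.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: algorithm and message format are assumptions, and this
// class is hypothetical, not the patch's HMacUtil.
class ShuffleHashSketch {
    // Compute an HMAC of the message under the job-specific secret.
    static byte[] sign(byte[] jobSecret, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(jobSecret, "HmacSHA1"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recompute the HMAC and compare it with the hash the caller presented.
    // MessageDigest.isEqual is used for the byte-wise comparison.
    static boolean verifyHash(byte[] jobSecret, String message, byte[] claimedHash) {
        return MessageDigest.isEqual(sign(jobSecret, message), claimedHash);
    }

    public static void main(String[] args) {
        byte[] secret = "job-secret".getBytes(StandardCharsets.UTF_8);
        String request = "/mapOutput?job=job_1&map=m_0&reduce=0"; // made-up request
        byte[] hash = sign(secret, request);
        System.out.println(verifyHash(secret, request, hash));                    // true
        System.out.println(verifyHash(secret, request + "&reduce=1", hash));      // false
    }
}
```

Nothing here is an object with identity worth comparing; the helper just answers whether a presented hash matches the secret, which is the point made in reply to item 4.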

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026.patch, 
> MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773734#action_12773734
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

By the way, here's a tip if you want to run the unit tests faster - you can run 
just the fair scheduler's unit test with ant -Dtestcase=TestFairScheduler test.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773733#action_12773733
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

This looks pretty good, except that testPoolAssignment fails when I run the 
unit tests. I think the problem is with job4, where you set 
"mapred.fairscheduler.poolnameproperty" in the job's Configuration (jobConf2), 
not in the fair scheduler's configuration. You need to set the poolNameProperty 
when you create the fair scheduler object. That's what the code used to do with 
the POOL_PROPERTY string at the top, but you can't set the pool name property 
to mapred.fairscheduler.pool, because that wouldn't be testing anything. I'd 
suggest leaving the POOL_PROPERTY as "pool" and trying to set job4's pool 
through that.

Also, for sanity, in job1 (where you set mapred.fairscheduler.pool directly), 
you should set the "pool" property to something other than poolA to make sure 
it isn't used.

Finally, two small nitpicks:

# In the test line with 
assertEquals(scheduler.getPoolManager().getPoolName(job2), "poolA"), you should 
switch the two parameters (put "poolA" first); the first parameter is always 
supposed to be the value expected.
# Regarding the comment on getPoolName - the pool name property used by default 
is "user.name", not "project". I think I forgot to fix that comment a while 
back.
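
To show why the argument order in the first nitpick matters, here is a small Java sketch. It does not use JUnit: the assertEquals below is a hypothetical stand-in that mimics the usual "expected ... but was ..." failure message, and the pool names are made up.

```java
import java.util.Objects;

// Sketch only: a stand-in for JUnit's assertEquals(expected, actual).
class AssertOrderSketch {
    static void assertEquals(Object expected, Object actual) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    // Returns the failure message, or null if the assertion passed.
    static String failureMessage(Object first, Object second) {
        try {
            assertEquals(first, second);
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String actualPool = "default"; // pretend the scheduler returned this

        // Expected value first: the message describes the real problem.
        System.out.println(failureMessage("poolA", actualPool));
        // prints: expected:<poolA> but was:<default>

        // Arguments swapped: the message claims "default" was expected.
        System.out.println(failureMessage(actualPool, "poolA"));
        // prints: expected:<default> but was:<poolA>
    }
}
```

The test passes or fails identically either way; only the failure report is misleading when the arguments are swapped.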

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>

[jira] Updated: (MAPREDUCE-1159) Limit Job name on jobtracker.jsp to be 80 char long

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1159:
-

Status: Open  (was: Patch Available)

Canceling patch while Nicholas's comments are addressed

> Limit Job name on jobtracker.jsp to be 80 char long
> ---
>
> Key: MAPREDUCE-1159
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1159
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Trivial
> Attachments: MAPREDUCE-1159.trunk.patch
>
>
> Sometimes a user submits a job with a very long job name, which makes 
> jobtracker.jsp very hard to read.
> We should limit the size of the job name. Users can see the full name when 
> they click on the job.


[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1182:
-

Priority: Blocker  (was: Major)

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
>Priority: Blocker
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>

[jira] Updated: (MAPREDUCE-1127) distcp should timeout later during S3-based transfers

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1127:
-

Status: Open  (was: Patch Available)

On reflection... the documentation change is in 0.21, so it may not be worth 
adding and removing this special case in 0.22. Would you object to calling the 
documentation change "sufficient" and exploring real fixes for S3 in 0.22? 
Depending on how that is resolved, this workaround may become necessary again.

> distcp should timeout later during S3-based transfers
> -
>
> Key: MAPREDUCE-1127
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1127
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1127.2.patch, MAPREDUCE-1127.patch
>
>
> Per MAPREDUCE-972, rename and other operations on distcp can take longer than 
> the typical mapreduce task timeout. As an interim fix, this timeout should be 
> increased when the distcp destination is S3.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773728#action_12773728
 ] 

Hadoop QA commented on MAPREDUCE-707:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424051/MAPREDUCE-707-apache.patch
  against trunk revision 832362.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/224/console

This message is automatically generated.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
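> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}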
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773709#action_12773709
 ] 

Devaraj Das commented on MAPREDUCE-1026:


My worry about the reduce task killing itself can be ignored. That is the right 
thing to happen, as Boris and I discussed offline.

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.

[jira] Updated: (MAPREDUCE-1056) [Mumak] Add forrest documentation for mumak

2009-11-04 Thread Hong Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1056:
-

Priority: Blocker  (was: Major)

> [Mumak] Add forrest documentation for mumak
> ---
>
> Key: MAPREDUCE-1056
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1056
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Priority: Blocker
>


[jira] Updated: (MAPREDUCE-1137) Mumak should have a unit test to ensure jetty UI is running properly

2009-11-04 Thread Hong Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1137:
-

Priority: Blocker  (was: Major)

> Mumak should have a unit test to ensure jetty UI is running properly
> 
>
> Key: MAPREDUCE-1137
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1137
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/mumak
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Hong Tang
>Priority: Blocker
>
> Mumak should have a unit test that ensures the Jetty UI is running properly. 
> This will help detect issues like MAPREDUCE-1104 sooner.

[jira] Updated: (MAPREDUCE-1173) Documenting MapReduce metrics

2009-11-04 Thread Hong Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1173:
-

Priority: Blocker  (was: Major)

> Documenting MapReduce metrics
> -
>
> Key: MAPREDUCE-1173
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1173
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Hong Tang
>Priority: Blocker
>
> As part of HADOOP-6350, we should document the metrics for JobTracker and 
> TaskTracker as part of their interfaces.

[jira] Updated: (MAPREDUCE-1175) We should have a spec on JobHistory file format

2009-11-04 Thread Hong Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1175:
-

Priority: Blocker  (was: Major)

> We should have a spec on JobHistory file format
> ---
>
> Key: MAPREDUCE-1175
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1175
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Hong Tang
>Priority: Blocker
>
> Currently, the JobHistory schema is specified in 
> o.a.h.m.jobhistory.Event.avpr, and understanding the meaning of the various 
> records requires some guesswork. It would also be nice to spec out the 
> dependencies among the events. This would let tools like rumen parse job 
> history logs more dependably.

[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773701#action_12773701
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Looked at the patch some more. A few more comments:
1) The tasktracker needs to maintain a mapping from JobIDs to job-tokens.
2) The call to localizeJobTokenFile should be done before the call to 
taskController.initializeJob(context) in the TaskTracker.localizeJob method. 
Could localizeJobTokenFile be called within TaskTracker.localizeJobFiles?
3) Minor: for the request/response HTTP headers, make the first character upper 
case.
4) HMacUtil could override the equals method and put in logic for comparing two 
HMacUtil objects, instead of defining verifyHash.
5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be 
Writable (as opposed to having to define load/store methods).
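
Points 1 and 4 above (a per-job token mapping and hash verification) could be sketched along these lines. This is an illustration only: JobTokenStore, its method names, and the choice of HmacSHA1 are assumptions made for the example and are not the classes or algorithm from the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of per-job shuffle secrets; not the classes from the patch. */
class JobTokenStore {
    // Assumed MAC algorithm for this sketch.
    private static final String ALGORITHM = "HmacSHA1";
    // Maps a job ID string to that job's secret key bytes.
    private final Map<String, byte[]> tokens = new ConcurrentHashMap<>();

    /** Remembers a defensive copy of the secret for the given job. */
    void register(String jobId, byte[] secret) {
        tokens.put(jobId, secret.clone());
    }

    /** Computes the HMAC of the message under the job's secret. */
    byte[] sign(String jobId, String message) {
        byte[] secret = tokens.get(jobId);
        if (secret == null) {
            throw new IllegalArgumentException("No token registered for job " + jobId);
        }
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC computation failed", e);
        }
    }

    /** Recomputes the HMAC and compares it with the value the caller supplied. */
    boolean verify(String jobId, String message, byte[] claimed) {
        return MessageDigest.isEqual(sign(jobId, message), claimed);
    }
}
```

A hash computed with one job's secret does not verify against another job's secret, which is the property a job-specific shuffle secret relies on.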

For the case where a reduce task fails due to the TaskTracker(s) not being 
authentic, we probably need to take care. Two things might happen. The 
JobTracker might get enough notifications from other reduces in the system and 
decide to re-execute the map. The other situation is what is bothering me: the 
reduce task would kill itself after a certain threshold number of trials. 
This would be bad. IIRC it is not predictable which one would happen first.

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.

[jira] Commented: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2009-11-04 Thread Hong Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773700#action_12773700
 ] 

Hong Tang commented on MAPREDUCE-901:
-

Like metrics, we should also clearly document the framework counters. 

> Move Framework Counters into a TaskMetric structure
> ---
>
> Key: MAPREDUCE-901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Owen O'Malley
>Assignee: Arun C Murthy
> Fix For: 0.21.0
>
> Attachments: 901_1.patch, 901_1.patch, MAPREDUCE-901.patch, 
> MAPREDUCE-901.patch
>
>
> I think we should move all of the Counters that the framework updates into a 
> single class called TaskMetrics. TaskMetrics would have specific fields for 
> each of the metrics like input records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the 
> Counters down to just the user's counters) and decrease the latency for 
> updates to the JobTracker (since Counters are sent at most 1/minute instead 
> of 1/heartbeat).
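
A class of the kind proposed above might look roughly like this. The sketch is hypothetical: the field and method names are chosen for illustration, taken only from the examples the description lists (input records, input bytes, output records), and do not come from the attached patches.

```java
/** Hypothetical shape of a TaskMetrics class; names are illustrative only. */
class TaskMetrics {
    // One dedicated field per framework metric, instead of generic Counters.
    private long inputRecords;
    private long inputBytes;
    private long outputRecords;

    /** Records one batch of input consumed by the task. */
    void addInput(long records, long bytes) {
        inputRecords += records;
        inputBytes += bytes;
    }

    /** Records output produced by the task. */
    void addOutputRecords(long records) {
        outputRecords += records;
    }

    long getInputRecords() { return inputRecords; }
    long getInputBytes() { return inputBytes; }
    long getOutputRecords() { return outputRecords; }
}
```

Fixed fields like these serialize as a handful of longs, which is the size reduction the description is after.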

[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1182:
-

Attachment: M1182-0.patch

(arranging patches for Hudson)

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>
> Reducers fail while copying Map outputs with the following exception:
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
> The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
> I even tried giving each reducer -Xmx6.5g, but it still fails.
> While looking into the reducer logs, I found that reducers were doing 
> shuffleInMemory every time, rather than doing shuffleOnDisk.

[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1182:
-

Attachment: M1182-0.patch
M1182-0v20.patch

Patches changing shuffle arithmetic to use longs instead of ints. Retains the 
restriction on in-memory segments to maxint, though lifting that constraint 
can/should be explored in another issue.

Including unit tests for this is impractical, but it will be tested manually.
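
To illustrate why int arithmetic is a problem here, the following sketch compares an int-based and a long-based computation of an in-memory shuffle budget. It is not the code from the patches: the class name, the method names, and the 0.70 fraction are assumptions made for the example.

```java
/** Hypothetical comparison of int vs. long arithmetic for a shuffle memory budget. */
class ShuffleBudget {
    // Assumed fraction of the heap usable for in-memory shuffle (illustrative).
    static final float IN_MEMORY_FRACTION = 0.70f;

    /** Int arithmetic: the narrowing cast wraps around for heaps above 2 GB. */
    static int budgetWithInts(long heapBytes) {
        int truncated = (int) heapBytes; // keeps only the low 32 bits
        return (int) (truncated * IN_MEMORY_FRACTION);
    }

    /** Long arithmetic: the budget stays positive for large heaps. */
    static long budgetWithLongs(long heapBytes) {
        return (long) (heapBytes * IN_MEMORY_FRACTION);
    }

    /** A single in-memory segment is still capped at Integer.MAX_VALUE bytes. */
    static boolean fitsInMemory(long segmentBytes, long budgetBytes) {
        return segmentBytes <= Integer.MAX_VALUE && segmentBytes < budgetBytes;
    }
}
```

With a 6.5 GB heap, the int version yields a negative budget while the long version yields one larger than Integer.MAX_VALUE. The per-segment cap in fitsInMemory mirrors the restriction to maxint that the comment above says is retained.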

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>
> Reducers fail while copying Map outputs with the following exception:
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
> The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
> I even tried giving each reducer -Xmx6.5g, but it still fails.
> While looking into the reducer logs, I found that reducers were doing 
> shuffleInMemory every time, rather than doing shuffleOnDisk.

[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1182:
-

Attachment: (was: M1182-0.patch)

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>
> Reducers fail while copying Map outputs with the following exception:
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
> The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
> I even tried giving each reducer -Xmx6.5g, but it still fails.
> While looking into the reducer logs, I found that reducers were doing 
> shuffleInMemory every time, rather than doing shuffleOnDisk.

[jira] Assigned: (MAPREDUCE-1166) SerialUtils.cc: dynamic allocation of arrays based on runtime variable is not portable

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas reassigned MAPREDUCE-1166:


Assignee: Allen Wittenauer

> SerialUtils.cc: dynamic allocation of arrays based on runtime variable is not 
> portable
> --
>
> Key: MAPREDUCE-1166
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1166
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: MAPREREDUCE-1166.patch
>
>
> In SerialUtils.cc, the following code appears:
> int len;
> if (b < -120) {
>   negative = true;
>   len = -120 - b;
> } else {
>   negative = false;
>   len = -112 - b;
> }
> uint8_t barr[len];
> As far as I'm aware, this is not legal in ANSI C and will be rejected by 
> ANSI-compliant compilers.  Instead, the array should be malloc()'d based upon 
> the size of len and free()'d later.

[jira] Assigned: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas reassigned MAPREDUCE-1182:


Assignee: Chandra Prakash Bhagtani

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>
> Reducers fail while copying Map outputs with the following exception:
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
> The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
> I even tried giving each reducer -Xmx6.5g, but it still fails.
> While looking into the reducer logs, I found that reducers were doing 
> shuffleInMemory every time, rather than doing shuffleOnDisk.

[jira] Updated: (MAPREDUCE-1182) Reducers fail with OutOfMemoryError while copying Map outputs

2009-11-04 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1182:
-

Status: Patch Available  (was: Open)

> Reducers fail with OutOfMemoryError while copying Map outputs
> -
>
> Key: MAPREDUCE-1182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
> Fix For: 0.20.2
>
> Attachments: HADOOP-6357.patch, M1182-0.patch, M1182-0v20.patch
>
>
> Reducers fail while copying Map outputs with the following exception:
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
> The reducer's memory usage keeps increasing and ultimately exceeds the -Xmx value.
> I even tried giving each reducer -Xmx6.5g, but it still fails.
> While looking into the reducer logs, I found that reducers were doing 
> shuffleInMemory every time, rather than doing shuffleOnDisk.

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: MAPREDUCE-707-1-apache.patch

A patch after incorporating changes suggested in the review comments.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
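> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}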
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773679#action_12773679
 ] 

Alan Heirich commented on MAPREDUCE-707:


Please review MAPREDUCE-707-1-apache.patch.  Thanks.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
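> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}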
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: (was: MAPREDUCE-707-1-apache.patch)

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
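> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}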
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: MAPREDUCE-707-1-apache.patch

revised patch per review comments

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-1-apache.patch, MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
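> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}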
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Release Note: add mapred.fairscheduler.pool property to define which pool a 
job belongs to.
  Status: Patch Available  (was: Open)

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
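> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}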
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773645#action_12773645
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

Here are some comments on the patch:

# Instead of using the string "mapred.fairscheduler.pool" in multiple places in 
PoolManager, make it a constant at the top of the file (something like 
EXPLICIT_POOL_PROPERTY).
# Add a comment to PoolManager.getPoolName to explain the logic (first look for 
the explicit pool property, then for the property named by poolNameProperty, 
and finally default to DEFAULT_POOL_NAME).
# Add a unit test for PoolManager.getPoolName that tries each of those cases 
(explicit property is set, no explicit property but poolNameProperty is used, 
or neither is used). Right now your existing unit test checks that setPool 
works but there's no test that submits a job with mapred.fairscheduler.pool 
directly.
# Instead of assertEquals(0,
scheduler.getPoolManager().getPoolName(job2).compareTo("poolA")) you can 
probably use a version of assertEquals that works on strings.
# In the documentation, instead of saying "This property is ignored if 
mapred.fairscheduler.pool is specified." for the poolnameproperty, it would be 
better to say that the poolnameproperty is used only for jobs in which 
mapred.fairscheduler.pool is not explicitly set.
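
The lookup order described in the second and third comments can be sketched as follows. This is an illustration under stated assumptions, not the PoolManager code from the patch: the class PoolNameResolver, the use of a plain Map in place of a JobConf, and the value "default" for DEFAULT_POOL_NAME are all hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch of the pool-name lookup order; not the real PoolManager. */
class PoolNameResolver {
    // Constant for the explicit pool property, as suggested in the review.
    static final String EXPLICIT_POOL_PROPERTY = "mapred.fairscheduler.pool";
    // Assumed default pool name for this sketch.
    static final String DEFAULT_POOL_NAME = "default";

    // Name of the jobconf property that supplies the pool when none is explicit.
    private final String poolNameProperty;

    PoolNameResolver(String poolNameProperty) {
        this.poolNameProperty = poolNameProperty;
    }

    /**
     * Resolves the pool for a job: first the explicit pool property, then the
     * property named by poolNameProperty, and finally DEFAULT_POOL_NAME.
     */
    String getPoolName(Map<String, String> jobConf) {
        String explicit = jobConf.get(EXPLICIT_POOL_PROPERTY);
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        String named = jobConf.get(poolNameProperty);
        if (named != null && !named.trim().isEmpty()) {
            return named.trim();
        }
        return DEFAULT_POOL_NAME;
    }
}
```

A unit test for the three cases could then compare the returned name directly, for example with JUnit's assertEquals("poolA", name), rather than checking that compareTo returns 0.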

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
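> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}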
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.

[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773631#action_12773631
 ] 

Alan Heirich commented on MAPREDUCE-707:


I would like to request a code review of MAPREDUCE-707-apache.patch, which is 
intended to resolve this JIRA.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
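> {code}
> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>
> {code}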
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Updated: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Heirich updated MAPREDUCE-707:
---

Attachment: MAPREDUCE-707-apache.patch

Adds the mapred.fairscheduler.pool property; use it to specify the pool name.
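
For illustration, the lookup order this property implies can be sketched as follows. This is not the code in the attached patch: the class PoolNameResolver, the helper conf(), and the fallback pool name "default" are assumptions made for the example, and a plain Map stands in for JobConf so the sketch runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler's pool lookup; not the patch's code.
class PoolNameResolver {
    static final String EXPLICIT_POOL = "mapred.fairscheduler.pool";
    static final String POOL_NAME_PROPERTY = "mapred.fairscheduler.poolnameproperty";
    static final String DEFAULT_POOL = "default"; // assumed fallback name

    // An explicit pool wins; otherwise read the property named by
    // poolnameproperty (assumed to default to user.name).
    static String resolve(Map<String, String> conf) {
        String explicit = conf.get(EXPLICIT_POOL);
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        String indirect = conf.getOrDefault(POOL_NAME_PROPERTY, "user.name");
        String name = conf.get(indirect);
        return (name == null || name.trim().isEmpty()) ? DEFAULT_POOL : name.trim();
    }

    // Convenience for the example: build a configuration from key/value pairs.
    static Map<String, String> conf(String... keyValues) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }
}
```

With this ordering, a job that sets nothing lands in its submitter's pool, while a production job can opt into another pool by setting a single property.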

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
> Attachments: MAPREDUCE-707-apache.patch
>
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
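> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>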
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773611#action_12773611
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Kan, the RPC port on the TaskTracker is supposed to be bound only to localhost, 
so others outside the node in question shouldn't be able to do RPC. 
But let's keep that discussion to a separate jira. 

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.

2009-11-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773606#action_12773606
 ] 

Hudson commented on MAPREDUCE-1153:
---

Integrated in Hadoop-Mapreduce-trunk #133 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/])


> Metrics counting tasktrackers and blacklisted tasktrackers are not updated 
> when trackers are decommissioned.
> 
>
> Key: MAPREDUCE-1153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.22.0
>Reporter: Hemanth Yamijala
>Assignee: Sharad Agarwal
> Fix For: 0.22.0
>
> Attachments: 1153.patch
>
>
> MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of 
> actual, blacklisted and decommissioned tasktrackers. When a tracker is 
> decommissioned, the tasktracker count or the blacklisted tracker count is not 
> decremented.


[jira] Commented: (MAPREDUCE-1038) Mumak's compile-aspects target weaves aspects even though there are no changes to the Mumak's sources

2009-11-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773602#action_12773602
 ] 

Hudson commented on MAPREDUCE-1038:
---

Integrated in Hadoop-Mapreduce-trunk #133 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/])


> Mumak's compile-aspects target weaves aspects even though there are no 
> changes to the Mumak's sources
> -
>
> Key: MAPREDUCE-1038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinod K V
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: M1038-1.patch, MAPREDUCE-1038.patch
>
>
> This is particularly time consuming and is the bottleneck even for a simple 
> ant build. In the case where no files have been updated in Mumak, there is no 
> reason to recompile sources along with the aspects. compile-aspects should 
> skip this step in these cases.


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

2009-11-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773605#action_12773605
 ] 

Hudson commented on MAPREDUCE-971:
--

Integrated in Hadoop-Mapreduce-trunk #133 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/])


> distcp does not always remove distcp.tmp.dir
> 
>
> Key: MAPREDUCE-971
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-971.patch
>
>
> Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

2009-11-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773604#action_12773604
 ] 

Hudson commented on MAPREDUCE-1036:
---

Integrated in Hadoop-Mapreduce-trunk #133 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/])


> An API Specification for Sqoop
> --
>
> Key: MAPREDUCE-1036
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1036
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: contrib/sqoop
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-1036.patch, sqoop-reference.txt
>
>
> Over the last several months, Sqoop has evolved to a state that is functional 
> and has room for extensions. Developing extensions requires a stable API and 
> documentation. I am attaching to this ticket a description of Sqoop's design 
> and internal APIs, which include some open questions. I would like to solicit 
> input on the design regarding these open questions and standardize the API.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Kan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773599#action_12773599
 ] 

Kan Zhang commented on MAPREDUCE-1026:
--

> This way the key is known only to the local task
Also, no need to persist this key as part of the job. This key is just a 
runtime artifact of the Task and TT.

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Kan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773596#action_12773596
 ] 

Kan Zhang commented on MAPREDUCE-1026:
--

@Devaraj
> Since the token will be used (later on in a separate jira) to bootstrap even 
> the task<->TT mutual authentication
Are you talking about Task<->TT heartbeats over RPC? For this connection, I 
suggest we use a separate key (in the format of a Delegation token) that is 
generated by the TT and given to the Task just before it is launched. This way the key 
is known only to the local task, which helps prevent Tasks running on other 
machines from connecting to this TT accidentally. In terms of implementation, the TT can do 
this in the same way that the NN does, e.g., instantiate a DelegationTokenHandler 
for generating the Delegation token and couple it with RPC (no need to persist the 
MasterKey though).
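
A minimal sketch of the idea, under stated assumptions: it swaps in a plain HMAC-SHA256 tag for the Delegation token format and RPC coupling suggested above, and the class ToyTaskTracker with its methods launchTask, acceptHeartbeat and sign is hypothetical, not Hadoop API. It only shows that a key generated per task at launch, and held in memory by the tracker, lets the tracker reject callers that were not launched locally.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical model of a tracker that hands each task its own runtime key.
class ToyTaskTracker {
    private final SecureRandom random = new SecureRandom();
    // Keys live only in memory; nothing is persisted with the job.
    private final Map<String, byte[]> keyOfTask = new HashMap<>();

    // Called just before the task is launched; the key goes to that task only.
    byte[] launchTask(String taskId) {
        byte[] key = new byte[32];
        random.nextBytes(key);
        keyOfTask.put(taskId, key);
        return key.clone();
    }

    // Accept a message only if its tag was computed with the key issued here.
    boolean acceptHeartbeat(String taskId, String message, byte[] tag) {
        byte[] key = keyOfTask.get(taskId);
        if (key == null) {
            return false; // task was not launched by this tracker
        }
        return MessageDigest.isEqual(sign(key, message), tag);
    }

    // HMAC-SHA256 over the message, used by both the task and the tracker.
    static byte[] sign(byte[] key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A task launched by a different tracker holds a different key, so its tag fails the comparison even if it reuses the same task id.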

> Shuffle should be secure
> 
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch
>
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773595#action_12773595
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

In other words, I want to treat PoolManager, PoolSchedulable, JobSchedulable, 
etc. as data structures, and decide externally when updates need to happen and 
when they don't, so that all that control logic is in one or two places (the 
event handlers in the FairScheduler and the UI code in FairSchedulerServlet).

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
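> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>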
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773594#action_12773594
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

The reason I haven't made the PoolManager methods call updateDemand is that 
FairScheduler.update() does other things as well, and doing updateDemand 
without doing a full update() could potentially break some of the algorithms. 
(I'm not sure that it does so right now, but it would have been a problem in 
earlier versions). Therefore, I wanted all the updates to always happen through 
FairScheduler.update(). I'd rather not make the PoolManager call update() all 
the time because it would be better if the PoolManager didn't have to be 
modified whenever the structure of FairScheduler changes. All of the other unit 
tests call update() too, so I think it's fine not to do it in setPool.

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
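> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>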
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773587#action_12773587
 ] 

Alan Heirich commented on MAPREDUCE-707:


I guess another way to put this is: if we need to call updateDemand() to keep 
the demand up to date, should setPool() call updateDemand() after changing the 
pool for a job?


> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
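> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>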
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Matei Zaharia (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773586#action_12773586
 ] 

Matei Zaharia commented on MAPREDUCE-707:
-

Hi Alan,

Demands are only updated when the fair scheduler's update() function is called 
(which calls updateDemand in turn). All the code that calls setPool calls 
scheduler.update() afterwards. So you should do that in the unit test too.
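
The behaviour described here can be illustrated with a toy model. This is a sketch, not the FairScheduler source: the class ToyFairScheduler and its fields are hypothetical, and it reduces "demand" to a per-pool sum of runnable tasks. What it shows is that setPool only changes bookkeeping, so getDemand keeps returning the cached value until update() runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: demand is cached and recomputed only in update().
class ToyFairScheduler {
    private final Map<String, String> poolOfJob = new HashMap<>();     // job -> pool
    private final Map<String, Integer> tasksOfJob = new HashMap<>();   // job -> runnable tasks
    private final Map<String, Integer> demandOfPool = new HashMap<>(); // cached demand

    void addJob(String job, String pool, int runnableTasks) {
        poolOfJob.put(job, pool);
        tasksOfJob.put(job, runnableTasks);
    }

    // Moves the job to another pool but deliberately leaves the cache alone.
    void setPool(String job, String pool) {
        if (!poolOfJob.containsKey(job)) {
            throw new IllegalArgumentException("unknown job: " + job);
        }
        poolOfJob.put(job, pool);
    }

    // The single place where demands are recomputed.
    void update() {
        demandOfPool.clear();
        for (Map.Entry<String, String> e : poolOfJob.entrySet()) {
            demandOfPool.merge(e.getValue(), tasksOfJob.get(e.getKey()), Integer::sum);
        }
    }

    int getDemand(String pool) {
        return demandOfPool.getOrDefault(pool, 0);
    }
}
```

A unit test against this model has to call update() between setPool and getDemand, which mirrors the advice above.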

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
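> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>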
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Commented: (MAPREDUCE-707) Provide a jobconf property for explicitly assigning a job to a pool

2009-11-04 Thread Alan Heirich (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773579#action_12773579
 ] 

Alan Heirich commented on MAPREDUCE-707:


I'm finding that the demands for the mapSchedulable and reduceSchedulable 
objects are not updated when jobs are removed from and added to a pool. As 
a result, calls to PoolManager.setPool do not cause the pool demands to 
update. (setPool calls removeJob() and addJob().)

I've written a unit test that submits jobs to pools, tries to change their 
pool, and checks getDemand() to verify the right thing happened.  This test is 
failing because getDemand() shows no changes in the demand.

Is this the expected behavior of getDemand()?

> Provide a jobconf property for explicitly assigning a job to a pool
> ---
>
> Key: MAPREDUCE-707
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-707
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: Matei Zaharia
>Priority: Trivial
>
> A common use case of the fair scheduler is to have one pool per user, but 
> then to define some special pools for various production jobs, import jobs, 
> etc. Therefore, it would be nice if jobs went by default to the pool of the 
> user who submitted them, but there was a setting to explicitly place a job in 
> another pool. Today, this can be achieved through a sort of trick in the 
> JobConf:
> {code}
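> <property>
>   <name>mapred.fairscheduler.poolnameproperty</name>
>   <value>pool.name</value>
> </property>
> <property>
>   <name>pool.name</name>
>   <value>${user.name}</value>
> </property>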
> {code}
> This JIRA proposes to add a property called mapred.fairscheduler.pool that 
> allows a job to be placed directly into a pool, avoiding the need for this 
> trick.


[jira] Commented: (MAPREDUCE-787) -files, -archives should honor user given symlink path

2009-11-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773525#action_12773525
 ] 

Hadoop QA commented on MAPREDUCE-787:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424021/patch-787-2.txt
  against trunk revision 832362.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/124/console

This message is automatically generated.

> -files, -archives should honor user given symlink path
> --
>
> Key: MAPREDUCE-787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt
>
>
> Currently, if user gives an option such as
> -files hdfs://host:fs_port/user/testfile.txt#testlink
> The symlink name "testlink" is not honored. It always creates a symlink with 
> the name testfile.txt in the cwd of the task.
> If the user has given a symlink name, it should be honored. If no 
> symlink-name is given, then the path.getName() can be used.
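
The rule requested in the description can be sketched with java.net.URI alone. The class SymlinkNames and the method linkName are hypothetical names for this example, not part of the patch; the sketch takes the URI fragment as the link name when one is given and otherwise falls back to the last path component, which is what path.getName() is described as providing.

```java
import java.net.URI;

// Hypothetical helper illustrating the requested naming rule.
class SymlinkNames {
    // Returns the fragment if present, else the final component of the path.
    static String linkName(String cacheFile) {
        URI uri = URI.create(cacheFile);
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        if (path == null) {
            throw new IllegalArgumentException("no path in: " + cacheFile);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

For a URI of the form used in the description, with a numeric port substituted for fs_port, the link name is the part after the # sign.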


[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773485#action_12773485
 ] 

Hadoop QA commented on MAPREDUCE-1140:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424018/patch-1140.txt
  against trunk revision 832362.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/223/console

This message is automatically generated.

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --
>
> Key: MAPREDUCE-1140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1140.txt
>
>



[jira] Commented: (MAPREDUCE-787) -files, -archives should honor user given symlink path

2009-11-04 Thread Amareshwari Sriramadasu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773452#action_12773452
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-787:
---

Ran ant docs on my machine. It passed successfully.

> -files, -archives should honor user given symlink path
> --
>
> Key: MAPREDUCE-787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt
>
>
> Currently, if user gives an option such as
> -files hdfs://host:fs_port/user/testfile.txt#testlink
> The symlink name "testlink" is not honored. It always creates a symlink with 
> the name testfile.txt in the cwd of the task.
> If the user has given a symlink name, it should be honored. If no 
> symlink-name is given, then the path.getName() can be used.


[jira] Updated: (MAPREDUCE-787) -files, -archives should honor user given symlink path

2009-11-04 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-787:
--

Attachment: patch-787-2.txt

Patch incorporates comments 2 and 3.
bq. In TaskDistributedCacheManager.makeCacheFiles, I think we should compare 
URIs instead of paths
This looks like an intrusive change, since it would require changes to more 
public APIs. It can be done in a different jira.

> -files, -archives should honor user given symlink path
> --
>
> Key: MAPREDUCE-787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt
>
>
> Currently, if user gives an option such as
> -files hdfs://host:fs_port/user/testfile.txt#testlink
> The symlink name "testlink" is not honored. It always creates a symlink with 
> the name testfile.txt in the cwd of the task.
> If the user has given a symlink name, it should be honored. If no 
> symlink-name is given, then the path.getName() can be used.


[jira] Updated: (MAPREDUCE-787) -files, -archives should honor user given symlink path

2009-11-04 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-787:
--

Status: Patch Available  (was: Open)

> -files, -archives should honor user given symlink path
> --
>
> Key: MAPREDUCE-787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-787-1.txt, patch-787-2.txt, patch-787.txt
>
>
> Currently, if user gives an option such as
> -files hdfs://host:fs_port/user/testfile.txt#testlink
> The symlink name "testlink" is not honored. It always creates a symlink with 
> the name testfile.txt in the cwd of the task.
> If the user has given a symlink name, it should be honored. If no 
> symlink-name is given, then the path.getName() can be used.


[jira] Updated: (MAPREDUCE-787) -files, -archives should honor user given symlink path

2009-11-04 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-787:
--

Status: Open  (was: Patch Available)

> -files, -archives should honor user given symlink path
> --
>
> Key: MAPREDUCE-787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-787-1.txt, patch-787.txt
>
>
> Currently, if user gives an option such as
> -files hdfs://host:fs_port/user/testfile.txt#testlink
> The symlink name "testlink" is not honored. It always creates a symlink with 
> the name testfile.txt in the cwd of the task.
> If the user has given a symlink name, it should be honored. If no 
> symlink-name is given, then the path.getName() can be used.


[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-04 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:
---

Status: Patch Available  (was: Open)

Simple patch fixing the bug. Added testcase. Testcase fails without the patch 
and passes with the patch.

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --
>
> Key: MAPREDUCE-1140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1140.txt
>
>



[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-04 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:
---

Attachment: patch-1140.txt

> Per cache-file refcount can become negative when tasks release 
> distributed-cache files
> --
>
> Key: MAPREDUCE-1140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1140.txt
>
>



[jira] Commented: (MAPREDUCE-787) -files, -archives should honor user given symlink path

2009-11-04 Thread Jothi Padmanabhan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773428#action_12773428
 ] 

Jothi Padmanabhan commented on MAPREDUCE-787:
-

This looks good. Some minor points:

# In {{TaskDistributedCacheManager.makeCacheFiles}}, I think we should compare 
URIs instead of paths
# The documentation in mapred_tutorial can be a little more descriptive
# {{fstream.close}} is missing for file f1 in {{TestCommandLineJobSubmission}}

> -files, -archives should honor user given symlink path
> --
>
> Key: MAPREDUCE-787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-787-1.txt, patch-787.txt
>
>
> Currently, if the user gives an option such as
> -files hdfs://host:fs_port/user/testfile.txt#testlink
> the symlink name "testlink" is not honored. It always creates a symlink with 
> the name testfile.txt in the cwd of the task.
> If the user has given a symlink name, it should be honored. If no 
> symlink name is given, then path.getName() can be used.
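
The rule described above (use the fragment after '#' when present, otherwise fall back to the file name) can be sketched roughly as follows. This is only an illustration built on java.net.URI, not the code from the attached patches; the class name SymlinkNames, the method linkNameFor, and the port 8020 are hypothetical, and the sketch assumes a hierarchical URI with a non-null path.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: picks the symlink name for a cache file URI.
final class SymlinkNames {

    // Returns the user-given fragment if there is one, otherwise the last path component.
    static String linkNameFor(URI cacheFile) {
        String fragment = cacheFile.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment; // user asked for an explicit symlink name
        }
        String path = cacheFile.getPath(); // assumed non-null (hierarchical URI)
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Port 8020 is a placeholder standing in for fs_port in the description.
        System.out.println(linkNameFor(URI.create("hdfs://host:8020/user/testfile.txt#testlink")));
        System.out.println(linkNameFor(URI.create("hdfs://host:8020/user/testfile.txt")));
    }
}
```

With the two URIs in main, the first call yields the fragment and the second falls back to the file name.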


[jira] Commented: (MAPREDUCE-217) Tasks to run on a different jvm version than the TaskTracker

2009-11-04 Thread Amar Kamat (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773424#action_12773424
 ] 

Amar Kamat commented on MAPREDUCE-217:
--

bin/hadoop-config.sh by default adds JAVA_HOME/lib/tools.jar to the 
tasktracker's classpath, which will be inherited by the child. We should 
probably fix this to point to the configured java.home's tools.jar.

> Tasks to run on a different jvm version than the TaskTracker
> 
>
> Key: MAPREDUCE-217
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-217
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
> Environment: linux
>Reporter: Koji Noguchi
>Assignee: Amar Kamat
> Attachments: mapreduce-217-v1.0.patch, mapreduce-217-v1.1.patch
>
>
> We use 32-bit jvm for TaskTrackers. 
> Sometimes our users want to call 64-bit JNI libraries from their tasks.
> This requires tasks to be running on 64-bit jvm.
> On Solaris, you can simply use -d32/-d64 to choose, but on Linux, it's in a 
> completely different package.
> So far, tasks run on the same jvm version as the TaskTracker.
> {noformat}
> // use same jvm as parent
> File jvm = new File(new File(System.getProperty("java.home"), "bin"), "java");
> {noformat}
> Is it possible to let users provide a java home path 
> or let them choose from a pre-selected list of paths?
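
One possible shape for the request above, sketched under stated assumptions: a user-supplied java home is used when present, and the TaskTracker's own java.home remains the fallback. The class ChildJvmLocator and its methods are hypothetical and are not taken from the attached patches; how the configured value would reach this code (for example, through a job property) is left open. The toolsJar helper reflects the comment about tools.jar and assumes the configured home has a lib/tools.jar, which depends on the JDK layout.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop: locates the child JVM for a task.
final class ChildJvmLocator {

    // Uses the configured java home if given, else the parent's java.home.
    static File javaHome(String configuredJavaHome) {
        if (configuredJavaHome == null || configuredJavaHome.trim().isEmpty()) {
            return new File(System.getProperty("java.home")); // same jvm as parent
        }
        return new File(configuredJavaHome.trim());
    }

    // Path to the java launcher under the chosen home.
    static File javaBinary(String configuredJavaHome) {
        return new File(new File(javaHome(configuredJavaHome), "bin"), "java");
    }

    // Path to tools.jar under the chosen home (assumes a JDK layout with lib/tools.jar).
    static File toolsJar(String configuredJavaHome) {
        return new File(new File(javaHome(configuredJavaHome), "lib"), "tools.jar");
    }

    public static void main(String[] args) {
        // "/opt/jdk64" is a made-up path used only for illustration.
        System.out.println(javaBinary("/opt/jdk64"));
        System.out.println(javaBinary(null));
    }
}
```

Restricting users to a pre-selected list of paths, as the description suggests, would amount to validating the configured value against an administrator-defined set before calling javaHome.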

