[jira] Commented: (MAPREDUCE-1592) Generate Eclipse's .classpath file from Ivy config

2010-11-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930018#action_12930018
 ] 

Hudson commented on MAPREDUCE-1592:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #533 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/533/])


 Generate Eclipse's .classpath file from Ivy config
 --

 Key: MAPREDUCE-1592
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1592
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Reporter: Tom White
Assignee: Tom White
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1592.patch, MAPREDUCE-1592.patch, 
 MAPREDUCE-1592.patch, MAPREDUCE-1592.patch


 MapReduce companion issue for HADOOP-6407.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus

2010-11-09 Thread Harsh J Chouraria (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930068#action_12930068
 ] 

Harsh J Chouraria commented on MAPREDUCE-2170:
--

I've updated the patch on the review board.

 Send out last-minute load averages in TaskTrackerStatus
 ---

 Key: MAPREDUCE-2170
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2170
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.22.0
 Environment: GNU/Linux
Reporter: Harsh J Chouraria
Assignee: Harsh J Chouraria
Priority: Minor
 Fix For: 0.22.0

   Original Estimate: 0.33h
  Remaining Estimate: 0.33h

 Load averages could be useful in scheduling. This patch looks to extend the 
 existing Linux resource plugin (via /proc/loadavg file) to allow transmitting 
 load averages of the last one minute via the TaskTrackerStatus.
 Patch is up for review, with test cases added, at: 
 https://reviews.apache.org/r/20/
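
For readers unfamiliar with /proc/loadavg: its first whitespace-separated field is the one-minute load average. The following is a minimal sketch of reading it, not the code from the patch under review; the class and method names are hypothetical, and returning -1 for a missing file is an assumption made for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not the code from the patch under review.
final class ProcLoadAverage {
  /** Parses the first field of a /proc/loadavg line, e.g. "0.20 0.18 0.12 1/80 11206". */
  static float parseOneMinute(String procLoadavgContent) {
    String[] fields = procLoadavgContent.trim().split("\\s+");
    return Float.parseFloat(fields[0]);
  }

  /** Reads the file; returns -1 when it is missing or malformed (assumed convention). */
  static float readOneMinute(String path) {
    try {
      byte[] raw = Files.readAllBytes(Paths.get(path));
      return parseOneMinute(new String(raw, StandardCharsets.US_ASCII));
    } catch (IOException | RuntimeException e) {
      return -1f;
    }
  }
}
```

Keeping the parsing separate from the file read lets a test feed in a fixed string instead of depending on the host's actual load.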




[jira] Commented: (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus

2010-11-09 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930194#action_12930194
 ] 

Arun C Murthy commented on MAPREDUCE-2170:
--

Harsh, can you please put up the patch here on jira and grant license to ASF 
for inclusion? Thanks.




[jira] Commented: (MAPREDUCE-2170) Send out last-minute load averages in TaskTrackerStatus

2010-11-09 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930227#action_12930227
 ] 

Scott Chen commented on MAPREDUCE-2170:
---

Hey Harsh,
Why do we need to override getSystemLoadAverage()? The native one should also 
work for the Linux case, right?
{code}
  @Override
  public float getSystemLoadAverage() {
    readProcLoadAverageFile();
    return loadAverage;
  }
{code}





[jira] Updated: (MAPREDUCE-2169) Integrated Reed-Solomon code with RaidNode

2010-11-09 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2169:
---

Attachment: MAPREDUCE-2169.2.patch

TEST RESULTS:

ant test under raid:
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 373.594 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 138.885 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 15.061 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.491 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.39 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.809 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 69.229 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 461.174 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 218.163 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 24.31 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 14.96 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.368 sec

test:

BUILD SUCCESSFUL
Total time: 22 minutes 53 seconds

{code}

ant test-patch has the same result as a clean checkout (see MAPREDUCE-2176):

{code}

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 28 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
 [exec]
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec]
 [exec]
 [exec]
 [exec]
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec]
 [exec]

{code}

 Integrated Reed-Solomon code with RaidNode
 --

 Key: MAPREDUCE-2169
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2169
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2169.2.patch, MAPREDUCE-2169.patch


 Scott Chen recently checked in an implementation of the Reed-Solomon code. 
 This task will track the integration of the code with RaidNode.




[jira] Updated: (MAPREDUCE-2180) Add coverage of fair scheduler servlet to system test

2010-11-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2180:
---

Fix Version/s: 0.22.0
   Status: Patch Available  (was: Open)

 Add coverage of fair scheduler servlet to system test
 -

 Key: MAPREDUCE-2180
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2180
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: contrib/fair-share
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: 0.22.0

 Attachments: mapreduce-2180.txt


 MAPREDUCE-2051 added a system test for the fair scheduler which starts a 
 minicluster and runs a couple jobs with preemption on. I recently found a 
 deadlock in a previous version of the scheduler that was due to lock 
 inversion between the scheduler servlet and some JT internals. I'd like to 
 modify the existing system test to also hit the /scheduler servlet, allowing 
 jcarder to detect such lock inversions in the future.
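
To illustrate why the servlet path has to run during the test: a lock-order checker can only flag an inversion once it has seen both acquisition orders at least once. The following is a hypothetical miniature of that idea, not a description of how jcarder is implemented. The class, method, and lock names are made up, and it only detects inversions between pairs of locks, not longer cycles.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical miniature of lock-order checking; not how jcarder is implemented.
final class LockOrderSketch {
  // held-lock name -> names of locks acquired while it was held
  private final Map<String, Set<String>> acquiredAfter = new HashMap<>();

  /** Records that `inner` was taken while `outer` was already held. */
  void record(String outer, String inner) {
    if (!outer.equals(inner)) {          // ignore re-entrant acquisition
      acquiredAfter.computeIfAbsent(outer, k -> new HashSet<>()).add(inner);
    }
  }

  /** True if some pair of locks has been observed in both orders. */
  boolean hasInversion() {
    for (Map.Entry<String, Set<String>> e : acquiredAfter.entrySet()) {
      for (String inner : e.getValue()) {
        Set<String> reverse = acquiredAfter.get(inner);
        if (reverse != null && reverse.contains(e.getKey())) {
          return true;
        }
      }
    }
    return false;
  }
}
```

If only the scheduler-then-JobTracker order is ever recorded, `hasInversion()` stays false; the inversion shows up only after the opposite order has also been exercised, which here would be the servlet request.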




[jira] Created: (MAPREDUCE-2180) Add coverage of fair scheduler servlet to system test

2010-11-09 Thread Todd Lipcon (JIRA)
Add coverage of fair scheduler servlet to system test
-

 Key: MAPREDUCE-2180
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2180
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: contrib/fair-share
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


MAPREDUCE-2051 added a system test for the fair scheduler which starts a 
minicluster and runs a couple jobs with preemption on. I recently found a 
deadlock in a previous version of the scheduler that was due to lock inversion 
between the scheduler servlet and some JT internals. I'd like to modify the 
existing system test to also hit the /scheduler servlet, allowing jcarder to 
detect such lock inversions in the future.




[jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node

2010-11-09 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2167:
---

Attachment: MAPREDUCE-2167.4.patch

Fixed a broken test.

TEST RESULTS:


ant test-patch has the same number of failures as a clean checkout

{code}
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 4 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
 [exec]
 [exec] +1 system test framework.  The patch passed system test framework compile.
 [exec]
 [exec]
 [exec]
 [exec]
 [exec] ==
 [exec] ==
 [exec] Finished build.
 [exec] ==
 [exec] ==
 [exec]
 [exec]
{code}

ant test succeeds:

{code}


test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 47.071 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 124.583 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.337 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.481 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.392 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.485 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 71.136 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 471.072 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.828 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec

test:

BUILD SUCCESSFUL
Total time: 15 minutes 6 seconds
{code}


 Faster directory traversal for raid node
 

 Key: MAPREDUCE-2167
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, 
 MAPREDUCE-2167.4.patch, MAPREDUCE-2167.patch


 The RaidNode currently iterates over the directory structure to figure out 
 which files to RAID. With millions of files, this can take a long time - 
 especially if some files are already RAIDed and the RaidNode needs to look at 
 parity files / parity file HARs to determine if the file needs to be RAIDed.
 The directory traversal is encapsulated inside the class DirectoryTraversal, 
 which examines one file at a time, using the caller's thread.
 My proposal is to make this multi-threaded as follows:
  * Use a pool of threads inside DirectoryTraversal.
  * The caller's thread is used to retrieve directories, and each new 
 directory is assigned to a thread in the pool. The worker thread examines all 
 the files in the directory.
  * If there are sub-directories, those are added back as work items to the pool.
 Comments?
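
The proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch. The directory tree is modelled as an in-memory Map (directory to children) instead of HDFS listings, the class name is hypothetical, and the caller's thread simply waits while pool workers queue sub-directories. A counter of outstanding work items tells the caller when the traversal is complete.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the real DirectoryTraversal lists HDFS paths, not a Map.
final class ParallelTraversalSketch {
  private final Map<String, List<String>> tree;   // directory -> children (read-only)
  private final ExecutorService pool;
  private final Queue<String> files = new ConcurrentLinkedQueue<>();
  private final AtomicInteger pending = new AtomicInteger();
  private final CountDownLatch done = new CountDownLatch(1);

  ParallelTraversalSketch(Map<String, List<String>> tree, int threads) {
    this.tree = tree;
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Queues one directory as a work item; it is counted before it reaches the pool. */
  private void submit(String dir) {
    pending.incrementAndGet();
    pool.execute(() -> {
      try {
        for (String child : tree.getOrDefault(dir, Collections.<String>emptyList())) {
          if (tree.containsKey(child)) {
            submit(child);        // sub-directory goes back to the pool
          } else {
            files.add(child);     // plain file: examined by this worker
          }
        }
      } finally {
        if (pending.decrementAndGet() == 0) {
          done.countDown();       // nothing is queued or running any more
        }
      }
    });
  }

  /** Returns every file under root; the order is unspecified. */
  Queue<String> traverse(String root) {
    submit(root);
    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdown();
    }
    return files;
  }
}
```

Because a child directory is counted before its parent's work item finishes, the counter cannot reach zero while any directory is still queued.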




[jira] Commented: (MAPREDUCE-2141) Add an extra data field to Task for use by Mesos

2010-11-09 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930373#action_12930373
 ] 

Owen O'Malley commented on MAPREDUCE-2141:
--

+1, looks good.

 Add an extra data field to Task for use by Mesos
 --

 Key: MAPREDUCE-2141
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2141
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
 Attachments: mapreduce-2141-v1.patch, mapreduce-2141-v2.patch


 In order to support running Hadoop on the Mesos cluster manager 
 (http://mesos.berkeley.edu/), I'd like to add an extra String field to the 
 Task class to allow extra data (a Mesos task ID) to be associated with each 
 task. This should have no impact on normal operation other than making the 
 serialized form of Task a few bytes longer. In the Mesos support patch for 
 Hadoop, this field is set by a pluggable Hadoop scheduler implementation to 
 allow code on the TaskTracker side to see which Mesos task corresponds to 
 each Hadoop task. 
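
A rough sketch of what an extra String field costs in serialized form. This is not the Task class or the attached patch: the class and field names are hypothetical, and `DataOutputStream.writeUTF` stands in for whatever string encoding Hadoop's serialization actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Task; field names are illustrative only.
final class TaskSketch {
  final String taskId;
  final String extraData;   // e.g. a Mesos task ID; empty when unused

  TaskSketch(String taskId, String extraData) {
    this.taskId = taskId;
    this.extraData = extraData == null ? "" : extraData;
  }

  byte[] toBytes() {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeUTF(taskId);
      out.writeUTF(extraData);   // the new field: 2 length bytes plus the payload
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static TaskSketch fromBytes(byte[] bytes) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      String id = in.readUTF();
      String extra = in.readUTF();   // fields are read back in the order written
      return new TaskSketch(id, extra);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this encoding an unused field adds two bytes per task, which matches the "few bytes longer" expectation in the description.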




[jira] Created: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)
mapreduce.jobtracker.staging.root.dir default is unreasonable
-

 Key: MAPREDUCE-2181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2181
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, jobtracker
Affects Versions: 0.22.0
Reporter: Todd Lipcon


The default for mapreduce.jobtracker.staging.root.dir is set to 
${hadoop.tmp.dir}/mapred/staging, which doesn't really work on a normal 
cluster. hadoop.tmp.dir is overloaded in different places where sometimes it is 
a local path and sometimes it is a path on HDFS, which makes things even more 
confusing.

We should change the default for the staging directory to /user (as is 
suggested by the description of that configuration) and then fix LocalJobRunner 
to use a different configuration -- perhaps 
mapreduce.localjobrunner.staging.root.dir -- to make it clear that it's a 
*local* path. That one could legitimately default to something inside 
hadoop.tmp.dir.
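
A small sketch of the proposed split, assuming nothing about Hadoop's Configuration class beyond ${...} substitution. The class and method names are hypothetical, the second property name is the one tentatively suggested above, and the hadoop.tmp.dir value in the usage note is only an example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical miniature of ${...} expansion; not Hadoop's Configuration class.
final class StagingDirSketch {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  /** Expands ${name} references against props, leaving unknown names untouched. */
  static String expand(String value, Map<String, String> props) {
    Matcher m = VAR.matcher(value);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String replacement = props.get(m.group(1));
      if (replacement == null) {
        replacement = m.group(0);
      }
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  /** Proposed default: an HDFS path that does not depend on hadoop.tmp.dir. */
  static String jobTrackerStagingRoot(Map<String, String> props) {
    return expand(props.getOrDefault("mapreduce.jobtracker.staging.root.dir", "/user"), props);
  }

  /** Proposed separate key for LocalJobRunner: a local path under hadoop.tmp.dir. */
  static String localRunnerStagingRoot(Map<String, String> props) {
    return expand(props.getOrDefault("mapreduce.localjobrunner.staging.root.dir",
        "${hadoop.tmp.dir}/mapred/staging"), props);
  }
}
```

With hadoop.tmp.dir set to, say, /tmp/hadoop-alice, the JobTracker staging root resolves to /user and the local runner's to /tmp/hadoop-alice/mapred/staging, so neither key has to serve as both a local and an HDFS path.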




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930403#action_12930403
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

I'm not a fan of /user.  I can see major problems with sites that set quotas.

Why not just make it an explicit /tmp?

[FWIW, I use /system to keep it completely separate from everything.]






[jira] Commented: (MAPREDUCE-2034) TestSubmitJob triggers NPE instead of permissions error

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930404#action_12930404
 ] 

Todd Lipcon commented on MAPREDUCE-2034:


Test patch result:

 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
 [exec] 
 [exec] +1 system test framework.  The patch passed system test framework compile.

I checked the release audit and findbugs, and it seems like an issue with the 
test-patch script -- in both cases the new warnings were unrelated to the 
patch.



 TestSubmitJob triggers NPE instead of permissions error
 ---

 Key: MAPREDUCE-2034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2034
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Attachments: mapreduce-2034.txt


 TestSubmitJob.testSecureJobExecution catches _any_ IOException and assumes a 
 permissions error has been caught. In fact, it was passing an invalid path 
 name to the NameNode and triggering an NPE, not a Permission denied error, in 
 one case, but the test was not specific enough to detect this.
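
The general shape of a more specific check might look like the sketch below. It is not the attached patch: every name in it is hypothetical, and a locally defined exception class stands in for whatever permissions exception the real test should expect. Only the expected failure counts as a pass; any other IOException is reported, and a runtime exception such as an NPE propagates.

```java
import java.io.IOException;

// Hypothetical names throughout; Hadoop's own exception and test classes are not used here.
final class SpecificFailureSketch {
  static final class PermissionDeniedException extends IOException {
    PermissionDeniedException(String msg) {
      super(msg);
    }
  }

  interface Action {
    void run() throws IOException;
  }

  /** Returns true only for a permissions failure; other IOExceptions are reported as errors. */
  static boolean failsWithPermissionDenied(Action action) {
    try {
      action.run();
      return false;                       // no failure at all
    } catch (PermissionDeniedException e) {
      return true;                        // the failure the test is looking for
    } catch (IOException e) {
      throw new AssertionError("unexpected failure: " + e, e);
    }
  }
}
```

A test written this way would have failed loudly on the invalid path name instead of treating it as the expected permissions error.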




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930412#action_12930412
 ] 

Todd Lipcon commented on MAPREDUCE-2181:


bq.  I can see major problems with sites that set quotas.

It seems to me that quotas should include temporary space used during job 
submission. If your job jar or dcache resources are very large, by all means it 
should count against your quota, don't you think? On any _reasonable_ workload, 
the space used in the staging directory will be many orders of magnitude lower 
than any user quotas, anyway.

bq. Why not just make it an explicit /tmp?

Because /tmp is not necessarily created on a fresh HDFS either. Note that this 
is an *HDFS* directory, not local fs.




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930425#action_12930425
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

I realize this is an HDFS dir. 

Let me be more obvious:

What I'm worried about is that many sites with multiple users do:

... dfsadmin -setQuota value /user/* 

... so that all users have the same quota values.  [Making quotas variable in 
size makes Hadoop nearly impossible to support, since there are no real quota 
reporting capabilities short of traversing the file system looking for them.]  
In this case, it would basically mean that the JobTracker would be forced to 
contend with the same quota size as users. 

Even given your scenario above, this would mean that the JT space quota would 
need to be usersize*number of users, which is a bit ridiculous to maintain.

 [If anyone actually sets /user explicitly... well, I hope they aren't 
multi-user or have some sort of Plan.]

In any case, I'm still left with /user being not a good place to put system 
resources. There are reasons why everyone in the UNIX world doesn't put home 
directories under /usr anymore.  Mixing system bits and user bits is just bad 
practice.






[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930437#action_12930437
 ] 

Todd Lipcon commented on MAPREDUCE-2181:


bq. In any case, I'm still left with /user being not a good place to put system 
resources

I fail to see how the job staging directory is considered a system resource. 
It's per-user temporary data during the job submission process. Much like how 
web browsers store per-user caches in $HOME/.mozilla, the job submitter should 
put its data in $HOME/.staging.

Putting a big quota on /mapred and making /mapred/staging mode 777 (or mode 
1777 on trunk) just gives users one more place they can potentially abuse to 
store more data than they should be allowed.




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930438#action_12930438
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

Why would staging be 777?  IIRC, it should be owned by, and only written to by, 
the user that the JT runs as.  






[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930439#action_12930439
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

Actually, now that I think about it.

I don't really care.

It is a system tunable. There are so many other bad defaults, one more won't 
hurt.




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930454#action_12930454
 ] 

Todd Lipcon commented on MAPREDUCE-2181:


I think you're mixing up the staging dir (a rather new config) and the system dir 
(one that's been around a long time). The staging dir is per-user, and is written 
to by the user submitting the job (from the submitting machine).




[jira] Updated: (MAPREDUCE-461) Enable ServicePlugins for the JobTracker

2010-11-09 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-461:


Attachment: MAPREDUCE-461.patch

Here's an updated patch which includes a unit test. It also addresses Amar's 
feedback.

 Enable ServicePlugins for the JobTracker
 

 Key: MAPREDUCE-461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-461
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Fredrik Hedberg
Priority: Minor
 Attachments: MAPREDUCE-461.patch, sp-jt-1.diff


 Allow ServicePlugins (see HADOOP-5257) for the JobTracker.




[jira] Updated: (MAPREDUCE-461) Enable ServicePlugins for the JobTracker

2010-11-09 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-461:


Status: Patch Available  (was: Open)




[jira] Commented: (MAPREDUCE-2073) TestTrackerDistributedCacheManager should be up-front about requirements on build environment

2010-11-09 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930491#action_12930491
 ] 

Tom White commented on MAPREDUCE-2073:
--

+1 This looks good to me. Can you run test-patch on it please?

 TestTrackerDistributedCacheManager should be up-front about requirements on 
 build environment
 -

 Key: MAPREDUCE-2073
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2073
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: distributed-cache, test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Attachments: mapreduce-2073-0.20.txt, mapreduce-2073.txt


 TestTrackerDistributedCacheManager will fail on a system where the build 
 directory is in any path where an ancestor doesn't have a+x permissions. On 
 one of our hudson boxes, for example, hudson's workspace had 700 permissions 
 and caused this test to fail reliably, but not in an obvious manner. It would 
 be helpful if the test failed with a more obvious error message during 
 setUp() when the build environment is misconfigured.
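
One way such an up-front check could look is sketched below. It is an illustration, not the attached patch. The class and method names are hypothetical, and the permission test is passed in as a predicate so the sketch stays independent of how a+x is actually queried on a given platform.

```java
import java.io.File;
import java.util.function.Predicate;

// Hypothetical helper; a real test would call something like it from setUp().
final class AncestorPermissionCheck {
  /** Returns the closest ancestor-or-self of dir that fails the check, or null if all pass. */
  static File firstFailing(File dir, Predicate<File> worldExecutable) {
    for (File f = dir.getAbsoluteFile(); f != null; f = f.getParentFile()) {
      if (!worldExecutable.test(f)) {
        return f;
      }
    }
    return null;
  }

  /** Throws with a message that names the offending directory. */
  static void requireTraversable(File dir, Predicate<File> worldExecutable) {
    File bad = firstFailing(dir, worldExecutable);
    if (bad != null) {
      throw new IllegalStateException("Build environment misconfigured: " + bad
          + " is not world-executable (a+x), which this test requires of every"
          + " ancestor of " + dir);
    }
  }
}
```

On a POSIX file system the predicate could, for example, check that `java.nio.file.Files.getPosixFilePermissions` reports the owner, group, and others execute bits for each directory.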
