[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930403#action_12930403
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

I'm not a fan of /user.  I can see major problems with sites that set quotas.

Why not just make it an explicit /tmp?

[FWIW, I use /system to keep it completely separate from everything.]



 mapreduce.jobtracker.staging.root.dir default is unreasonable
 -

 Key: MAPREDUCE-2181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2181
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission, jobtracker
Affects Versions: 0.22.0
Reporter: Todd Lipcon

 The default for mapreduce.jobtracker.staging.root.dir is set to 
 ${hadoop.tmp.dir}/mapred/staging, which doesn't really work on a normal 
 cluster. hadoop.tmp.dir is overloaded in different places where sometimes it 
 is a local path and sometimes it is a path on HDFS, which makes things even 
 more confusing.
 We should change the default for the staging directory to /user (as is 
 suggested by the description of that configuration) and then fix 
 LocalJobRunner to use a different configuration -- perhaps 
 mapreduce.localjobrunner.staging.root.dir -- to make it clear that it's a 
 *local* path. That one could legitimately default to something inside 
 hadoop.tmp.dir.
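
To make the ambiguity concrete, here is a minimal sketch of how the current default expands. The `expand` helper is hypothetical and only imitates `${...}` substitution. The user name and the namenode host are made-up values, and the `hadoop.tmp.dir` value is assumed to be the stock `/tmp/hadoop-${user.name}`.

```python
import re

VAR = re.compile(r"\$\{([^}]+)\}")

def expand(value, props, max_depth=20):
    """Expand ${name} references from props; unknown names are left untouched."""
    for _ in range(max_depth):  # bounded, so a cyclic reference cannot loop forever
        expanded = VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value

# Assumed property values, for illustration only.
props = {
    "user.name": "alice",
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
}

resolved = expand("${hadoop.tmp.dir}/mapred/staging", props)
print(resolved)

# The resolved string carries no scheme, so each caller decides which
# file system it refers to. Both readings below are equally plausible.
as_local = "file://" + resolved
as_hdfs = "hdfs://namenode.example.com" + resolved  # hypothetical host
print(as_local)
print(as_hdfs)
```

The same string ends up naming a local directory in one code path and an HDFS directory in another, which is the overloading described above.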

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930412#action_12930412
 ] 

Todd Lipcon commented on MAPREDUCE-2181:


bq.  I can see major problems with sites that set quotas.

It seems to me that quotas should include temporary space used during job 
submission. If your job jar or distributed cache (dcache) resources are very 
large, by all means they should count against your quota, don't you think? On 
any _reasonable_ workload, the space used in the staging directory will be many 
orders of magnitude smaller than any user quota, anyway.

bq. Why not just make it an explicit /tmp?

Because /tmp is not necessarily created on a fresh HDFS either. Note that this 
is an *HDFS* directory, not local fs.




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930425#action_12930425
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

I realize this is an HDFS dir. 

Let me be more obvious:

What I'm worried about is that many sites with multiple users do:

... dfsadmin -setQuota value /user/* 

... so that all users have the same quota values.  [Making quotas of varying 
sizes makes Hadoop nearly impossible to support, since there are no real quota 
reporting capabilities, short of traversing the file system looking for them.]  
In this case, it would basically mean that the JobTracker would be forced to 
contend with the same quota size as users. 

Even given your scenario above, this would mean that the JT space quota would 
need to be usersize*number of users, which is a bit ridiculous to maintain.
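
As a rough illustration of the usersize*number of users figure, here is a small sketch. The per-user quota and the user count are made-up numbers, not values from any real cluster.

```python
GIB = 1024 ** 3

def jobtracker_space_quota(per_user_quota_bytes, num_users):
    """Worst case: the JT's directory must hold a full user quota for every user."""
    return per_user_quota_bytes * num_users

per_user = 100 * GIB  # hypothetical per-user space quota
users = 500           # hypothetical number of users

needed = jobtracker_space_quota(per_user, users)
print(needed // GIB, "GiB")
```

The figure also has to be recomputed whenever a user is added or the per-user quota changes, which is the maintenance burden in question.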

 [If anyone actually sets /user explicitly... well, I hope they aren't 
multi-user or have some sort of Plan.]

In any case, I'm still left with /user being not a good place to put system 
resources. There are reasons why everyone in the UNIX world doesn't put home 
directories under /usr anymore.  Mixing system bits and user bits is just bad 
practice.






[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930437#action_12930437
 ] 

Todd Lipcon commented on MAPREDUCE-2181:


bq. In any case, I'm still left with /user being not a good place to put system 
resources

I fail to see how the job staging directory is considered a system resource. 
It's per-user temporary data during the job submission process. Much like how 
web browsers store per-user caches in $HOME/.mozilla, the job submitter should 
put its data in $HOME/.staging.

Putting a big quota on /mapred and making /mapred/staging mode 777 (or mode 
1777 on trunk) just gives users one more place they can potentially abuse to 
store more data than they should be allowed.
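
For readers unfamiliar with the two modes, the sketch below uses Python's standard `stat` module to show how they differ. It only inspects permission bits and does not touch HDFS. The helper names are illustrative.

```python
import stat

def describe(mode):
    """Render a directory's permission bits the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)

def has_sticky_bit(mode):
    """True when the sticky bit (the leading 1 in 1777) is set."""
    return bool(mode & stat.S_ISVTX)

for mode in (0o777, 0o1777):
    print(oct(mode), describe(mode), "sticky" if has_sticky_bit(mode) else "not sticky")
```

With the sticky bit set, users can no longer delete or rename each other's entries in the directory. Either mode still lets every user write there, so only a quota bounds how much they can store.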




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930438#action_12930438
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

Why would staging be 777?  IIRC, it should be owned by, and is only written to 
by, the user that the JT runs as.  






[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930439#action_12930439
 ] 

Allen Wittenauer commented on MAPREDUCE-2181:
-

Actually, now that I think about it.

I don't really care.

It is a system tunable. There are so many other bad defaults, one more won't 
hurt.




[jira] Commented: (MAPREDUCE-2181) mapreduce.jobtracker.staging.root.dir default is unreasonable

2010-11-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930454#action_12930454
 ] 

Todd Lipcon commented on MAPREDUCE-2181:


I think you're mixing up the staging dir (a rather new config) and the system 
dir (one that's been around a long time). The staging dir is per-user, and is 
written to by the user submitting the job (from the submitting machine).
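
A minimal sketch of the per-user layout follows, assuming the staging directory is `<staging root>/<user>/.staging`. That layout is inferred from the `$HOME/.staging` remark earlier in this thread and is not a confirmed implementation detail. The user name is made up.

```python
import posixpath

def staging_dir(staging_root, user):
    """Per-user staging directory under the configured root (assumed layout)."""
    return posixpath.join(staging_root, user, ".staging")

# With the proposed default, staging lands inside the user's HDFS home directory.
print(staging_dir("/user", "alice"))

# With a shared root such as /mapred/staging, each user still gets a subtree.
print(staging_dir("/mapred/staging", "alice"))
```

Under this layout the job submitter, not the JobTracker, writes into the directory, so a `/user` root means the data counts against that user's own quota.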
