[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-11-07 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816210#comment-13816210
 ] 

Xi Fang commented on MAPREDUCE-5508:


One way to confirm that is to set mapred.jobtracker.completeuserjobs.maximum 
= 0 and run some jobs. After all the jobs are done, wait for a while and check 
the number of FileSystem objects in FileSystem#Cache.

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Fix For: 1-win, 1.3.0

 Attachments: CleanupQueue.java, JobInProgress.java, 
 MAPREDUCE-5508.1.patch, MAPREDUCE-5508.2.patch, MAPREDUCE-5508.3.patch, 
 MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak but introduced another FileSystem 
 object (see tempDirFs) that is not properly released.
 {code}
 // JobInProgress#cleanupJob()
 void cleanupJob() {
   ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
       new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
   ...
   if (tempDirFs != fs) {
     try {
       fs.close();
     } catch (IOException ie) {
       ...
     }
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-10-21 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13800914#comment-13800914
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Chris and viswanathan. I think the three patches are what you need. 
This won't affect production environments because it is entirely back-end 
behavior; I don't think users will notice any difference.



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-24 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776023#comment-13776023
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Chris and Sandy!



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.2.patch



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774322#comment-13774322
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Chris. I attached a new patch and will launch a large-scale test 
tomorrow.



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.3.patch



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775904#comment-13775904
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Chris and Sandy. I just finished the large-scale test and didn't find 
a memory leak. I removed the tabs and attached a new patch.

So Chris, do you think we should file a new JIRA for the idempotent 
implementation?





[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-22 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774280#comment-13774280
 ] 

Xi Fang commented on MAPREDUCE-5508:


[~cnauroth], thanks for your comments. 
bq. Swallowing the InterruptedException is problematic if any upstream code 
depends on seeing the thread's interrupted status, so let's restore the 
interrupted status in the catch block by calling 
Thread.currentThread().interrupt().

If we call Thread.currentThread().interrupt(), is it possible that fs won't 
be closed in JobInProgress#cleanupJob()?

bq. If there is an InterruptedException, then we currently would pass a null 
tempDirFs to the CleanupQueue, where we'd once again risk leaking memory. I 
suggest that if there is an InterruptedException, then we skip adding to the 
CleanupQueue and log a warning. This is consistent with the error-handling 
strategy in the rest of the method. (It logs warnings.)

I think that if the answer to my first question is that fs will still be closed 
in JobInProgress#cleanupJob(), there will be no memory leak. Even if we pass 
null into CleanupQueue, the new fs created in CleanupQueue#deletePath() would 
be closed anyway.

Thanks Chris.
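To make the pattern under discussion concrete, here is a minimal, self-contained sketch in plain Java (all names here are illustrative, not the actual JobInProgress code): the InterruptedException is caught so cleanup such as fs.close() can still run, but the thread's interrupted status is restored so upstream code can still observe it.

```java
public class InterruptRestoreDemo {

    // Hypothetical stand-in for a blocking step inside cleanupJob().
    static void cleanupStep() {
        try {
            Thread.sleep(10_000); // throws immediately if the thread is already interrupted
        } catch (InterruptedException ie) {
            // Restore the interrupted status instead of swallowing it, so
            // callers can still see the interruption. Code after this catch
            // block (e.g. an fs.close()) still executes first.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an incoming interrupt
        cleanupStep();
        // The status survived the catch block:
        System.out.println("interrupted=" + Thread.currentThread().isInterrupted());
    }
}
```

Note that Thread.sleep() clears the interrupted status when it throws, which is why the explicit re-interrupt in the catch block is needed at all.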



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-19 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772073#comment-13772073
 ] 

Xi Fang commented on MAPREDUCE-5508:


I set both the staging and system directories to HDFS on my test cluster, ran 
35,000 job submissions, and manually checked the number of DistributedFileSystem 
objects. No memory leak related to DistributedFileSystem was found.



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-18 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.1.patch



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-18 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771276#comment-13771276
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Chris and Sandy. I made a draft patch for the proposal. I am thinking we 
still pass tempDirFs into PathDeletionContext instead of passing fs, to handle 
the case where fs is closed by someone else. Although tempDirFs might differ 
from fs due to the different-subject problem discussed above, in most cases 
they would be the same (I used userUGI to get tempDirFs), so this is still an 
optimization. Let me know your comments. Thanks.



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-15 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767702#comment-13767702
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks [~sandyr] and [~cnauroth]. Actually, the above discussion gave me 
second thoughts on the attached patch. There is a race condition here. Suppose 
that Path#getFileSystem in CleanupQueue#deletePath retrieved the same instance 
of JobInProgress#fs from FileSystem#Cache. Because there is a race condition 
between DistributedFileSystem#close() and FileSystem#close(), it is possible 
that just after JobInProgress#cleanupJob closed JobInProgress#fs's DFSClient, 
execution switched to CleanupQueue#deletePath and called fs.delete(). Because 
this fs's DFSClient has been closed, an exception would be thrown and the 
staging directory won't be deleted.





[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-15 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767847#comment-13767847
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Chris for filing HDFS-5211. That sounds good to me:)



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-14 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767401#comment-13767401
 ] 

Xi Fang commented on MAPREDUCE-5508:


[~sandyr] Thanks for your comments.

bq. Have you tested this fix.

Yes. We have tested this fix on our test cluster (about 130,000 job 
submissions). After the workflow was done, we waited a couple of minutes (jobs 
were retiring), forced a GC, and dumped the memory. We manually checked 
FileSystem#Cache; there was no memory leak.

bq. For your analysis 

1. I agree that tempDirFs and fs never appear to end up equal, because 
tempDirFs is created with the wrong UGI.
2. I think tempDir would be fine because 1) JobInProgress#cleanupJob won't 
introduce a file system instance for tempDir and 2) the fs in 
CleanupQueue#deletePath would be reused (i.e., only one instance would exist in 
FileSystem#Cache). My initial thought was that this part had a memory leak, but 
a test showed there is no problem here.
3. The problem is actually
{code}
tempDirFs = jobTempDirPath.getFileSystem(conf);
{code}
The problem is that this call MAY (I will explain below) put a new entry 
in FileSystem#Cache. Note that it eventually goes through 
UserGroupInformation#getCurrentUser to get a UGI with the current 
AccessControlContext. CleanupQueue#deletePath won't close this entry because a 
different UGI (i.e., the userUGI created in JobInProgress) is used there. Here is 
the tricky part, which we discussed at length with [~cnauroth] and [~vinodkv]: 
although we may only have one current user, the following code MAY return 
different Subjects.
{code}
static UserGroupInformation getCurrentUser() throws IOException {
  AccessControlContext context = AccessController.getContext();
  Subject subject = Subject.getSubject(context);
  ...
}
{code}
Because a FileSystem#Cache entry uses the identity hash code of a Subject to 
construct its key, a file system object created by 
jobTempDirPath.getFileSystem(conf) may not be found later when this code is 
executed again, even though we may have the same principal (i.e., the current 
user). This eventually leads to an unbounded number of file system instances in 
FileSystem#Cache; nothing is going to remove them from the cache.
 
Please let me know if you have any questions. 
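The different-subject problem in point 3 can be reproduced with plain JDK classes. In this sketch (the User principal is a hypothetical stand-in, not a Hadoop class), two Subjects that are equal by content are nonetheless distinct instances, which is exactly what defeats an identity-keyed cache lookup:

```java
import javax.security.auth.Subject;
import java.security.Principal;
import java.util.Collections;

public class SubjectIdentityDemo {

    // Minimal value-equal principal; an illustrative stand-in for the
    // principals Hadoop's UGI puts into a Subject.
    static final class User implements Principal {
        private final String name;
        User(String name) { this.name = name; }
        public String getName() { return name; }
        @Override public boolean equals(Object o) {
            return o instanceof User && ((User) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    public static void main(String[] args) {
        Subject s1 = new Subject(false, Collections.singleton(new User("jobtracker")),
                Collections.emptySet(), Collections.emptySet());
        Subject s2 = new Subject(false, Collections.singleton(new User("jobtracker")),
                Collections.emptySet(), Collections.emptySet());
        // Content-equal: they represent the same logical user...
        System.out.println("equals=" + s1.equals(s2));
        // ...but they are distinct instances, so a cache key built on
        // System.identityHashCode of the Subject cannot match across them.
        System.out.println("sameInstance=" + (s1 == s2));
    }
}
```

This prints equals=true and sameInstance=false: equal by content, different by identity.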



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-14 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767402#comment-13767402
 ] 

Xi Fang commented on MAPREDUCE-5508:


Just found that Chris was also working on this thread :). I agree with Chris: 
changing the hash code could have a wide impact on existing code, which would 
be risky.



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-14 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767525#comment-13767525
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks Sandy for the information on HADOOP-6670. I think we still need to 
close fs anyway, because p.getFileSystem(conf) in CleanupQueue#deletePath may 
not find the FileSystem#Cache entry of JobInProgress#fs due to the 
different-subject problem we discussed above. In that case, nothing will remove 
JobInProgress#fs from FileSystem#Cache.



[jira] [Created] (MAPREDUCE-5508) Memory Leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5508:
--

 Summary: Memory Leak caused by unreleased FileSystem objects in 
JobInProgress#cleanupJob
 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical


MAPREDUCE-5351 fixed a memory leak but introduced another FileSystem 
object that is not properly released.
{code}
// JobInProgress#cleanupJob()
void cleanupJob() {
  ...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
      new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
  ...
}
{code}




[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Summary: Memory leak caused by unreleased FileSystem objects in 
JobInProgress#cleanupJob  (was: Memory Leak caused by unreleased FileSystem 
objects in JobInProgress#cleanupJob)



[jira] [Commented] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767210#comment-13767210
 ] 

Xi Fang commented on MAPREDUCE-5508:


This bug was found in Microsoft's large-scale test with about 200,000 job 
submissions. The memory usage grew steadily.

There is a long discussion between Hortonworks (thanks [~cnauroth] and 
[~vinodkv]) and Microsoft on this issue. Here is the summary of the discussion.

1. The heap dumps are showing DistributedFileSystem instances that are only 
referred to from the cache's HashMap entries. Since nothing else has a 
reference, nothing else can ever attempt to close it, and therefore it will 
never be removed from the cache. 

2. The special check for tempDirFS (see code in description) in the patch for 
MAPREDUCE-5351 is intended as an optimization so that CleanupQueue doesn't need 
to immediately reopen a FileSystem that was just closed. However, we observed 
that we're getting different identity hash code values for the Subject in the 
cache key. The code assumes that CleanupQueue will find the same Subject that was 
used inside JobInProgress. Unfortunately, this is not guaranteed, because we 
may have crossed into a different access control context at this point, via 
UserGroupInformation#doAs. Even though it's conceptually the same user, the 
Subject is a function of the current AccessControlContext:
{code}
  public synchronized
  static UserGroupInformation getCurrentUser() throws IOException {
AccessControlContext context = AccessController.getContext();
Subject subject = Subject.getSubject(context);
{code}
Even if the contexts are logically equivalent between JobInProgress and 
CleanupQueue, we see no guarantee that Java will give you the same Subject 
instance, which is required for successful lookup in the FileSystem cache 
(because of the use of identity hash code).

The fix is to abandon this optimization and close the FileSystem within the same 
AccessControlContext that opened it.
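The identity-based cache miss described above can be sketched as follows. This is a toy model with invented classes (NOT the real Hadoop or JAAS types): it only mimics the way the FileSystem cache key uses `System.identityHashCode` on the Subject, so an equivalent-but-distinct Subject never matches.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why the tempDirFs optimization fails: the cache key compares
// the Subject by identity, so a logically equal Subject from a different
// AccessControlContext produces a key that never matches the cached entry.
public class CacheKeyDemo {

    // Stand-in for javax.security.auth.Subject, whose equality is identity-based.
    static final class Subject {
        final String user;
        Subject(String user) { this.user = user; }
    }

    // Simplified cache key: equal only if it holds the *same* Subject instance.
    static final class Key {
        final String scheme;
        final Subject subject;
        Key(String scheme, Subject subject) { this.scheme = scheme; this.subject = subject; }
        @Override public int hashCode() { return scheme.hashCode() + System.identityHashCode(subject); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && subject == k.subject;  // identity, not value
        }
    }

    public static void main(String[] args) {
        Map<Key, String> cache = new HashMap<>();
        Subject inJobInProgress = new Subject("alice");
        Subject inCleanupQueue = new Subject("alice");  // same logical user, new instance

        cache.put(new Key("hdfs", inJobInProgress), "fs-instance");

        // CleanupQueue's lookup misses, so a new FileSystem would be opened
        // while the one cached by JobInProgress lingers in the cache forever.
        System.out.println(cache.containsKey(new Key("hdfs", inCleanupQueue)));   // false
        System.out.println(cache.containsKey(new Key("hdfs", inJobInProgress)));  // true
    }
}
```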


 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical

 MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
 object that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
 {code}



[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Description: 
MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
object (see tempDirFs) that is not properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
{code}


  was:
MAPREDUCE-5351 fixed a memory leak problem but introducing another filesystem 
object that is properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
{code}



 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical

 MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
 {code}



[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Description: 
MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
object (see tempDirFs) that is not properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
 if (tempDirFs != fs) {
  try {
fs.close();
  } catch (IOException ie) {
...
}
{code}


  was:
MAPREDUCE-5351 fixed a memory leak problem but introducing another filesystem 
object (see tempDirFs) that is not properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
{code}



 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical

 MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.patch

 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Work started] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-5508 started by Xi Fang.

 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Summary: JobTracker memory leak caused by unreleased FileSystem objects in 
JobInProgress#cleanupJob  (was: Memory leak caused by unreleased FileSystem 
objects in JobInProgress#cleanupJob)

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Commented] (MAPREDUCE-5405) Job recovery can fail if task log directory symlink from prior run still exists

2013-07-19 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13714194#comment-13714194
 ] 

Xi Fang commented on MAPREDUCE-5405:


Sounds good to me! I also did some tests on Ubuntu and Windows. It passes 
consistently. Thanks Chris. 

 Job recovery can fail if task log directory symlink from prior run still 
 exists
 ---

 Key: MAPREDUCE-5405
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5405
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1-win, 1.3.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: MAPREDUCE-5405.branch-1.1.patch


 During recovery, the task attempt log dir symlink from the prior run might 
 still exist.  If it does, then the recovered attempt will fail while trying 
 to create a symlink at that path.



[jira] [Commented] (MAPREDUCE-5391) TestNonLocalJobJarSubmission fails on Windows due to missing classpath entries

2013-07-15 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708988#comment-13708988
 ] 

Xi Fang commented on MAPREDUCE-5391:


Thanks Chris. The patch looks good to me!

 TestNonLocalJobJarSubmission fails on Windows due to missing classpath entries
 --

 Key: MAPREDUCE-5391
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5391
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: MAPREDUCE-5391.1.patch


 This test works by having the mapper check all classpath entries loaded by 
 the classloader.  On Windows, the classpath is packed into an intermediate 
 jar file with a manifest containing the classpath, to work around the 
 command-line length limitation.  The test needs to be updated to unpack the intermediate 
 jar file and read the manifest when running on Windows.
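The manifest round-trip the test has to handle on Windows can be sketched as below. This is an illustrative standalone sketch (file names are made up, not the test's actual paths): the classpath travels in a jar manifest's Class-Path attribute, so reading it back means opening the jar and parsing the manifest.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Writes a small jar whose manifest carries the classpath (the Windows
// workaround), then reads the Class-Path attribute back out of it.
public class ManifestClasspath {
    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("classpath", ".jar");

        // Pack the classpath into an intermediate jar's manifest.
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.CLASS_PATH, "lib/a.jar lib/b.jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), mf)) {
            // no entries needed; the manifest alone carries the classpath
        }

        // Unpack: read the manifest back, as the updated test would on Windows.
        try (JarFile jf = new JarFile(jar.toFile())) {
            String cp = jf.getManifest().getMainAttributes()
                .getValue(Attributes.Name.CLASS_PATH);
            System.out.println(cp);
        }
        Files.delete(jar);
    }
}
```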



[jira] [Updated] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-07-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Attachment: MAPREDUCE-5278.5.patch

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.4.patch, MAPREDUCE-5278.5.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons: a) 
 it does not flood the user's storage account with irrelevant data/files, and b) 
 it leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.
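The caching defeat described above can be sketched in miniature. This is a toy model (invented names, not the real TaskTracker code): the localized-file cache is keyed by the source path, and because each job first copies dist-cache files into its own staging dir, every job presents a fresh key and localizes again.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why copying dist-cache files into a per-job staging dir
// defeats caching: the cache key is the file's path, and the staging-dir
// copy gets a new path for every job.
public class DistCacheKeyDemo {
    static final Map<String, String> localizedCache = new HashMap<>();
    static int localizations = 0;

    // Returns the local copy for the given source path, localizing on a miss.
    static String localize(String sourcePath) {
        return localizedCache.computeIfAbsent(sourcePath, p -> {
            localizations++;
            return "/local/cache/" + p.hashCode();
        });
    }

    public static void main(String[] args) {
        // Stable shared path across jobs -> a single localization.
        localize("hdfs://nn/shared/pig.jar");
        localize("hdfs://nn/shared/pig.jar");
        System.out.println(localizations);  // 1

        // Staging-dir copies: the key changes per job -> a localization each time.
        localize("hdfs://nn/staging/job_001/pig.jar");
        localize("hdfs://nn/staging/job_002/pig.jar");
        System.out.println(localizations);  // 3
    }
}
```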



[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-07-09 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703619#comment-13703619
 ] 

Xi Fang commented on MAPREDUCE-5278:


Thanks, Chris. A new patch has been attached. 

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.4.patch, MAPREDUCE-5278.5.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons: a) 
 it does not flood the user's storage account with irrelevant data/files, and b) 
 it leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-07-09 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13703695#comment-13703695
 ] 

Xi Fang commented on MAPREDUCE-5278:


Thanks, Chris

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.4.patch, MAPREDUCE-5278.5.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons: a) 
 it does not flood the user's storage account with irrelevant data/files, and b) 
 it leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Commented] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users

2013-07-08 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702176#comment-13702176
 ] 

Xi Fang commented on MAPREDUCE-5371:


Thanks Chris!

 TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of 
 windows users
 ---

 Key: MAPREDUCE-5371
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5371.patch


 The error message was:
 Error Message
 expected:[sijenkins-vm2]jenkins but was:[]jenkins
 Stacktrace
 at 
 org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)
 The root cause of this failure is the domain used on Windows.



[jira] [Updated] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-07-03 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Attachment: MAPREDUCE-5278.4.patch

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.4.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons: a) 
 it does not flood the user's storage account with irrelevant data/files, and b) 
 it leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-07-03 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699544#comment-13699544
 ] 

Xi Fang commented on MAPREDUCE-5278:


Thanks Bikas. A new patch was attached. 

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.4.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons: a) 
 it does not flood the user's storage account with irrelevant data/files, and b) 
 it leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Commented] (MAPREDUCE-5330) JVM manager should not forcefully kill the process on Signal.TERM on Windows

2013-07-02 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698205#comment-13698205
 ] 

Xi Fang commented on MAPREDUCE-5330:


Thanks Ivan and Chris!

 JVM manager should not forcefully kill the process on Signal.TERM on Windows
 

 Key: MAPREDUCE-5330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5330.patch


 In MapReduce, we sometimes kill a task's JVM before it naturally shuts down 
 if we want to launch other tasks (look in 
 JvmManager$JvmManagerForType.reapJvm). This means that if the map 
 task process is in the middle of cleanup/finalization after the 
 task is done, it might be killed without a chance to finish. 
 In Microsoft's Hadoop Service, after a Map/Reduce task is done and while 
 file systems are being closed in a special shutdown hook, we typically upload 
 storage (ASV in our context) usage metrics to Microsoft Azure Tables. If 
 this kill happens, those metrics are lost. The impact is that for many MR jobs 
 we don't see accurate metrics reported most of the time.
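The shutdown-hook pattern at issue can be sketched as below. The names are invented (a stand-in for the metrics upload, not the actual service code): the hook runs on a graceful exit, but a forcible kill ends the process before it fires.

```java
// Sketch of flushing metrics from a JVM shutdown hook. On a normal exit the
// hook fires after main returns; a forcible kill (e.g. TerminateProcess on
// Windows) ends the process before the hook can run, losing the metrics.
public class MetricsShutdownHook {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(
            () -> System.out.println("flushing storage metrics")));
        System.out.println("task done");
    }
}
```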



[jira] [Created] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users

2013-07-02 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5371:
--

 Summary: TestProxyUserFromEnv#testProxyUserFromEnvironment failed 
caused by domains of windows users
 Key: MAPREDUCE-5371
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win


The error message was:
Error Message
expected:[sijenkins-vm2]jenkins but was:[]jenkins
Stacktrace
at 
org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)

The root cause of this failure is the domain used on Windows.



[jira] [Updated] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users

2013-07-02 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5371:
---

Attachment: MAPREDUCE-5371.patch

 TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of 
 windows users
 ---

 Key: MAPREDUCE-5371
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5371.patch


 The error message was:
 Error Message
 expected:[sijenkins-vm2]jenkins but was:[]jenkins
 Stacktrace
 at 
 org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)
 The root cause of this failure is the domain used on Windows.



[jira] [Work started] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users

2013-07-02 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-5371 started by Xi Fang.

 TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of 
 windows users
 ---

 Key: MAPREDUCE-5371
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5371.patch


 The error message was:
 Error Message
 expected:[sijenkins-vm2]jenkins but was:[]jenkins
 Stacktrace
 at 
 org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)
 The root cause of this failure is the domain used on Windows.



[jira] [Commented] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users

2013-07-02 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698240#comment-13698240
 ] 

Xi Fang commented on MAPREDUCE-5371:


The attached patch removes the domain prefix from user names.
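The idea behind the fix can be sketched as follows. This is a hypothetical helper, not the actual patch (class and method names are illustrative): Windows reports users as DOMAIN\user, so comparisons against the bare name from the environment should drop everything up to and including the backslash.

```java
// Hypothetical sketch: strip the "DOMAIN\" prefix from a Windows user name
// so it can be compared against the bare name taken from the environment.
public class UserNames {
  public static String stripDomain(String user) {
    int sep = user.indexOf('\\');
    // No backslash means no domain prefix; return the name unchanged.
    return sep >= 0 ? user.substring(sep + 1) : user;
  }
}
```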

 TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of 
 windows users
 ---

 Key: MAPREDUCE-5371
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5371.patch


 The error message was:
 Error Message
 expected:[sijenkins-vm2]jenkins but was:[]jenkins
 Stacktrace
 at 
 org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)
 The root cause of this failure is the domain prefix in Windows user names.



[jira] [Commented] (MAPREDUCE-5109) Job view-acl should apply to job listing too

2013-06-27 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695116#comment-13695116
 ] 

Xi Fang commented on MAPREDUCE-5109:


Hi Vinod, thanks for your patch. If Hadoop runs with this patch on Windows, 
there would be a problem because file names can't contain * on Windows. After 
discussing with Chris, we have two proposals specifically for Windows:

1. Use an entirely different wildcard character on Windows (for example: using 
! instead of *)
2. Add an encoder and a decoder specifically for * in 
JobHistory#encodeJobHistoryFileName() and decodeJobHistoryFileName() 
respectively, on Windows. For example, we can encode * to %20F. In this 
case, getNewJobHistoryFileName should also be changed accordingly. 

Do you have any suggestion on these two options?

 Job view-acl should apply to job listing too
 

 Key: MAPREDUCE-5109
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5109
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Vinod Kumar Vavilapalli
 Attachments: MAPREDUCE-5109-20130405.2.txt


 Job view-acl should apply to job listing too, currently it only applies to 
 job details pages.



[jira] [Created] (MAPREDUCE-5330) Killing M/R JVM's leads to metrics not being uploaded

2013-06-18 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5330:
--

 Summary: Killing M/R JVM's leads to metrics not being uploaded
 Key: MAPREDUCE-5330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang


In MapReduce, we sometimes kill a task's JVM before it naturally shuts down if 
we want to launch other tasks (look in JvmManager$JvmManagerForType.reapJvm). 
This behavior means that if the map task process is in the middle of doing some 
cleanup/finalization after the task is done, it might be interrupted/killed 
without giving it a chance to finish. 

In the Microsoft's Hadoop Service, after a Map/Reduce task is done and during 
closing file systems in a special shutdown hook, we're typically uploading 
storage (ASV in our context) usage metrics to Microsoft Azure Tables. So if 
this kill happens these metrics get lost. The impact is that for many MR jobs 
we don't see accurate metrics reported most of the time.



[jira] [Updated] (MAPREDUCE-5330) Killing M/R JVM's leads to metrics not being uploaded

2013-06-18 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5330:
---

Attachment: MAPREDUCE-5330.patch

 Killing M/R JVM's leads to metrics not being uploaded
 -

 Key: MAPREDUCE-5330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Attachments: MAPREDUCE-5330.patch


 In MapReduce, we sometimes kill a task's JVM before it naturally shuts down 
 if we want to launch other tasks (look in 
 JvmManager$JvmManagerForType.reapJvm). This behavior means that if the map 
 task process is in the middle of doing some cleanup/finalization after the 
 task is done, it might be interrupted/killed without giving it a chance to finish. 
 In the Microsoft's Hadoop Service, after a Map/Reduce task is done and during 
 closing file systems in a special shutdown hook, we're typically uploading 
 storage (ASV in our context) usage metrics to Microsoft Azure Tables. So if 
 this kill happens these metrics get lost. The impact is that for many MR jobs 
 we don't see accurate metrics reported most of the time.



[jira] [Commented] (MAPREDUCE-5330) Killing M/R JVM's leads to metrics not being uploaded

2013-06-18 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687346#comment-13687346
 ] 

Xi Fang commented on MAPREDUCE-5330:


If Signal.TERM is sent to a process, we wait for a delay before killing it. But 
on Windows the signal kind is ignored and the process is killed immediately (see 
Shell#getSignalKillProcessGroupCommand()):
{code}
  public static String[] getSignalKillProcessGroupCommand(int code,
      String groupId) {
    if (WINDOWS) {
      return new String[] { Shell.WINUTILS, "task", "kill", groupId };
    } else {
      return new String[] { "kill", "-" + code, "-" + groupId };
    }
  }
{code}

Here is a fix: if the OS is Windows and the signal is TERM, return immediately 
and let a delayed process killer actually kill this process group later. This 
gives the process group a grace period to clean up after itself.
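That approach can be sketched as below. This is only an illustration under stated assumptions (class and method names are made up; the real fix wires into JvmManager and the winutils "task kill" command): a TERM-equivalent request returns immediately, and the hard kill fires only after a grace period.

```java
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch: on Windows, treat a TERM request as "kill later".
// The Runnable stands in for the actual process-group kill invocation.
public class DelayedProcessKiller {
  // Daemon timer so a pending kill does not keep the JVM alive.
  private final Timer timer = new Timer("delayed-kill", true);

  public void sigTermOnWindows(long gracePeriodMs, final Runnable hardKill) {
    // Return immediately; the hard kill runs only after the grace period,
    // giving the task JVM time to execute its shutdown hooks.
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        hardKill.run();
      }
    }, gracePeriodMs);
  }
}
```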

 Killing M/R JVM's leads to metrics not being uploaded
 -

 Key: MAPREDUCE-5330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Attachments: MAPREDUCE-5330.patch


 In MapReduce, we sometimes kill a task's JVM before it naturally shuts down 
 if we want to launch other tasks (look in 
 JvmManager$JvmManagerForType.reapJvm). This behavior means that if the map 
 task process is in the middle of doing some cleanup/finalization after the 
 task is done, it might be interrupted/killed without giving it a chance to finish. 
 In the Microsoft's Hadoop Service, after a Map/Reduce task is done and during 
 closing file systems in a special shutdown hook, we're typically uploading 
 storage (ASV in our context) usage metrics to Microsoft Azure Tables. So if 
 this kill happens these metrics get lost. The impact is that for many MR jobs 
 we don't see accurate metrics reported most of the time.



[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-06-17 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686060#comment-13686060
 ] 

Xi Fang commented on MAPREDUCE-5278:


Thanks Bikas for your comments. Regarding your question, is the following code 
(marked below) continuing to copy files to the default FS (fs) when newPath 
points to a different file system?

No. Basically, the original code does this: if the JT staging dir is not on the 
default FS (for example, in our context it is ASV), copyRemoteFiles() will copy 
files in ASV to the JT staging dir. Note that these files are specified using 
generic options. After our change, when ASV is marked as accessible by 
specifying mapreduce.client.accessible.remote.schemes, copyRemoteFiles() won't 
copy the files in ASV to the jobtracker. It just directly returns the path of 
that file, denoted by newPath. In addition, no copy operation happens in 
addArchiveToClassPath().

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Updated] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-06-17 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Attachment: MAPREDUCE-5278.3.patch

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-06-17 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686233#comment-13686233
 ] 

Xi Fang commented on MAPREDUCE-5278:


Thanks Bikas. A config key was added in JobClient.java:
{code}
  private static final String CLIENT_ACCESSIBLE_REMOTE_SCHEMES_KEY =
      "mapreduce.client.accessible.remote.schemes";
{code}
And in copyRemoteFiles(), I changed to
{code}
  String[] accessibleSchemes = job.getStrings(
      CLIENT_ACCESSIBLE_REMOTE_SCHEMES_KEY, null);
{code}

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.3.patch, 
 MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Updated] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-06-10 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Attachment: MAPREDUCE-5278.2.patch

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Commented] (MAPREDUCE-5278) Distributed cache is broken when JT staging dir is not on the default FS

2013-06-10 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680075#comment-13680075
 ] 

Xi Fang commented on MAPREDUCE-5278:


Thanks Ivan. I have added a classpath check and am preparing a trunk version. 

 Distributed cache is broken when JT staging dir is not on the default FS
 

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.2.patch, MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Work started] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-06-05 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-5278 started by Xi Fang.

 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Attachments: MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Updated] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-06-05 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Attachment: MAPREDUCE-5278.patch

A patch is attached. 

In this patch, we added a property called 
mapreduce.client.accessible.remote.schemes. It specifies the schemes of the 
file systems that are accessible from all the nodes in the cluster. This is 
used by the job client to avoid copying distributed cache entries to the job 
staging dir if the path is accessible (see JobClient#copyRemoteFiles()).

For example, on Windows Azure, a path that has ASV as its scheme is accessible 
from all the nodes in the cluster. mapreduce.client.accessible.remote.schemes 
can be set to ASV. 

The change in this patch is passive, meaning that it won't take effect unless 
this property is set through configuration. 
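The client-side decision can be sketched like this. It is a simplified illustration, not the patch itself (the helper name is made up, and the real code works with Hadoop Path/FileSystem objects rather than bare URIs):

```java
import java.net.URI;

// Simplified illustration of the check: a dist-cache entry is copied to the
// JT staging dir only when its scheme is NOT listed as remotely accessible.
public class AccessibleSchemeCheck {
  public static boolean needsCopyToStagingDir(URI path, String[] accessibleSchemes) {
    if (accessibleSchemes != null && path.getScheme() != null) {
      for (String scheme : accessibleSchemes) {
        if (scheme.equalsIgnoreCase(path.getScheme())) {
          return false;  // accessible from all nodes: use the path directly
        }
      }
    }
    return true;  // default (property unset): copy as before
  }
}
```

With the property unset the method always answers true, which preserves the old copy-to-staging behavior, matching the "passive" nature of the change.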

 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Attachments: MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Updated] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-06-05 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

   Fix Version/s: 1-win
Target Version/s: 1-win
  Status: Patch Available  (was: In Progress)

 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang
 Fix For: 1-win

 Attachments: MAPREDUCE-5278.patch


 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files b) it 
 leverages HDFS locality for small files
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the files/archives/libjars 
 Hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.



[jira] [Created] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5278:
--

 Summary: Perf: Distributed cache is broken when JT staging dir is 
not on the default FS
 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang


Today, we set the JobTracker staging dir 
(mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
the default file system. There are a few reasons why this config was chosen:
To prevent leak of the storage account creds to the user's storage account 
(IOW, keep job.xml in the cluster). This is needed until HADOOP-444 is fixed.
It uses HDFS for the transient job files, which is good for two reasons – a) it 
does not flood the user's storage account with irrelevant data/files b) it 
leverages HDFS locality for small files
However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.
When files are added to the distributed cache (through the files/archives/libjars 
Hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.
In our configuration the caching is completely disabled and we always end up 
copying dist cache files to the JT staging dir first and localizing them on the 
tasktracker machine second.
This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.
Easy workaround is to config mapreduce.jobtracker.staging.root.dir in 
mapred-site.xml to be on the default FS.



[jira] [Updated] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Assignee: Xi Fang

 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang

 Today, we set the JobTracker staging dir 
 (mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
 the default file system. There are a few reasons why this config was chosen:
 To prevent leak of the storage account creds to the user's storage account 
 (IOW, keep job.xml in the cluster). This is needed until HADOOP-444 is fixed.
 It uses HDFS for the transient job files, which is good for two reasons – a) it 
 does not flood the user's storage account with irrelevant data/files; b) it 
 leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the -files/-archives/-libjars 
 hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In our configuration the caching is completely disabled and we always end up 
 copying dist cache files to the JT staging dir first and localizing them on 
 the tasktracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.
 An easy workaround is to configure mapreduce.jobtracker.staging.root.dir in 
 mapred-site.xml to be on the default FS.



[jira] [Updated] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Description: 
Today, we set the JobTracker staging dir 
(mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
the default file system. There are a few reasons why this config was chosen:

1. To prevent leak of the storage account creds to the user's storage account 
(IOW, keep job.xml in the cluster). 
2. It uses HDFS for the transient job files, which is good for two reasons – a) 
it does not flood the user's storage account with irrelevant data/files; b) it 
leverages HDFS locality for small files.

However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.

When files are added to the distributed cache (through the -files/-archives/-libjars 
hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.

In our configuration the caching is completely disabled and we always end up 
copying dist cache files to the JT staging dir first and localizing them on the 
tasktracker machine second.

This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.

An easy workaround is to configure mapreduce.jobtracker.staging.root.dir in 
mapred-site.xml to be on the default FS.

  was:
Today, we set the JobTracker staging dir 
(mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
the default file system. There are a few reasons why this config was chosen:
To prevent leak of the storage account creds to the user's storage account 
(IOW, keep job.xml in the cluster). This is needed until HADOOP-444 is fixed.
It uses HDFS for the transient job files, which is good for two reasons – a) it 
does not flood the user's storage account with irrelevant data/files; b) it 
leverages HDFS locality for small files.
However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.
When files are added to the distributed cache (through the -files/-archives/-libjars 
hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.
In our configuration the caching is completely disabled and we always end up 
copying dist cache files to the JT staging dir first and localizing them on the 
tasktracker machine second.
This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.
An easy workaround is to configure mapreduce.jobtracker.staging.root.dir in 
mapred-site.xml to be on the default FS.


 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang

 Today, we set the JobTracker staging dir 
 (mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
 the default file system. There are a few reasons why this config was chosen:
 1. To prevent leak of the storage account creds to the user's storage account 
 (IOW, keep job.xml in the cluster). 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files; b) it 
 leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the -files/-archives/-libjars 
 hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In our configuration the caching is completely disabled and we always end up 
 copying dist cache files to the JT staging dir first and localizing them on 
 the tasktracker machine second.
 This is especially 

[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-28 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668119#comment-13668119
 ] 

Xi Fang commented on MAPREDUCE-5224:


Thanks Ivan!

 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.4.patch, MAPREDUCE-5224.5.patch, MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
   public String getSystemDir() {
     Path sysDir = new Path(conf.get("mapred.system.dir",
         "/tmp/hadoop/mapred/system"));
     return fs.makeQualified(sysDir).toString();
   }
 In a cloud like Azure, the default file system is set to ASV (Windows Azure 
 Blob Storage), but we would still like the system directory to be in DFS. We 
 should change the JobTracker to allow that.



[jira] [Commented] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668117#comment-13668117
 ] 

Xi Fang commented on MAPREDUCE-5278:


Basically, if a remote file system is reachable from the task trackers, we don't 
have to copy the files on that file system to the job tracker's staging dir (see 
JobClient#copyRemoteFiles()). 

For example, in HDInsight, user storage would be ASV, which is different from 
HDFS. So by default these files would be copied to the JT. However, since ASV is 
supposed to be reachable from the tasktrackers, these copy operations would be 
unnecessary, and they also disable the dist cache.  A proposal is to add a 
configuration property (e.g. mapred.tasktracker.scheme.accessible). If we 
specify a scheme in this property, we won't do the copy operation even if this 
scheme is not equal to the scheme of the job tracker's staging dir. For example, 
in this context, mapred.tasktracker.scheme.accessible=ASV.
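A minimal sketch of the proposed check (the property name comes from the comment above; the method, its signature, and its placement are hypothetical — the real change would go into JobClient#copyRemoteFiles and use FileSystem objects):

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AccessibleSchemeCheck {
    // Hypothetical sketch: skip the staging-dir copy when the file's scheme
    // is listed in mapred.tasktracker.scheme.accessible, even though it
    // differs from the scheme of the job tracker's staging dir.
    static boolean shouldCopy(URI file, URI stagingFs, String accessibleSchemes) {
        Set<String> accessible = new HashSet<>(
                Arrays.asList(accessibleSchemes.toLowerCase().split(",")));
        String scheme = file.getScheme() == null
                ? stagingFs.getScheme() : file.getScheme();
        if (accessible.contains(scheme.toLowerCase())) {
            return false;  // tasktrackers can read it directly; cache key stays stable
        }
        return !scheme.equalsIgnoreCase(stagingFs.getScheme());
    }

    public static void main(String[] args) {
        URI staging = URI.create("hdfs://nn:9000/");
        // With mapred.tasktracker.scheme.accessible=asv, ASV jars stay put.
        System.out.println(shouldCopy(
                URI.create("asv://account/container/hive.jar"), staging, "asv"));
        // A scheme not listed still gets copied to the staging dir.
        System.out.println(shouldCopy(
                URI.create("s3://bucket/pig.jar"), staging, "asv"));
    }
}
```

The property is a comma-separated allowlist here; that format is an assumption, not something specified in the comment.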

 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang

 Today, we set the JobTracker staging dir 
 (mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
 the default file system. There are a few reasons why this config was chosen:
 1. To prevent leak of the storage account creds to the user's storage account 
 (IOW, keep job.xml in the cluster). 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files; b) it 
 leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the -files/-archives/-libjars 
 hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In our configuration the caching is completely disabled and we always end up 
 copying dist cache files to the JT staging dir first and localizing them on 
 the tasktracker machine second.
 This is especially not good for Oozie scenarios as Oozie uses dist cache to 
 populate Hive/Pig jars throughout the cluster.
 An easy workaround is to configure mapreduce.jobtracker.staging.root.dir in 
 mapred-site.xml to be on the default FS.



[jira] [Updated] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Description: 
Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
system and Windows ASV file system) are the default file systems. For ASV, this 
config was chosen and there are a few reasons why:

1. To prevent leak of the storage account creds to the user's storage account; 
2. It uses HDFS for the transient job files, which is good for two reasons – a) 
it does not flood the user's storage account with irrelevant data/files; b) it 
leverages HDFS locality for small files.

However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.

When files are added to the distributed cache (through the -files/-archives/-libjars 
hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.

In this configuration the caching is completely disabled and we always end up 
copying dist cache files to the job tracker's staging dir first and localizing 
them on the task tracker machine second.

This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.



  was:
Today, we set the JobTracker staging dir 
(mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is 
the default file system. There are a few reasons why this config was chosen:

1. To prevent leak of the storage account creds to the user's storage account 
(IOW, keep job.xml in the cluster). 
2. It uses HDFS for the transient job files, which is good for two reasons – a) 
it does not flood the user's storage account with irrelevant data/files; b) it 
leverages HDFS locality for small files.

However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.

When files are added to the distributed cache (through the -files/-archives/-libjars 
hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.

In our configuration the caching is completely disabled and we always end up 
copying dist cache files to the JT staging dir first and localizing them on the 
tasktracker machine second.

This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.

An easy workaround is to configure mapreduce.jobtracker.staging.root.dir in 
mapred-site.xml to be on the default FS.


 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang

 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems. For ASV, 
 this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account creds to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files; b) it 
 leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the -files/-archives/-libjars 
 hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not 

[jira] [Updated] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5278:
---

Description: 
Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
system and Windows ASV file system) are the default file systems.

For ASV, this config was chosen and there are a few reasons why:

1. To prevent leak of the storage account credentials to the user's storage 
account; 
2. It uses HDFS for the transient job files, which is good for two reasons – a) 
it does not flood the user's storage account with irrelevant data/files; b) it 
leverages HDFS locality for small files.

However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.

When files are added to the distributed cache (through the -files/-archives/-libjars 
hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.

In this configuration the caching is completely disabled and we always end up 
copying dist cache files to the job tracker's staging dir first and localizing 
them on the task tracker machine second.

This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.



  was:
Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
system and Windows ASV file system) are the default file systems. For ASV, this 
config was chosen and there are a few reasons why:

1. To prevent leak of the storage account creds to the user's storage account; 
2. It uses HDFS for the transient job files, which is good for two reasons – a) 
it does not flood the user's storage account with irrelevant data/files; b) it 
leverages HDFS locality for small files.

However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.

When files are added to the distributed cache (through the -files/-archives/-libjars 
hadoop generic options), they are copied to the job tracker staging dir only if 
they reside on a file system different from the jobtracker's. Later on, this 
path is used as a key to cache the files locally on the tasktracker's 
machine, and avoid localization (download/unzip) of the distributed cache files 
if they are already localized.

In this configuration the caching is completely disabled and we always end up 
copying dist cache files to the job tracker's staging dir first and localizing 
them on the task tracker machine second.

This is especially not good for Oozie scenarios as Oozie uses dist cache to 
populate Hive/Pig jars throughout the cluster.




 Perf: Distributed cache is broken when JT staging dir is not on the default FS
 --

 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang
Assignee: Xi Fang

 Today, the JobTracker staging dir (mapreduce.jobtracker.staging.root.dir) is 
 set to point to HDFS, even though other file systems (e.g. Amazon S3 file 
 system and Windows ASV file system) are the default file systems.
 For ASV, this config was chosen and there are a few reasons why:
 1. To prevent leak of the storage account credentials to the user's storage 
 account; 
 2. It uses HDFS for the transient job files, which is good for two reasons – a) 
 it does not flood the user's storage account with irrelevant data/files; b) it 
 leverages HDFS locality for small files.
 However, this approach conflicts with how distributed cache caching works, 
 completely negating the feature's functionality.
 When files are added to the distributed cache (through the -files/-archives/-libjars 
 hadoop generic options), they are copied to the job tracker staging dir only 
 if they reside on a file system different from the jobtracker's. Later on, 
 this path is used as a key to cache the files locally on the tasktracker's 
 machine, and avoid localization (download/unzip) of the distributed cache 
 files if they are already localized.
 In this configuration the caching is completely disabled and we always end up 
 copying dist cache files to the job tracker's staging dir first and 
 localizing them on the task tracker machine second.
 This is especially not good for Oozie scenarios as Oozie 

[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-26 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.5.patch

 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.4.patch, MAPREDUCE-5224.5.patch, MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
   public String getSystemDir() {
     Path sysDir = new Path(conf.get("mapred.system.dir",
         "/tmp/hadoop/mapred/system"));
     return fs.makeQualified(sysDir).toString();
   }
 In a cloud like Azure, the default file system is set to ASV (Windows Azure 
 Blob Storage), but we would still like the system directory to be in DFS. We 
 should change the JobTracker to allow that.



[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-26 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667437#comment-13667437
 ] 

Xi Fang commented on MAPREDUCE-5224:


Thanks to Ivan for MAPREDUCE-5224.5.patch. 
Here are the reasons (from offline emails with Ivan) for posting this new patch:

1. Given that fs is indeed used in some other places, we have to account for 
that as well (these tests actually want to close the system dir fs). 
2. There is no need to use the default file system for the jobhistory. There is 
another (orthogonal) bug here: the job history completed location also assumes 
the default FS, which is not correct. This should be a separate Jira. 
3. This would make the prod code change really simple.



 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.4.patch, MAPREDUCE-5224.5.patch, MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
   public String getSystemDir() {
     Path sysDir = new Path(conf.get("mapred.system.dir",
         "/tmp/hadoop/mapred/system"));
     return fs.makeQualified(sysDir).toString();
   }
 In a cloud like Azure, the default file system is set to ASV (Windows Azure 
 Blob Storage), but we would still like the system directory to be in DFS. We 
 should change the JobTracker to allow that.



[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-23 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665679#comment-13665679
 ] 

Xi Fang commented on MAPREDUCE-5224:


Thanks Ivan for your detailed comments. These are of great help!

 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
   public String getSystemDir() {
     Path sysDir = new Path(conf.get("mapred.system.dir",
         "/tmp/hadoop/mapred/system"));
     return fs.makeQualified(sysDir).toString();
   }
 In a cloud like Azure, the default file system is set to ASV (Windows Azure 
 Blob Storage), but we would still like the system directory to be in DFS. We 
 should change the JobTracker to allow that.



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-23 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.4.patch

The above comments have been addressed. Thanks.

BTW, I changed JobTracker#defaultFs back to fs, because some other code in the 
same package uses this fs (fs was originally defined with no access modifier).

 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.4.patch, MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
   public String getSystemDir() {
     Path sysDir = new Path(conf.get("mapred.system.dir",
         "/tmp/hadoop/mapred/system"));
     return fs.makeQualified(sysDir).toString();
   }
 In a cloud like Azure, the default file system is set to ASV (Windows Azure 
 Blob Storage), but we would still like the system directory to be in DFS. We 
 should change the JobTracker to allow that.



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-21 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.3.patch

 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
   public String getSystemDir() {
     Path sysDir = new Path(conf.get("mapred.system.dir",
         "/tmp/hadoop/mapred/system"));
     return fs.makeQualified(sysDir).toString();
   }
 In a cloud like Azure, the default file system is set to ASV (Windows Azure 
 Blob Storage), but we would still like the system directory to be in DFS. We 
 should change the JobTracker to allow that.



[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-21 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663457#comment-13663457
 ] 

Xi Fang commented on MAPREDUCE-5224:


Thanks Chuan and Ivan.
1. For Chuan's comment: I added an assert to check this dir exists indeed. This 
also addresses Ivan's 10th comment.
2. For Ivan's comment: 
a. For comments #1, 3, 5, 7, 9, 10, I just followed the comments.
b. For comment #2: I personally think it would be better if we can throw out an 
exception rather than swallowing it and setting it back to default file system. 
As mentioned by Mostafa (offline),  for example, if someone configured the 
system dir as http://www.awesome.com/system, then with the fallback solution 
the exception saying HTTP is not supported will be swallowed and we'll set 
the system directory as just /system in the default file system, which doesn't 
seem like good behavior. We may want someone explicitly know/handle this at the 
moment  this happens.
c. For comment #4: I renamed the JobTracker#fs to defaultFs and still keep it 
just for possible future use/reference of this variable. 
d. For comment #6, 8: I put the initialization of MiniDFSCluster, MiniMRCluster 
in the test case and let setUp() just construct a configuration. In this way, 
we don't have to throw IOException in setUp() and test case would fail if my 
code changes are not applied.
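The fail-loudly behavior argued for in (b) might look like this hypothetical sketch (the method name and the allowed-scheme list are invented for illustration; the real patch works with FileSystem and Path, not bare URIs):

```java
import java.io.IOException;
import java.net.URI;

public class SystemDirValidation {
    // Hypothetical illustration of the "throw, don't fall back" choice:
    // an unsupported mapred.system.dir scheme surfaces as an error instead
    // of being silently replaced by /system on the default file system.
    static URI resolveSystemDir(String configured, String defaultFs)
            throws IOException {
        URI dir = URI.create(configured);
        if (dir.getScheme() == null) {
            // Plain path with no scheme: qualify it against the default FS.
            return URI.create(defaultFs).resolve(dir);
        }
        if (!dir.getScheme().equals("hdfs") && !dir.getScheme().equals("asv")) {
            throw new IOException("Unsupported scheme '" + dir.getScheme()
                    + "' for mapred.system.dir: " + configured);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolveSystemDir(
                "/tmp/hadoop/mapred/system", "hdfs://nn:9000/"));
        try {
            resolveSystemDir("http://www.awesome.com/system", "hdfs://nn:9000/");
        } catch (IOException e) {
            System.out.println(e.getMessage());  // the loud failure
        }
    }
}
```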


 JobTracker should allow the system directory to be in non-default FS
 

 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win

 Attachments: MAPREDUCE-5224.2.patch, MAPREDUCE-5224.3.patch, 
 MAPREDUCE-5224.patch


  JobTracker today expects the system directory to be in the default file 
 system:
 if (fs == null) {
   fs = mrOwner.doAs(new PrivilegedExceptionAction<FileSystem>() {
     public FileSystem run() throws IOException {
       return FileSystem.get(conf);
     }});
 }
 ...
 public String getSystemDir() {
   Path sysDir = new Path(conf.get("mapred.system.dir",
       "/tmp/hadoop/mapred/system"));
   return fs.makeQualified(sysDir).toString();
 }
 In a cloud environment like Azure the default file system is set to ASV 
 (Windows Azure Blob Storage), but we would still like the system directory 
 to be in DFS. We should change JobTracker to allow that.
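
To illustrate the qualification problem, here is a stdlib-only sketch; the `makeQualified` helper below is a hypothetical stand-in for Hadoop's `Path#makeQualified(FileSystem)`, not the real API, with plain URIs standing in for FileSystem objects. An unqualified mapred.system.dir is silently qualified against the default filesystem's URI, while a fully qualified one keeps its own scheme:

```java
import java.net.URI;

public class SystemDirSketch {
    // Hypothetical stand-in for Path#makeQualified(FileSystem): fill in a
    // missing scheme/authority from the filesystem's URI, and leave fully
    // qualified paths untouched.
    static String makeQualified(URI fsUri, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p.toString();               // already carries its own scheme
        }
        return fsUri.resolve(path).toString(); // inherits the FS's scheme
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("asv://container@account/");
        // An unqualified system dir silently lands on the default (ASV) FS:
        System.out.println(makeQualified(defaultFs, "/tmp/hadoop/mapred/system"));
        // A qualified system dir keeps pointing at HDFS:
        System.out.println(makeQualified(defaultFs, "hdfs://nn:8020/mapred/system"));
    }
}
```

This is exactly why qualifying the system directory with the default filesystem's `fs` object puts it in ASV even when the operator intended DFS.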

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-20 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662398#comment-13662398
 ] 

Xi Fang commented on MAPREDUCE-5224:


Hi Ivan, I was addressing your fourth comment, and I have one question.
There are two methods:

/**
 * Grab the local fs name
 */
public synchronized String getFilesystemName() throws IOException {
  if (fs == null) {
    throw new IllegalStateException("FileSystem object not available yet");
  }
  return fs.getUri().toString();
}

/**
 * Get JobTracker's FileSystem. This is the filesystem for mapred.system.dir.
 */
FileSystem getFileSystem() {
  return fs;
}

I am a little bit confused. For getFileSystem() it is clear: we still return 
the system directory's file system, so we should change this fs to 
systemDirFs, which I omitted in my previous patch.

For getFilesystemName(), what does fs stand for in this context, the default 
fs or the systemDir's file system? I guess it denotes the latter. Right?

Thanks
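
One way to keep the two handles apart can be sketched with the stdlib only (hypothetical class and field names; plain URIs stand in for Hadoop FileSystem objects). Which of the two handles getFilesystemName() should return is exactly the open question above; here it is arbitrarily wired to the default filesystem:

```java
import java.net.URI;

public class TrackerFsSketch {
    private final URI defaultFs;   // cluster's default filesystem
    private final URI systemDirFs; // filesystem hosting mapred.system.dir

    TrackerFsSketch(URI defaultFs, URI systemDirFs) {
        this.defaultFs = defaultFs;
        this.systemDirFs = systemDirFs;
    }

    // Name of the cluster's default filesystem (one possible reading).
    String getFilesystemName() {
        if (defaultFs == null) {
            throw new IllegalStateException("FileSystem object not available yet");
        }
        return defaultFs.toString();
    }

    // Filesystem for mapred.system.dir; may differ from the default.
    URI getFileSystem() {
        return systemDirFs;
    }

    public static void main(String[] args) {
        TrackerFsSketch t = new TrackerFsSketch(
            URI.create("asv://container@account/"),
            URI.create("hdfs://nn:8020/"));
        System.out.println(t.getFilesystemName()); // asv://container@account/
        System.out.println(t.getFileSystem());     // hdfs://nn:8020/
    }
}
```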




[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-20 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662400#comment-13662400
 ] 

Xi Fang commented on MAPREDUCE-5224:


Sorry for the formatting! JIRA mangled my text because of the special symbols.



[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-18 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661415#comment-13661415
 ] 

Xi Fang commented on MAPREDUCE-5224:


Thanks, Ivan, for the comments. That is a great help! I will check my code. :)




[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-17 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661206#comment-13661206
 ] 

Xi Fang commented on MAPREDUCE-5224:


Thanks Chuan.



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.2.patch



[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-13 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656644#comment-13656644
 ] 

Xi Fang commented on MAPREDUCE-5224:


I updated the patch. It includes the unit test. I also made some changes to 
MAPREDUCE-5224.patch, because the previous one was not complete: in many 
places the default file system is used to access the system directory, not 
only in getSystemDir(). This requires many more changes to the original code. 

I still have some questions I am not quite sure about.
1. In the constructor of JobTracker:

try {
  FileStatus systemDirStatus = systemDirFs.getFileStatus(systemDir);
  if (!systemDirStatus.isOwnedByUser(
      mrOwner.getShortUserName(), mrOwner.getGroupNames())) {
    throw new AccessControlException("The systemdir " + systemDir
        + " is not owned by " + mrOwner.getShortUserName());
  }
  if (!systemDirStatus.getPermission().equals(SYSTEM_DIR_PERMISSION)) {
    LOG.warn("Incorrect permissions on " + systemDir + ". Setting it to "
        + SYSTEM_DIR_PERMISSION);
    systemDirFs.setPermission(systemDir,
        new FsPermission(SYSTEM_DIR_PERMISSION));
  }
}

Basically, I have changed the file system used to access the system dir, but 
I am not quite sure whether I should also change the two IF statements, 
because the file permissions might be a problem.
2. LocalJobRunner has a getSystemDir() method as well. It uses the default 
file system to access the system directory:

public String getSystemDir() {
  Path sysDir = new Path(conf.get("mapred.system.dir",
      "/tmp/hadoop/mapred/system"));
  return fs.makeQualified(sysDir).toString();
}

I am not quite sure whether I need to change this as well.
Thanks!
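
The two IF checks can be sketched stdlib-only (all names here are hypothetical simplifications: plain strings stand in for FileStatus/FsPermission, and the permission value is illustrative, not Hadoop's actual constant). An ownership mismatch is fatal, while a permission mismatch is repaired:

```java
public class SysDirGuardSketch {
    // Hypothetical stand-in for Hadoop's SYSTEM_DIR_PERMISSION constant.
    static final String SYSTEM_DIR_PERMISSION = "rwx------";

    // Returns the permission the directory should end up with; throws on
    // wrong ownership, mirroring the AccessControlException branch above.
    static String check(String mrOwner, String dirOwner, String dirPerms) {
        if (!dirOwner.equals(mrOwner)) {
            throw new SecurityException(
                "The systemdir is not owned by " + mrOwner);
        }
        if (!SYSTEM_DIR_PERMISSION.equals(dirPerms)) {
            // Real code would call systemDirFs.setPermission(...) here.
            return SYSTEM_DIR_PERMISSION;
        }
        return dirPerms;
    }

    public static void main(String[] args) {
        System.out.println(check("mapred", "mapred", "rwxr-xr-x")); // rwx------
    }
}
```

The key point of the question above is that both checks must run against the filesystem that actually hosts systemDir (systemDirFs), not the default filesystem.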



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.patch



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: (was: MAPREDUCE-5224.patch)



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.1.patch



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: (was: MAPREDUCE-5224.1.patch)



[jira] [Work started] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-5224 started by Xi Fang.



[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---

Attachment: MAPREDUCE-5224.patch



[jira] [Commented] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-09 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13653473#comment-13653473
 ] 

Xi Fang commented on MAPREDUCE-5224:


The original motivation for this JIRA is the following scenario. In Azure, 
the default file system is set to ASV (Windows Azure Blob Storage), but we 
would still like the system directory to be in DFS, because we don't want to 
put such files in ASV, where they would incur storage charges for Azure 
customers. Thus, we want to change JobTracker.java to allow that.

The problem in the current JobTracker.java is that getSystemDir() uses the 
wrong fs object to call fs.makeQualified() whenever the default file system 
(e.g. ASV in our scenario) and mapred.system.dir use different file systems. 
In the proposed fix, we rely on FileSystem.get() to choose the appropriate 
file system according to mapred.system.dir; it falls back on the default 
file system if the scheme is absent. 

Although the original motivation was to fix the problem for Azure, this fix 
also applies to other scenarios where the default file system and 
mapred.system.dir are supposed to use different file systems.

A unit test will follow.
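
The resolution rule described above can be modeled stdlib-only (hypothetical names throughout; real Hadoop resolves via FileSystem.get(uri, conf) and its scheme-to-implementation registry): a qualified path selects its filesystem by scheme, an unqualified path falls back to the default, and an unsupported scheme fails loudly rather than being silently swallowed:

```java
import java.net.URI;
import java.util.Map;

public class FsResolverSketch {
    // Pick a filesystem implementation name by URI scheme, falling back to
    // the default when the path carries no scheme; unknown schemes error out.
    static String resolve(String path, String defaultFs,
                          Map<String, String> registry) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            return defaultFs;                  // unqualified -> default FS
        }
        String fs = registry.get(scheme);
        if (fs == null) {
            throw new IllegalArgumentException(
                "No FileSystem for scheme: " + scheme);
        }
        return fs;
    }

    public static void main(String[] args) {
        Map<String, String> registry = Map.of(
            "hdfs", "DistributedFileSystem",
            "asv", "NativeAzureFileSystem");
        // Qualified path goes to HDFS regardless of the default:
        System.out.println(resolve("hdfs://nn:8020/mapred/system",
                                   "NativeAzureFileSystem", registry));
        // Unqualified path falls back to the default (ASV here):
        System.out.println(resolve("/mapred/system",
                                   "NativeAzureFileSystem", registry));
    }
}
```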




[jira] [Created] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-08 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5224:
--

 Summary: JobTracker should allow the system directory to be in 
non-default FS
 Key: MAPREDUCE-5224
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5224
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Minor
 Fix For: 1-win




[jira] [Updated] (MAPREDUCE-5224) JobTracker should allow the system directory to be in non-default FS

2013-05-08 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5224:
---
