[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated MAPREDUCE-6478:
----------------------------------
    Attachment: MAPREDUCE-6478-v1.patch

Put up a quick patch that adds two configuration options: one to skip cleanupJob entirely and one to ignore cleanupJob failures. The change is quite straightforward, so a unit test is unnecessary here.

> Add an option to skip cleanupJob stage or ignore cleanup failure during
> commitJob().
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6478
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: MAPREDUCE-6478-v1.patch
>
>
> In some of our test cases for MR in public cloud scenarios, a very big MR job
> with hundreds or thousands of reducers cannot finish successfully because of
> Job Cleanup failures. These failures come from the different scale and
> performance characteristics of file systems on the cloud (like AzureFS), which
> replace HDFS's single whole-directory deletion with REST API calls that delete
> each sub-directory recursively. Even when cleanup succeeds, it can take much
> longer (hours), which is unnecessary and wastes time and resources, especially
> in public cloud scenarios.
> In these scenarios, it makes more sense to ignore some cleanupJob failures or to
> let the user skip cleanupJob() completely. Letting the whole job finish
> successfully, at the cost of leaving some temporary files in user space, is much
> better: users' jobs usually come and go in the public cloud, so tolerating some
> leftover temporary files in exchange for avoiding a big job re-run (or for a
> shorter job running time) is an effective trade-off in time and resource cost.
> We should give users this option (ignore failures or skip the job cleanup stage
> completely), especially when they know the cleanup failure is not due to an
> abnormal HDFS state but to another FS's different performance trade-offs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
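To make the two options concrete, here is a minimal, self-contained Java sketch of how a commitJob() step might honor a "skip cleanup" switch and an "ignore cleanup failures" switch. It is not the code in MAPREDUCE-6478-v1.patch and does not use Hadoop's OutputCommitter API. The configuration key names, the `Cleanup` interface (a stand-in for deleting the job's temporary directory) and the `Outcome` enum are all hypothetical, chosen for illustration only. Both switches default to off here, on the assumption that existing behavior (a cleanup failure fails the job) should be preserved unless the user opts in.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: commitJob() with optional skip/ignore handling of cleanup. */
public class CleanupPolicySketch {
    // Hypothetical configuration keys (assumptions, not the patch's real names).
    static final String SKIP_CLEANUP_KEY = "example.commitjob.cleanup.skipped";
    static final String IGNORE_FAILURES_KEY = "example.commitjob.cleanup-failures.ignored";

    /** Stand-in for removing the job's temporary directory; may fail on slow stores. */
    interface Cleanup {
        void run() throws IOException;
    }

    /** What commitJob() did with the cleanup stage. */
    enum Outcome { CLEANED, SKIPPED, FAILURE_IGNORED }

    /** Reads a boolean option from a plain string map, falling back to a default. */
    static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v.trim());
    }

    /**
     * Runs the cleanup stage of a job commit. Output is assumed to be committed
     * already; this only decides whether to skip cleanup, run it and propagate
     * failures (default), or run it and swallow failures.
     */
    static Outcome commitJob(Map<String, String> conf, Cleanup cleanup) throws IOException {
        if (getBoolean(conf, SKIP_CLEANUP_KEY, false)) {
            return Outcome.SKIPPED; // temporary files are left behind on purpose
        }
        try {
            cleanup.run();
            return Outcome.CLEANED;
        } catch (IOException e) {
            if (getBoolean(conf, IGNORE_FAILURES_KEY, false)) {
                System.err.println("Ignoring cleanup failure: " + e.getMessage());
                return Outcome.FAILURE_IGNORED;
            }
            throw e; // default behavior: a cleanup failure fails the job
        }
    }

    public static void main(String[] args) throws IOException {
        Cleanup failing = () -> { throw new IOException("simulated delete timeout"); };
        Map<String, String> conf = new HashMap<>();
        conf.put(IGNORE_FAILURES_KEY, "true");
        System.out.println(commitJob(conf, failing)); // prints FAILURE_IGNORED
    }
}
```

Checking the skip switch before touching the file system means a skipped cleanup costs no REST calls at all, while the ignore switch still attempts the deletion and only changes how a failure is reported.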