Bimalendu Choudhary created MAPREDUCE-7331:
----------------------------------------------
Summary: Make temporary directory used by FileOutputCommitter
configurable
Key: MAPREDUCE-7331
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7331
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 3.0.0
Environment: CDH 6.2.1 Hadoop 3.0.0
Reporter: Bimalendu Choudhary
Spark SQL applications use FileOutputCommitter to commit and merge their files
under a table directory. Because PENDING_DIR_NAME is hardcoded to "_temporary",
multiple applications writing under the same table directory share a single
temporary directory. As a result, one application can interfere with another
application's temporary files, or even delete them. There is currently no way
for an application to use its own unique path for temporary files and so avoid
interference from other, completely independent applications. I think the
temporary directory used by FileOutputCommitter should be made configurable, so
that each caller can supply its own unique value as required and keep its
temporary files from being deleted or overwritten by other applications.
Something like:
{quote}public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
public static final String PENDING_DIR_NAME_KEY =
"mapreduce.fileoutputcommitter.tempdir";
{quote}
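To make the idea concrete, here is a minimal sketch of how the directory name could be resolved from configuration, with a fallback to the current default. It is only an illustration of the proposal, not the actual FileOutputCommitter code: the class PendingDirSketch, the constant name PENDING_DIR_NAME_KEY, the helper methods and the sample paths are all hypothetical, and a plain Map stands in for Hadoop's Configuration so the example runs without Hadoop on the classpath (in Hadoop itself the lookup would presumably go through Configuration.get(name, defaultValue)).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; none of these names exist in Hadoop today.
public class PendingDirSketch {
    // Current hardcoded value, kept as the default.
    static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
    // Hypothetical configuration key taken from the proposal above.
    static final String PENDING_DIR_NAME_KEY =
        "mapreduce.fileoutputcommitter.tempdir";

    // Returns the configured directory name, or the default when the key
    // is unset or blank. The Map is a stand-in for a job Configuration.
    static String pendingDirName(Map<String, String> conf) {
        String name = conf.get(PENDING_DIR_NAME_KEY);
        if (name == null || name.trim().isEmpty()) {
            return PENDING_DIR_NAME_DEFAULT;
        }
        return name.trim();
    }

    // Builds the temporary path under the job's output directory.
    static String pendingDirPath(String outputDir, Map<String, String> conf) {
        return outputDir + "/" + pendingDirName(conf);
    }

    public static void main(String[] args) {
        // Two independent applications writing to the same table directory.
        Map<String, String> appA = new HashMap<>();
        appA.put(PENDING_DIR_NAME_KEY, "_temporary-app_0001");
        Map<String, String> appB = new HashMap<>();
        appB.put(PENDING_DIR_NAME_KEY, "_temporary-app_0002");

        // Each application now gets its own temporary directory.
        System.out.println(pendingDirPath("/warehouse/t1", appA));
        System.out.println(pendingDirPath("/warehouse/t1", appB));
        // With no override, behaviour stays as it is today.
        System.out.println(pendingDirPath("/warehouse/t1", new HashMap<>()));
    }
}
```

Falling back to "_temporary" when the key is unset would keep existing jobs unchanged, while callers such as Spark could set a per-application or per-attempt value.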
Spark applications could use this to handle stage failures, where temporary
directories left over from previous attempts cause problems, and it would help
in many other situations as well.