[ https://issues.apache.org/jira/browse/HADOOP-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275591#comment-14275591 ]

Akshay Rai commented on HADOOP-7611:
------------------------------------

[~ozawa], any thoughts?

> SequenceFile.Sorter creates local temp files on HDFS
> ----------------------------------------------------
>
>                 Key: HADOOP-7611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7611
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.2
>         Environment: CentOS 5.6 64-bit, Oracle JDK 1.6.0_26 64-bit
>            Reporter: Bryan Keller
>
> When SequenceFile.Sorter is used to sort or merge sequence files that exist in
> HDFS, it attempts to create temp files in the directory structure specified by
> mapred.local.dir, but on HDFS rather than in the local file system. The problem
> code is in MergeQueue.merge(). Starting at line 2953:
> {code}
> Path outputFile = lDirAlloc.getLocalPathForWrite(
>                                     tmpFilename.toString(),
>                                     approxOutputSize, conf);
> LOG.debug("writing intermediate results to " + outputFile);
> Writer writer = cloneFileAttributes(
>     fs.makeQualified(segmentsToMerge.get(0).segmentPathName),
>     fs.makeQualified(outputFile),
>     null);
> {code}
> The outputFile here is a local path without a scheme, e.g.
> "/mnt/mnt1/mapred/local", specified by the mapred.local.dir property. If we
> are sorting files on HDFS, the fs object is a DistributedFileSystem. The call
> to fs.makeQualified(outputFile) prepends the fs object's scheme to the local
> temp path returned by lDirAlloc, e.g. hdfs://mnt/mnt1/mapred/local. This
> directory is then created on HDFS (if the proper permissions are available).
> If the HDFS permissions are not available, the sort/merge fails even though
> the directories exist locally.
> The code should instead always use the local file system when retrieving a
> path from the mapred.local.dir property. The unit tests do not cover this
> condition; they only exercise sort and merge on the local file system.
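
For anyone picking this up, the qualification behaviour described in the issue can be reproduced without a cluster. The sketch below is not Hadoop code: it uses only java.net.URI, and the qualify() helper is a hypothetical stand-in for what FileSystem.makeQualified() does to a scheme-less path. The namenode host and port are made up for illustration. It only shows why qualifying the mapred.local.dir path against the HDFS file system moves it onto HDFS, while qualifying it against a local file system URI keeps it local. In Hadoop itself the local file system can be obtained with FileSystem.getLocal(conf); whether that is the right fix for MergeQueue.merge() is an assumption on my part, not something confirmed in this issue.

```java
import java.net.URI;

class QualifyDemo {
    /**
     * Hypothetical stand-in for FileSystem.makeQualified(): a path that has
     * no scheme takes on the scheme and authority of the file system URI.
     */
    public static URI qualify(URI fsUri, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already qualified, leave it untouched
        }
        return fsUri.resolve(p);
    }

    public static void main(String[] args) {
        // Assumed file system URIs; the namenode host and port are invented.
        URI hdfs = URI.create("hdfs://namenode:8020/");
        URI local = URI.create("file:///");

        // Scheme-less temp directory of the kind taken from mapred.local.dir.
        String tmp = "/mnt/mnt1/mapred/local";

        // Qualifying against the HDFS URI silently relocates the path to HDFS.
        URI onHdfs = qualify(hdfs, tmp);
        System.out.println("scheme when qualified against HDFS:  " + onHdfs.getScheme());

        // Qualifying against the local URI keeps the path on local disk.
        URI onLocal = qualify(local, tmp);
        System.out.println("scheme when qualified against local: " + onLocal.getScheme());
    }
}
```

The point of the sketch is that the path string itself carries no information about which file system it belongs to, so the choice of the object used to qualify it decides where the temp files end up.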