Re: MR output to a file instead of directory?
I'm not sure about the use case, but if you really care, you can use an existing directory (e.g. /) by writing a bit of code to bypass the check for output-dir existence.

By default, FileOutputFormat assumes the output dir does not exist and will error out during init if it does. You could customize it to skip that check.

Arun

On Mar 2, 2012, at 4:38 PM, Jianhui Zhang wrote:

> Hi all,
>
> The FileOutputFormat/FileOutputCommitter always treats an output path
> as a directory and writes files under it, even if there is only one
> Reducer. Is there any way to configure an OutputFormat to write all
> data into a file?
>
> Thanks,
> James

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: MR output to a file instead of directory?
James,

This is _possible_, but you will need a complete set of both an OutputFormat and an OutputCommitter to do the work for you, as File{OutputFormat,OutputCommitter} work with directories.

The biggest advantage of having output directories is the ability to have temporary attempt directories and output committing (for speculative execution and task failure handling), described at http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F. You'd need something like this for a complete solution.

On Sat, Mar 3, 2012 at 6:08 AM, Jianhui Zhang wrote:

> Hi all,
>
> The FileOutputFormat/FileOutputCommitter always treats an output path
> as a directory and writes files under it, even if there is only one
> Reducer. Is there any way to configure an OutputFormat to write all
> data into a file?
>
> Thanks,
> James

--
Harsh J
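
To make the attempt-directory and commit idea above concrete, here is a minimal sketch of how a single-file output could still keep those guarantees. It is only an illustration under stated assumptions: it uses plain java.nio on a local filesystem rather than the Hadoop OutputFormat/OutputCommitter API, and the class name, method names, and the "_temporary" scratch location are all hypothetical. A real solution would implement the same steps in a custom OutputFormat and OutputCommitter pair.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hadoop API: each task attempt writes to its own
// scratch file, and "commit" moves exactly one attempt's file into place.
class SingleFileCommitSketch {

    // Write the records to a per-attempt scratch file beside the final file.
    static Path writeAttempt(Path finalFile, String attemptId, List<String> records)
            throws IOException {
        Path parent = finalFile.toAbsolutePath().getParent();
        Path attemptDir = parent.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(attemptDir);
        Path scratch = attemptDir.resolve(finalFile.getFileName().toString());
        try (BufferedWriter writer = Files.newBufferedWriter(scratch, StandardCharsets.UTF_8)) {
            for (String record : records) {
                writer.write(record);
                writer.newLine();
            }
        }
        return scratch;
    }

    // Commit one attempt. Without REPLACE_EXISTING the move fails if the
    // final file already exists, so a second (speculative) attempt loses
    // and its scratch file is discarded.
    static boolean commit(Path scratch, Path finalFile) throws IOException {
        try {
            Files.move(scratch, finalFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            Files.deleteIfExists(scratch);
            return false;
        }
    }

    // Abort a failed attempt: drop its partial output, leave the final file alone.
    static void abort(Path scratch) throws IOException {
        Files.deleteIfExists(scratch);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("single-file-commit");
        Path out = dir.resolve("result.txt");
        List<String> records = Arrays.asList("a\t1", "b\t2");

        // Two attempts of the same task, as with speculative execution.
        Path first = writeAttempt(out, "attempt_0", records);
        Path second = writeAttempt(out, "attempt_1", records);

        System.out.println("first commit won: " + commit(first, out));
        System.out.println("second commit won: " + commit(second, out));
        System.out.println("output is a regular file: " + Files.isRegularFile(out));
    }
}
```

The point of the sketch is that the final path only ever sees a complete file from one attempt. Partial output from failed or duplicate attempts stays in the scratch area, which is the role the temporary attempt directories play in File{OutputFormat,OutputCommitter}.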
MR output to a file instead of directory?
Hi all,

The FileOutputFormat/FileOutputCommitter always treats an output path as a directory and writes files under it, even if there is only one Reducer. Is there any way to configure an OutputFormat to write all data into a file?

Thanks,
James