Hey Gauthier,

IIRC, that error occurs when the Hadoop version doesn't support multiple output files, which Crunch relies on. My understanding was that this was part of 1.0.3, viz.

http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

so I'm a bit thrown -- this is the Apache distro of 1.0.3, right? Not a custom Hadoop build?

J

On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <[email protected]> wrote:

> Hi guys,
>
> I wanted to use crunch, but when I tried the examples I got:
> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>
> I am running a git (apache incubator) version of crunch (07/24/2012) against a 1.0.3 hadoop (maybe this is causing the error, every dependency is with 0.20.x hadoop). Or maybe I have messed with my hadoop configuration (but I can run any hadoop example).
>
> Regards
> Gauthier
>
> Stack trace:
>
> 714 [Thread-15] INFO org.apache.crunch.impl.mr.run.RTNode - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>     at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>     at org.apache.crunch.MapFn.process(MapFn.java:34)
>     at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>     at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>     at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>     at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>     at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
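
Since the thread turns on which Hadoop build is actually being loaded (a 1.0.3 install versus the 0.20.x jars the dependencies pull in), a quick way to check is to ask the classpath itself. The following is only a sketch under stated assumptions: the class name HadoopVersionCheck is made up for illustration, and it reaches Hadoop's org.apache.hadoop.util.VersionInfo (static getVersion() and getUrl()) through reflection so that it compiles and runs even when no Hadoop jar is present. It does not diagnose the "File already exists" error; it only reports what version the JVM sees.

```java
import java.lang.reflect.Method;

// Hypothetical helper (not part of Crunch or Hadoop): reports which Hadoop
// version is visible on the current classpath.
final class HadoopVersionCheck {

    // Calls a public static no-arg method by name and returns its result as text.
    // Reflection is used so this file compiles without any Hadoop jar.
    static String call(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName);
            return String.valueOf(m.invoke(null));
        } catch (ClassNotFoundException e) {
            return "(class not on classpath)";
        } catch (ReflectiveOperationException e) {
            return "(unavailable: " + e + ")";
        }
    }

    // Builds a one-line summary from the version class's getVersion() and getUrl().
    static String describe(String className) {
        return "version=" + call(className, "getVersion")
             + " url=" + call(className, "getUrl");
    }

    public static void main(String[] args) {
        // VersionInfo is the class Hadoop itself uses to report build information.
        System.out.println(describe("org.apache.hadoop.util.VersionInfo"));
    }
}
```

Running it with the same classpath as the failing Crunch example should show whether a 0.20.x jar is shadowing the 1.0.3 one; if the class is missing entirely, the output says so instead of failing.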
