Could be. I'm on the road today, but I'll take a look at it this evening.

On Tue, Jul 24, 2012 at 8:48 AM, Gauthier AMBARD <[email protected]> wrote:
> Yep,
> http://apache.mirrors.multidist.eu/hadoop/common/stable/hadoop-1.0.3-bin.tar.gz
> and
> hadoop version says :
> Hadoop 1.0.3
> Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
> Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
> From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
>
> Maybe it has to do with some configuration ?
>
> Gauthier
>
>
> 2012/7/24 Josh Wills <[email protected]>
>
>> Hey Gauthier,
>>
>> IIRC, that error occurs when the Hadoop version doesn't support multiple
>> output files, which Crunch relies on. My understanding was that this was
>> part of 1.0.3, viz.
>>
>> http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
>>
>> so I'm a bit thrown-- this is the Apache distro of 1.0.3, right? Not a
>> custom Hadoop build?
>>
>> J
>>
>> On Tue, Jul 24, 2012 at 8:29 AM, Gauthier AMBARD <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> I wanted to use crunch, but when I tried the examples I got :
>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>
>>> I am running a git (apache incubator) version of crunch (07/24/2012)
>>> against a 1.0.3 hadoop (maybe this is causing the error,
>>> every dependencies are with 0.20.x hadoop). Or maybe I have messed with my
>>> hadoop configuration (but I can run any hadoop example).
>>>
>>> Regards
>>> Gauthier
>>>
>>> Stack trace :
>>>
>>> 714 [Thread-15] INFO org.apache.crunch.impl.mr.run.RTNode - Crunch exception in 'Text(out)' for input: [(http://www.apache.org/).,1]
>>> org.apache.crunch.impl.mr.run.CrunchRuntimeException: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>   at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:44)
>>>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>   at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:87)
>>>   at org.apache.crunch.CombineFn$AggregatorCombineFn.process(CombineFn.java:72)
>>>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>   at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:43)
>>>   at org.apache.crunch.MapFn.process(MapFn.java:34)
>>>   at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:85)
>>>   at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:100)
>>>   at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:61)
>>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>> Caused by: java.io.IOException: File already exists:file:/tmp/crunch-1094145699/p1/output/_temporary/_attempt_local_0001_r_000000_0/part-r-00000
>>>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:228)
>>>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:335)
>>>   at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:368)
>>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
>>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
>>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
>>>   at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>>>   at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.getRecordWriter(CrunchMultipleOutputs.java:416)
>>>   at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:378)
>>>   at org.apache.crunch.hadoop.mapreduce.lib.output.CrunchMultipleOutputs.write(CrunchMultipleOutputs.java:356)
>>>   at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:42)
>>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
