Hi all,

Please take a look at the following pipeline:

  read(From.textFile(args[0])).write(To.textFile(args[1] + "-text"));
  run();
  read(From.textFile(args[0])).write(To.sequenceFile(args[1] + "-seq"));
  run();
  read(From.textFile(args[0])).write(To.avroFile(args[1] + "-avro"));
  done();

The first two jobs are fine, and give correct output types of text and sequence files respectively. The text to avro conversion fails. This is no great surprise, knowing a little about the internals of Crunch, but when put alongside the other examples it feels like it should work.

Even if it can't work - no big deal, it's just a toy example. The main problem for me was the error message:

  13/06/28 14:11:40 INFO jobcontrol.CrunchControlledJob: org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
      at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)

I think the job should have been killed somewhere before this point. There must be a bit of logic (though I haven't properly looked for it) which decides the requested target is no good for the PCollection provided, so the exception should be raised there with a message explaining this.

What do you think? I'm sure there's a JIRA ticket lurking somewhere in all this - I'm just not sure what it is! :)

Thanks,
Dave
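
P.S. To make the suggestion a bit more concrete, here is a rough, self-contained sketch of the kind of fail-fast check I have in mind. None of these names are real Crunch classes: TargetCheckSketch, TypeFamily, Target and checkCompatible are all made up for illustration, and the idea that the mismatch is between a Writable-typed collection and an Avro-only target is my guess rather than something I have confirmed in the code. The point is only that the check runs when write() is called, and that the message names the target and the type it was given.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only: these are NOT Crunch classes or Crunch APIs.
public class TargetCheckSketch {

    /** Stand-in for the type information a collection carries (assumed). */
    enum TypeFamily { WRITABLE, AVRO }

    /** Stand-in for an output target that knows which type families it accepts. */
    static final class Target {
        final String name;
        final Set<TypeFamily> accepted;

        Target(String name, Set<TypeFamily> accepted) {
            this.name = name;
            this.accepted = accepted;
        }
    }

    /** Fails at pipeline-construction time, before any job is submitted. */
    static void checkCompatible(Target target, TypeFamily collectionFamily) {
        if (!target.accepted.contains(collectionFamily)) {
            throw new IllegalArgumentException(
                "Target '" + target.name + "' cannot accept a collection of type family "
                + collectionFamily + "; it accepts " + target.accepted);
        }
    }

    public static void main(String[] args) {
        Target text = new Target("out-text", EnumSet.allOf(TypeFamily.class));
        Target avro = new Target("out-avro", EnumSet.of(TypeFamily.AVRO));

        // Passes: the text target accepts either family in this sketch.
        checkCompatible(text, TypeFamily.WRITABLE);

        // Fails early with a message that names the target and the type.
        try {
            checkCompatible(avro, TypeFamily.WRITABLE);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected at write(): " + e.getMessage());
        }
    }
}
```

With something like this in place, the third write() in the pipeline above would be rejected up front instead of surfacing later as "Output directory not set".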
