Is it possible to have multiple targets that Crunch runs in one MapReduce job? If so then there will be a conflict, and Crunch will need some changes to support this case.
Tom

On Thu, Feb 27, 2014 at 3:34 PM, Chao Shi <[email protected]> wrote:
> Hi Tom,
>
> I will have to use named-output. About your example DatasetTarget, is it
> safe to setOutputFormat() explicitly here? I guess this may conflict with
> other targets that only use the same trick. Is it possible for us to have a
> general approach to get OutputCommitter work?
>
> Hi Chao,
>
> Crunch doesn't call the output committer explicitly itself, it's
> called by the MR framework as a normal part of running a job. However,
> in Crunch's MapReduceTarget#configureForMapReduce the output format is
> not typically set for the named-output case (which is the only case
> that is executed now, as I discovered in the thread mentioned below),
> so it defaults to FileOutputFormat, with its semantics. (This is why
> HBaseTarget calls FileOutputFormat.setOutputPath, which it wouldn't
> have to if it set the output format explicitly to HBase's
> TableOutputFormat.)
>
> Are you setting the HCatOutputFormat in the named-output case? In the
> Crunch Target I'm writing I've set the OutputFormat explicitly:
> https://github.com/tomwhite/kite/blob/CDK-308-dataset-output-format/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L106
>
> Cheers,
> Tom
>
> On Thu, Feb 27, 2014 at 7:54 AM, Gabriel Reid <[email protected]> wrote:
>> For reference, here's the link to the previous thread on this:
>>
>> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3cCAF-WD4Sig2n7yMxiZSji8trQy-8wfUy5_7dnKC=dksxmrfs...@mail.gmail.com%3e
>>
>> On Thu, Feb 27, 2014 at 7:56 AM, Josh Wills <[email protected]> wrote:
>>> +tom
>>>
>>> Didn't Tom have a thing like this a little while ago?
>>>
>>>
>>> On Wed, Feb 26, 2014 at 8:04 PM, Chao Shi <[email protected]> wrote:
>>>
>>>> Hi crunch devs,
>>>>
>>>> I'm developing target wrapper for HCatOutputFormat, which uses a custom
>>>> OutputCommitter to get results committed to hive. It seems its
>>>> OutputCommitter is not called at all. Looking into the code, I can't find
>>>> where crunch calls it. Is it really supported?
>>>>
>>>> Thanks,
>>>> Chao
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
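
The conflict raised at the top of this thread can be sketched with a toy model. It is not Crunch or Hadoop code: ToyJob, ToyTarget, namedOnly and explicit are hypothetical names made up for illustration. The sketch assumes, as the thread describes, that a job has a single job-level output format, that the framework takes the output committer from that format only, and that the format falls back to FileOutputFormat when no target sets it.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputFormatConflictSketch {

    /** Toy stand-in for a MapReduce job (hypothetical, not a Hadoop class). */
    static final class ToyJob {
        // Single job-level slot; the default follows the thread's description.
        private String jobOutputFormat = "FileOutputFormat";
        private final List<String> namedOutputs = new ArrayList<>();

        void setOutputFormat(String format) { jobOutputFormat = format; }
        void addNamedOutput(String format) { namedOutputs.add(format); }

        // Assumption: the committer is taken from the job-level format only.
        String committerSource() { return jobOutputFormat; }
        int namedOutputCount() { return namedOutputs.size(); }
    }

    /** Toy stand-in for a Crunch target (hypothetical interface). */
    interface ToyTarget {
        void configure(ToyJob job);
    }

    /** A target that registers a named output and leaves the job-level format alone. */
    static ToyTarget namedOnly(String format) {
        return job -> job.addNamedOutput(format);
    }

    /** A target that also sets the job-level format explicitly. */
    static ToyTarget explicit(String format) {
        return job -> {
            job.addNamedOutput(format);
            job.setOutputFormat(format);
        };
    }

    /** Configures one job with all targets and reports whose committer would run. */
    static String committerSourceFor(ToyTarget... targets) {
        ToyJob job = new ToyJob();
        for (ToyTarget target : targets) {
            target.configure(job);
        }
        return job.committerSource();
    }

    public static void main(String[] args) {
        // Named-output only: the custom committer is never consulted.
        System.out.println(committerSourceFor(namedOnly("HCatOutputFormat")));
        // Two explicit targets in one job: the later call overwrites the earlier one.
        System.out.println(committerSourceFor(
                explicit("HCatOutputFormat"), explicit("DatasetOutputFormat")));
    }
}
```

Under these assumptions, a target that only registers a named output never gets its committer called, and when two targets in the same job both set the format explicitly, only the last one configured keeps its committer. A general fix would presumably need something that delegates to each target's committer, which this sketch does not attempt.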
