I like it.
On Feb 27, 2014 6:27 PM, "Chao Shi" <[email protected]> wrote:
> How about introducing our own OutputFormat? It can delegate to each
> registered OutputCommitter (if any).
>
>
> 2014-02-28 1:28 GMT+08:00 Josh Wills <[email protected]>:
>
> > It's possible to have multiple targets running in one Crunch job; in fact
> > it was so common that I switched everything over to the named targets in
> > order to simplify the bookkeeping. Every output format can run
> > independently of every other output format using the code in CrunchOutputs;
> > I think the only reason we default to FileOutputFormat is b/c it's an
> > exception for an MR config to _not_ have an OutputFormat configured, even if
> > it's never used.
> >
> >
> > On Thu, Feb 27, 2014 at 9:03 AM, Tom White <[email protected]> wrote:
> >
> > > Is it possible to have multiple targets that Crunch runs in one
> > > MapReduce job? If so then there will be a conflict, and Crunch will
> > > need some changes to support this case.
> > >
> > > Tom
> > >
> > > On Thu, Feb 27, 2014 at 3:34 PM, Chao Shi <[email protected]> wrote:
> > > > Hi Tom,
> > > >
> > > > I will have to use named-output. About your example DatasetTarget, is it
> > > > safe to setOutputFormat() explicitly here? I guess this may conflict with
> > > > other targets that only use the same trick. Is it possible for us to have a
> > > > general approach to get OutputCommitter to work?
> > > >
> > > > Hi Chao,
> > > >
> > > > Crunch doesn't call the output committer explicitly itself, it's
> > > > called by the MR framework as a normal part of running a job. However,
> > > > in Crunch's MapReduceTarget#configureForMapReduce the output format is
> > > > not typically set for the named-output case (which is the only case
> > > > that is executed now, as I discovered in the thread mentioned below),
> > > > so it defaults to FileOutputFormat, with its semantics. (This is why
> > > > HBaseTarget calls FileOutputFormat.setOutputPath, which it wouldn't
> > > > have to if it set the output format explicitly to HBase's
> > > > TableOutputFormat.)
> > > >
> > > > Are you setting the HCatOutputFormat in the named-output case? In the
> > > > Crunch Target I'm writing I've set the OutputFormat explicitly:
> > > >
> > > > https://github.com/tomwhite/kite/blob/CDK-308-dataset-output-format/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L106
> > > >
> > > > Cheers,
> > > > Tom
> > > >
> > > > On Thu, Feb 27, 2014 at 7:54 AM, Gabriel Reid <[email protected]> wrote:
> > > >> For reference, here's the link to the previous thread on this:
> > > >>
> > > >> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3cCAF-WD4Sig2n7yMxiZSji8trQy-8wfUy5_7dnKC=dksxmrfs...@mail.gmail.com%3e
> > > >>
> > > >> On Thu, Feb 27, 2014 at 7:56 AM, Josh Wills <[email protected]> wrote:
> > > >>> +tom
> > > >>>
> > > >>> Didn't Tom have a thing like this a little while ago?
> > > >>>
> > > >>>
> > > >>> On Wed, Feb 26, 2014 at 8:04 PM, Chao Shi <[email protected]> wrote:
> > > >>>
> > > >>>> Hi crunch devs,
> > > >>>>
> > > >>>> I'm developing a target wrapper for HCatOutputFormat, which uses a custom
> > > >>>> OutputCommitter to get results committed to hive. It seems its
> > > >>>> OutputCommitter is not called at all. Looking into the code, I can't find
> > > >>>> where crunch calls it. Is it really supported?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Chao
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Director of Data Science
> > > >>> Cloudera <http://www.cloudera.com>
> > > >>> Twitter: @josh_wills <http://twitter.com/josh_wills>
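For readers following the thread, the delegation idea proposed at the top (one OutputFormat whose committer forwards to each registered OutputCommitter) might look roughly like the sketch below. This is only an illustration of the pattern, not Crunch or Hadoop code: `SimpleCommitter`, `RecordingCommitter`, and `DelegatingCommitter` are hypothetical names, and the interface is a deliberately reduced stand-in for Hadoop's OutputCommitter so that the example runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free stand-in for an output committer,
// reduced to two job-level lifecycle hooks for illustration.
interface SimpleCommitter {
    void setupJob(List<String> log);
    void commitJob(List<String> log);
}

// Hypothetical committer that only records which hook ran for which named output.
class RecordingCommitter implements SimpleCommitter {
    private final String name;

    RecordingCommitter(String name) {
        this.name = name;
    }

    @Override
    public void setupJob(List<String> log) {
        log.add("setup:" + name);
    }

    @Override
    public void commitJob(List<String> log) {
        log.add("commit:" + name);
    }
}

// Composite committer: forwards every lifecycle call to each registered
// delegate, in registration order (LinkedHashMap preserves insertion order).
class DelegatingCommitter implements SimpleCommitter {
    private final Map<String, SimpleCommitter> delegates = new LinkedHashMap<>();

    // Register the committer belonging to one named output.
    void register(String namedOutput, SimpleCommitter committer) {
        delegates.put(namedOutput, committer);
    }

    @Override
    public void setupJob(List<String> log) {
        for (SimpleCommitter c : delegates.values()) {
            c.setupJob(log);
        }
    }

    @Override
    public void commitJob(List<String> log) {
        for (SimpleCommitter c : delegates.values()) {
            c.commitJob(log);
        }
    }
}
```

A real version would presumably extend Hadoop's OutputCommitter, forward the task-level hooks as well, and decide what to do when one delegate fails partway through; none of that is settled in the thread.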
