>> I would expect that
>> named outputs would not be used in my simple pipeline, so name would
>> be null, but it actually seems that the name parameter is 'out0'. So
>> my first question is: what determines when named outputs are used?

Looking at the code, the output is always named [1], regardless of the
number of outputs. Do you think the named output is what is keeping your
custom committer from being used?

As for your second question, I need to do a bit more digging before I can
answer for certain.

[1] - https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCROutputHandler.java#L64

On Wed, Jan 29, 2014 at 10:11 AM, Tom White <[email protected]> wrote:
> Hi,
>
> I'm writing a Crunch Target that is a MapReduceTarget, but not a
> PathTarget, since it writes to files in a partitioned manner, so there
> is not necessarily a single output path. I'm confused about the 'name'
> parameter in configureForMapReduce() though - I would expect that
> named outputs would not be used in my simple pipeline, so name would
> be null, but it actually seems that the name parameter is 'out0'. So
> my first question is: what determines when named outputs are used?
>
> In the past this hasn't been a problem (e.g. with the Parquet target),
> but this output format has a custom output committer which isn't being
> used. Instead it looks like the default file committer is being used
> by Crunch, so the job fails. Is it possible to use custom output
> committers with Crunch?
>
> My code is here:
>
> https://github.com/tomwhite/kite/blob/CDK-251-mr/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L100
>
> Cheers,
> Tom
>
