Re: Output Committers and Crunch Targets

Micah Whitacre Wed, 29 Jan 2014 13:22:31 -0800

>> I would expect that
>> named outputs would not be used in my simple pipeline, so name would
>> be null, but it actually seems that the name parameter is 'out0'. So
>> my first question is: what determines when named outputs are used?


Looking at the code, the output is always named [1], regardless of the number
of outputs. Do you think the use of a name is what prevents your custom
committer from being used?
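To make the naming behaviour concrete, here is a small standalone model. It is not Crunch's code. It assumes the output handler builds each name from a fixed prefix plus a running counter, which matches the 'out0' Tom observed for a single-output job. The class name, the `OUTPUT_PREFIX` constant and `nameOutputs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how a planner might name MapReduce outputs (not Crunch source). */
public class OutputNamingSketch {

    // Assumed prefix, chosen to match the 'out0' name seen in the thread.
    static final String OUTPUT_PREFIX = "out";

    /** Names every output, whether the job has one output or many. */
    static List<String> nameOutputs(int outputCount) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < outputCount; i++) {
            names.add(OUTPUT_PREFIX + i); // a counter suffix means no output is ever unnamed
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(nameOutputs(1)); // [out0]
        System.out.println(nameOutputs(3)); // [out0, out1, out2]
    }
}
```

Under this model, a target's configureForMapReduce() would never see a null name in a MapReduce pipeline, even when there is only one output.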

Regarding your second question, I need to do a bit more digging before I can
answer for certain.

[1] -
https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/MSCROutputHandler.java#L64




On Wed, Jan 29, 2014 at 10:11 AM, Tom White <[email protected]> wrote:

> Hi,
>
> I'm writing a Crunch Target that is a MapReduceTarget, but not a
> PathTarget, since it writes to files in a partitioned manner, so there
> is not necessarily a single output path. I'm confused about the 'name'
> parameter in configureForMapReduce() though - I would expect that
> named outputs would not be used in my simple pipeline, so name would
> be null, but it actually seems that the name parameter is 'out0'. So
> my first question is: what determines when named outputs are used?
>
> In the past this hasn't been a problem (e.g. with the Parquet target),
> but this output format has a custom output committer which isn't being
> used. Instead it looks like the default file committer is being used
> by Crunch, so the job fails. Is it possible to use custom output
> committers with Crunch?
>
> My code is here:
>
> https://github.com/tomwhite/kite/blob/CDK-251-mr/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L100
>
> Cheers,
> Tom
>
