[
https://issues.apache.org/jira/browse/MAPREDUCE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800824#comment-17800824
]
Anthony Pessy commented on MAPREDUCE-6893:
------------------------------------------
I have the same need: I want to split an input into multiple outputs, where
each output has a different Parquet schema (using ParquetOutputFormat). This
feature would have been helpful, since the schema used by the RecordWriter
comes from the configuration.
Apparently a "workaround is easy", but I'm not sure how, since everything is
private in MultipleOutputs.
If anyone has resolved a similar issue, I'd be interested. I'm not sure where
to tackle this (fork MultipleOutputs, a custom ParquetOutputFormat subclass
for each named output, ...).
> MultipleOutputs to have configuration overrides for a named output
> ------------------------------------------------------------------
>
> Key: MAPREDUCE-6893
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6893
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: piyush mukati
> Priority: Minor
>
> To support use cases where different named outputs need different
> configuration.
> One use case is that
> we need different schemas for different named outputs, and the schema is
> passed to the RecordWriter through the configuration.
> For example:
> we have two named outputs, "schema1" and "schema2".
> For schema1 we want the value of the key "schema" to be "int",
> while for schema2 it should be "string".
> So we can provide the config as
> "multioutput.overrideKeys=schema"
> "multioutput.schema1.schema=int"
> "multioutput.schema2.schema=string"
> and while creating the context in getContext,
> the config for schema1 will be resolved to "schema=int"
> and for schema2 it will be "schema=string".
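
The resolution described in the issue could look roughly like the sketch
below. It is only an illustration of the proposal, not code from
MultipleOutputs: the class name NamedOutputOverrides and the resolve method
are hypothetical, a plain Map stands in for the Hadoop Configuration so the
example runs without Hadoop on the classpath, and treating
"multioutput.overrideKeys" as a comma-separated list is an assumption (the
issue only shows a single key).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Hadoop. A Map stands in for Configuration.
public class NamedOutputOverrides {
    // Key names taken from the issue description.
    static final String OVERRIDE_KEYS = "multioutput.overrideKeys";
    static final String PREFIX = "multioutput.";

    // Returns a copy of the base config in which every key listed under
    // multioutput.overrideKeys is replaced by multioutput.<namedOutput>.<key>,
    // when such an override is present. The base config is left untouched.
    static Map<String, String> resolve(Map<String, String> base, String namedOutput) {
        Map<String, String> resolved = new HashMap<>(base);
        String keys = base.get(OVERRIDE_KEYS);
        if (keys == null) {
            return resolved; // nothing to override
        }
        // Assumption: several keys may be listed, separated by commas.
        for (String rawKey : keys.split(",")) {
            String key = rawKey.trim();
            if (key.isEmpty()) {
                continue;
            }
            String override = base.get(PREFIX + namedOutput + "." + key);
            if (override != null) {
                resolved.put(key, override);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("multioutput.overrideKeys", "schema");
        conf.put("multioutput.schema1.schema", "int");
        conf.put("multioutput.schema2.schema", "string");

        System.out.println("schema1 -> schema=" + resolve(conf, "schema1").get("schema"));
        System.out.println("schema2 -> schema=" + resolve(conf, "schema2").get("schema"));
    }
}
```

In a real job the same idea would presumably be applied to a copy of the job's
Configuration before the per-output context is built, so that each named
output's RecordWriter sees its own value for "schema" while the job-wide
configuration stays unchanged.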