[ https://issues.apache.org/jira/browse/AVRO-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890307#comment-13890307 ]
Vladislav Spivak commented on AVRO-1452:
----------------------------------------
I will try to put together a standalone sample ASAP.
> Problem when using AvroMultipleOutputs with multiple schemas
> ------------------------------------------------------------
>
> Key: AVRO-1452
> URL: https://issues.apache.org/jira/browse/AVRO-1452
> Project: Avro
> Issue Type: Bug
> Affects Versions: 1.7.6
> Environment: Any Platform
> Reporter: Vladislav Spivak
> Labels: easyfix
>
> When using multiple named outputs with different key/value schemas, the last
> provided schema overrides all previously defined schemas after the first write
> attempt. This is caused by the following code at
> AvroMultipleOutputs.java:509:
> /*begin*/
> Job job = new Job(context.getConfiguration());
> ...
> setSchema(job, keySchema, valSchema);
> taskContext = createTaskAttemptContext(
> job.getConfiguration(), context.getTaskAttemptID());
> /*end*/
> Every time this code runs, the Configuration instance passed to
> createTaskAttemptContext remains the same, because the Job constructor creates
> a new copy of the configuration only if it is not an instance of JobConf. As a
> result, the "avro.schema.output.XXX" properties are overwritten each time a new
> TaskAttemptContext is initialised, and a single Configuration instance is
> mistakenly shared by all TaskAttemptContexts.
> Proposed fix:
> a) use "Job.getInstance(Configuration conf)", or
> b) call "new Job(new Configuration(context.getConfiguration()))"
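
To illustrate the aliasing described in the issue without pulling in Hadoop, here is a minimal, self-contained sketch. The Conf class is a hypothetical stand-in for a mutable configuration object (it is not org.apache.hadoop.conf.Configuration), SCHEMA_KEY is an invented property name standing in for the "avro.schema.output.XXX" keys, and the two helper methods are illustrative only. It is not the AvroMultipleOutputs code; it only shows why writing into a shared instance loses the first schema, and why copying first (as in option b) avoids that.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable configuration object.
// NOT the real org.apache.hadoop.conf.Configuration.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf() {}

    // Copy constructor: the new instance gets its own property map.
    Conf(Conf other) {
        props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

class SharedConfDemo {
    // Invented key, standing in for the "avro.schema.output.XXX" properties.
    static final String SCHEMA_KEY = "avro.schema.output.key";

    // Problem pattern: the schema is written into the caller's instance,
    // so every "context" returned here is the same object.
    static Conf contextSharing(Conf base, String schema) {
        base.set(SCHEMA_KEY, schema);
        return base;
    }

    // Fix pattern: copy first, then set the schema on the private copy.
    static Conf contextCopying(Conf base, String schema) {
        Conf copy = new Conf(base);
        copy.set(SCHEMA_KEY, schema);
        return copy;
    }

    public static void main(String[] args) {
        Conf shared = new Conf();
        Conf a = contextSharing(shared, "SchemaA");
        Conf b = contextSharing(shared, "SchemaB");
        // a and b are the same object, so the first schema is gone.
        System.out.println("shared:  a=" + a.get(SCHEMA_KEY) + " b=" + b.get(SCHEMA_KEY));

        Conf base = new Conf();
        Conf c = contextCopying(base, "SchemaA");
        Conf d = contextCopying(base, "SchemaB");
        // Each copy keeps its own schema; the base is left untouched.
        System.out.println("copied:  c=" + c.get(SCHEMA_KEY) + " d=" + d.get(SCHEMA_KEY));
    }
}
```

In the shared case both handles report the schema that was set last, which matches the symptom in the report. In the copied case each handle keeps the schema it was created with.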