[ 
https://issues.apache.org/jira/browse/AVRO-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890307#comment-13890307
 ] 

Vladislav Spivak commented on AVRO-1452:
----------------------------------------

I will try to put together a standalone sample ASAP.

> Problem when using AvroMultipleOutputs with multiple schemas
> ------------------------------------------------------------
>
>                 Key: AVRO-1452
>                 URL: https://issues.apache.org/jira/browse/AVRO-1452
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.7.6
>         Environment: Any Platform
>            Reporter: Vladislav Spivak
>              Labels: easyfix
>
> When using multiple named outputs with different key/value schemas, the last 
> provided schema overrides all previous schema definitions after the first 
> write attempt. This is caused by the following code in 
> AvroMultipleOutputs.java:509
> /*begin*/
>     Job job = new Job(context.getConfiguration());
>    ...
>     setSchema(job, keySchema, valSchema);
>     taskContext = createTaskAttemptContext(
>       job.getConfiguration(), context.getTaskAttemptID());
> /*end*/
> Every time this code runs, the Configuration instance passed to 
> createTaskAttemptContext is the same one, because the Job constructor creates 
> a new configuration copy only if the argument is not an instance of JobConf. 
> As a result, the "avro.schema.output.XXX" properties are overwritten each 
> time a new TaskAttemptContext is initialised, and a single Configuration 
> instance is mistakenly shared by all TaskAttemptContexts.
> Proposed fix:
> a) use "Job.getInstance(Configuration conf)" or
> b) call "new Job(new Configuration(context.getConfiguration()))"
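
The aliasing described above can be reproduced without Hadoop on the classpath. The following is only a rough sketch: a plain HashMap stands in for the Hadoop Configuration, and FakeContext, buildShared and buildCopied are hypothetical names invented for this illustration, not part of Avro or Hadoop. The key "avro.schema.output.XXX" is the placeholder used in the report, not a real property name. buildShared mimics the reported behaviour (every context receives the same instance), while buildCopied mimics proposed fix (b) by copying before writing the schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedConfigSketch {
    // Placeholder key taken from the report; not a real property name.
    static final String KEY = "avro.schema.output.XXX";

    /** Hypothetical stand-in for a TaskAttemptContext holding a configuration. */
    static final class FakeContext {
        final Map<String, String> conf;
        FakeContext(Map<String, String> conf) { this.conf = conf; }
    }

    /** Mimics the reported bug: each context gets the SAME map instance. */
    static List<FakeContext> buildShared(Map<String, String> base, String... schemas) {
        List<FakeContext> contexts = new ArrayList<>();
        for (String schema : schemas) {
            base.put(KEY, schema);               // overwrites the previous schema
            contexts.add(new FakeContext(base)); // no copy is made
        }
        return contexts;
    }

    /** Mimics proposed fix (b): copy the configuration before setting the schema. */
    static List<FakeContext> buildCopied(Map<String, String> base, String... schemas) {
        List<FakeContext> contexts = new ArrayList<>();
        for (String schema : schemas) {
            Map<String, String> copy = new HashMap<>(base); // defensive copy
            copy.put(KEY, schema);
            contexts.add(new FakeContext(copy));
        }
        return contexts;
    }

    public static void main(String[] args) {
        List<FakeContext> shared = buildShared(new HashMap<>(), "SchemaA", "SchemaB");
        List<FakeContext> copied = buildCopied(new HashMap<>(), "SchemaA", "SchemaB");
        // prints "shared: SchemaB, SchemaB"
        System.out.println("shared: " + shared.get(0).conf.get(KEY)
                + ", " + shared.get(1).conf.get(KEY));
        // prints "copied: SchemaA, SchemaB"
        System.out.println("copied: " + copied.get(0).conf.get(KEY)
                + ", " + copied.get(1).conf.get(KEY));
    }
}
```

In the shared case the first context ends up seeing the schema written for the second one, which matches the "last provided schema overrides" symptom in the description. With a copy per context, each one keeps its own schema.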



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
