Vladislav Spivak created AVRO-1452:
--------------------------------------

             Summary: Problem when using AvroMultipleOutputs with multiple 
schemas
                 Key: AVRO-1452
                 URL: https://issues.apache.org/jira/browse/AVRO-1452
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.7.6
         Environment: Any Platform
            Reporter: Vladislav Spivak


When using multiple named outputs with different Key/Value Schemas, the last 
provided schema overrides any previous schema definitions after first write 
attempt. This happens due to issue with the following  code in 
AvroMultipleOutputs.java:509
/*begin*/
    Job job = new Job(context.getConfiguration());
   ...
    setSchema(job, keySchema, valSchema);
    taskContext = createTaskAttemptContext(
      job.getConfiguration(), context.getTaskAttemptID());
/*end*/
Every time this code runs, actual configuration instance passed to 
createTaskAttemptContext remains the same, because Job constructor creates new 
configuration copy only if it is not instanceof JobConf. This way we have 
properties  "avro.schema.output.XXX" overwrote each time new TaskAttemptContext 
is initialised and also mistakenly shared Configuration instance for all 
TaskAttemptContextes

Proposed fix:
a) use "Job getInstance(Configuration conf)" or
b) call "new Job(new Configuration(context.getConfiguration))"







--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to