We're using Avro 1.7.4 with Apache Flume 1.9.0. We wrote a Flume
interceptor in Java that deserializes each event body with the old
schema and reserializes it with the new schema.
In the interceptor, we have the following code to deserialize:
GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(oldSchema, newSchema);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(event.getBody(), null);
GenericRecord record = reader.read(null, decoder);
Then the following code to reserialize:
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(newSchema);
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outStream, null);
writer.write(record, encoder);
encoder.flush();
event.setBody(outStream.toByteArray());
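
To make the expected result concrete, here is a minimal sketch of the renaming we want, written against plain Java maps instead of Avro records. It is only an illustration of the intended outcome, not how Avro resolves aliases internally. The class name AliasRenameSketch, its methods, and the shortened oldFieldName2 value are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models the expected rename on plain maps.
class AliasRenameSketch {

    // Old field name -> new field name, mirroring the "aliases" in the new schema.
    static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("oldFieldName1", "newFieldName1");
        ALIASES.put("oldFieldName2", "newFieldName2");
    }

    // Returns a copy of the record with aliased field names replaced.
    static Map<String, Object> rename(Map<String, Object> record, Map<String, String> aliases) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : record.entrySet()) {
            String name = aliases.getOrDefault(e.getKey(), e.getKey());
            out.put(name, renameValue(e.getValue(), aliases));
        }
        return out;
    }

    // Descends into nested records and arrays so fields of array items are renamed too.
    @SuppressWarnings("unchecked")
    static Object renameValue(Object value, Map<String, String> aliases) {
        if (value instanceof Map) {
            return rename((Map<String, Object>) value, aliases);
        }
        if (value instanceof List) {
            List<Object> items = new ArrayList<>();
            for (Object item : (List<Object>) value) {
                items.add(renameValue(item, aliases));
            }
            return items;
        }
        return value;
    }

    // Sample input shaped like the data we receive (oldFieldName2 value shortened).
    static Map<String, Object> sample() {
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("recordId", 100L);
        status.put("recordName", "Record1");
        status.put("oldFieldName2", "{\"type\":\"XYZ\"}");
        Map<String, Object> top = new LinkedHashMap<>();
        top.put("oldFieldName1", 300);
        top.put("statusRecords", Collections.singletonList(status));
        return top;
    }

    public static void main(String[] args) {
        System.out.println(rename(sample(), ALIASES));
    }
}
```

In this sketch the item inside statusRecords ends up with newFieldName2 holding the original string, which is the result we expected from the interceptor but are not getting.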
On Sat, Jan 15, 2022 at 2:33 PM Spencer Nelson wrote:
>
> This should work according to the spec. What language and Avro library are
> you using, and with what version?
>
> Aliases are a bit tricky to use correctly. When deserializing, you may need
> to indicate the writer’s schema as using oldFieldName1 and oldFieldName2,
> while the reader schema uses newFieldName1 and newFieldName2. In other words,
> you may need to provide both the old and new schemas to the deserializer.
> This is just built in to how aliases work (
> https://avro.apache.org/docs/current/spec.html#Aliases). This may be a little
> abstract and unclear; it’s easier to describe in the context of a particular
> language.
>
>
> On Sat, Jan 15, 2022 at 8:37 AM Spencer Lu wrote:
>>
>> Hi everyone,
>>
>> We have an application that receives Avro data, and it needs to rename
>> certain fields in the data before sending it downstream. The
>> application is using the following Avro schema to send the data
>> downstream (note that 2 of the fields have aliases defined):
>>
>> {
>> "name":"MyCompanyRecordAvroEncoder",
>> "aliases":["com.mycompany.avro.MyStats"],
>> "type":"record",
>> "fields":[
>> {"name":"newFieldName1","type":["null", "int"],"default":null,"aliases":["oldFieldName1"]},
>>
>> {"name":"statusRecords","type":{"type":"array","items":{"name":"StatusAvroRecord","type":"record","fields"
>> : [
>> {"name":"recordId","type":"long"},
>> {"name":"recordName","type":["null", "string"],"default":null},
>> {"name":"newFieldName2","type":["null",
>> "string"],"default":null,"aliases":["oldFieldName2"]}
>> ]}}, "default": []}
>> ]
>> }
>>
>> We see that our application receives the following Avro data:
>>
>> {
>> "oldFieldName1": 300,
>> "statusRecords": [
>> {
>> "recordId": 100,
>> "recordName": "Record1",
>> "oldFieldName2":
>> "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
>> }
>> ]
>> }
>>
>> Then the application sends the following Avro data downstream:
>>
>> {
>> "newFieldName1": 300,
>> "statusRecords": [
>> {
>> "recordId": 100,
>> "recordName": "Record1",
>> "newFieldName2": null
>> }
>> ]
>> }
>>
>> As you can see, newFieldName1 is aliased to oldFieldName1 and has the
>> value from oldFieldName1, so its alias is working.
>>
>> However, newFieldName2 is aliased to oldFieldName2, but it is null
>> instead of having the value from oldFieldName2, so its alias is not
>> working.
>>
>> The only difference I see between newFieldName1 and newFieldName2 is
>> that newFieldName2 is a field within an array item. Do aliases not
>> work for fields in array items? Or is there some other issue?
>>
>> Any idea how I can get the alias for newFieldName2 to work?
>>
>> Thanks,
>> Spencer