Re: Unable to get Avro schema alias working for an array item

2022-01-15 Thread Spencer Lu
We're using the Avro Java library, version 1.7.4, with Apache Flume 1.9.0.
We have written a Flume interceptor in Java that deserializes each event
with the old schema and reserializes it with the new schema.

In the interceptor, we have the following code to deserialize:

// oldSchema is the writer's schema, newSchema is the reader's schema.
GenericDatumReader<GenericRecord> reader =
    new GenericDatumReader<>(oldSchema, newSchema);
BinaryDecoder decoder =
    DecoderFactory.get().binaryDecoder(event.getBody(), null);
GenericRecord record = reader.read(null, decoder);

Then the following code to reserialize:

ByteArrayOutputStream outStream = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(newSchema);
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outStream, null);
writer.write(record, encoder);
encoder.flush();
event.setBody(outStream.toByteArray());
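
One possible way to check whether the reader's aliases actually reach the nested record is to call Schema.applyAliases directly and inspect the rewritten writer schema. The sketch below is only an illustration under assumptions: the old schema is not shown in this thread, so OLD is a hypothetical writer schema in which the nested record inherits the namespace com.mycompany.avro from the enclosing record. The class name NestedAliasCheck and the helpers reader and nestedFieldNames are made up for this example. As far as I understand the Java implementation, field aliases are looked up by the record's full name, so a nested record whose full name differs between the two schemas may need its own record-level alias.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;

public class NestedAliasCheck {
    // Hypothetical old (writer) schema, NOT taken from the thread. The nested
    // record has no namespace of its own, so it inherits com.mycompany.avro.
    static final String OLD = ("{'type':'record','name':'MyStats','namespace':'com.mycompany.avro','fields':["
        + "{'name':'oldFieldName1','type':['null','int'],'default':null},"
        + "{'name':'statusRecords','type':{'type':'array','items':"
        + "{'type':'record','name':'StatusAvroRecord','fields':["
        + "{'name':'recordId','type':'long'},"
        + "{'name':'oldFieldName2','type':['null','string'],'default':null}]}},"
        + "'default':[]}]}").replace('\'', '"');

    // New (reader) schema, trimmed from the one in the thread. The argument is
    // spliced into the nested record so an alias can be added or left out.
    static String reader(String nestedAliases) {
        return ("{'type':'record','name':'MyCompanyRecordAvroEncoder',"
            + "'aliases':['com.mycompany.avro.MyStats'],'fields':["
            + "{'name':'newFieldName1','type':['null','int'],'default':null,"
            + "'aliases':['oldFieldName1']},"
            + "{'name':'statusRecords','type':{'type':'array','items':"
            + "{'type':'record','name':'StatusAvroRecord'," + nestedAliases
            + "'fields':[{'name':'recordId','type':'long'},"
            + "{'name':'newFieldName2','type':['null','string'],'default':null,"
            + "'aliases':['oldFieldName2']}]}},'default':[]}]}").replace('\'', '"');
    }

    // Returns the nested record's field names after the reader's aliases
    // have been applied to the writer schema.
    static List<String> nestedFieldNames(String nestedAliases) {
        Schema writer = new Schema.Parser().parse(OLD);
        Schema readerSchema = new Schema.Parser().parse(reader(nestedAliases));
        Schema rewritten = Schema.applyAliases(writer, readerSchema);
        Schema item = rewritten.getField("statusRecords").schema().getElementType();
        List<String> names = new ArrayList<>();
        for (Schema.Field f : item.getFields()) {
            names.add(f.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // Nested record without a record-level alias.
        System.out.println(nestedFieldNames(""));
        // Nested record aliased to the writer's full name.
        System.out.println(nestedFieldNames("'aliases':['com.mycompany.avro.StatusAvroRecord'],"));
    }
}
```

If the first call still lists oldFieldName2 and the second lists newFieldName2, that would suggest the nested record in the new schema needs an alias for the old record's full name. Whether this applies here depends on what the real old schema looks like.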




Re: Unable to get Avro schema alias working for an array item

2022-01-15 Thread Spencer Nelson
This should work according to the spec. What language and Avro library are
you using, and with what version?

Aliases are a bit tricky to use correctly. When deserializing, you may need
to indicate the writer’s schema as using oldFieldName1 and oldFieldName2,
while the reader schema uses newFieldName1 and newFieldName2. In other
words, you may need to provide both the old and new schemas to the
deserializer. This is just built in to how aliases work (
https://avro.apache.org/docs/current/spec.html#Aliases). This may be a
little abstract and unclear; it’s easier to describe in the context of a
particular language.
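
For the Java library, providing both schemas to the deserializer could look roughly like the sketch below. It is a minimal illustration with a flat record, not the poster's setup: the schemas, the record name Example, and the class name AliasRoundTrip are made up for this example. The writer schema uses oldFieldName1 and the reader schema declares newFieldName1 with oldFieldName1 as an alias.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AliasRoundTrip {
    // Writer (old) schema: the field still has its old name.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Example\",\"fields\":["
        + "{\"name\":\"oldFieldName1\",\"type\":[\"null\",\"int\"],\"default\":null}]}");

    // Reader (new) schema: the field is renamed and aliased to the old name.
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Example\",\"fields\":["
        + "{\"name\":\"newFieldName1\",\"type\":[\"null\",\"int\"],\"default\":null,"
        + "\"aliases\":[\"oldFieldName1\"]}]}");

    static GenericRecord roundTrip(int value) throws IOException {
        // Encode a record with the writer schema.
        GenericRecord in = new GenericData.Record(WRITER);
        in.put("oldFieldName1", value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(in, encoder);
        encoder.flush();

        // Decode it, passing the writer schema first and the reader schema second.
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(WRITER, READER).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(300).get("newFieldName1"));
    }
}
```

The point of the two-argument GenericDatumReader constructor is that the reader's aliases are matched against the writer's names during schema resolution, which needs both schemas.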




Unable to get Avro schema alias working for an array item

2022-01-15 Thread Spencer Lu
Hi everyone,

We have an application that receives Avro data, and it needs to rename
certain fields in the data before sending it downstream. The
application is using the following Avro schema to send the data
downstream (note that 2 of the fields have aliases defined):

{
  "name": "MyCompanyRecordAvroEncoder",
  "aliases": ["com.mycompany.avro.MyStats"],
  "type": "record",
  "fields": [
    {"name": "newFieldName1", "type": ["null", "int"], "default": null,
     "aliases": ["oldFieldName1"]},
    {"name": "statusRecords", "type": {"type": "array", "items": {
      "name": "StatusAvroRecord", "type": "record", "fields": [
        {"name": "recordId", "type": "long"},
        {"name": "recordName", "type": ["null", "string"], "default": null},
        {"name": "newFieldName2", "type": ["null", "string"], "default": null,
         "aliases": ["oldFieldName2"]}
      ]}}, "default": []}
  ]
}

We see that our application receives the following Avro data:

{
  "oldFieldName1": 300,
  "statusRecords": [
    {
      "recordId": 100,
      "recordName": "Record1",
      "oldFieldName2": "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
    }
  ]
}

Then the application sends the following Avro data downstream:

{
  "newFieldName1": 300,
  "statusRecords": [
    {
      "recordId": 100,
      "recordName": "Record1",
      "newFieldName2": null
    }
  ]
}

As you can see, newFieldName1 is aliased to oldFieldName1 and has the
value from oldFieldName1, so its alias is working.

However, newFieldName2 is aliased to oldFieldName2, but it is null
instead of having the value from oldFieldName2, so its alias is not
working.

The only difference I see between newFieldName1 and newFieldName2 is
that newFieldName2 is a field within an array item. Do aliases not
work for fields in array items? Or is there some other issue?

Any idea how I can get the alias for newFieldName2 to work?

Thanks,
Spencer