[jira] [Work logged] (AVRO-3408) Schema evolution with logical types

ASF GitHub Bot (Jira) Sun, 15 May 2022 23:07:07 -0700


     [ 
https://issues.apache.org/jira/browse/AVRO-3408?focusedWorklogId=770667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-770667
 ]


ASF GitHub Bot logged work on AVRO-3408:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/May/22 06:06
            Start Date: 16/May/22 06:06
    Worklog Time Spent: 10m 
      Work Description: izemlyanskiy opened a new pull request, #1584:
URL: https://github.com/apache/avro/pull/1584

   Hello! Please take a look at the PR for the issue 
https://issues.apache.org/jira/browse/AVRO-3408
   
   I hope I fulfilled all PR requirements, but feel free to point out anything 
I missed.
   
   Thank you in advance. I would love to read what the community thinks 
about it. 




Issue Time Tracking
-------------------

    Worklog Id:     (was: 770667)
    Time Spent: 4h 40m  (was: 4.5h)

> Schema evolution with logical types 
> ------------------------------------
>
>                 Key: AVRO-3408
>                 URL: https://issues.apache.org/jira/browse/AVRO-3408
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: Ivan Zemlyanskiy
>            Assignee: Ivan Zemlyanskiy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Hello!
> First of all, thank you for this project. I love Avro encoding from both 
> technology and code culture points of view. (y)
> I know you recommend migrating schema by adding a new field and removing the 
> old one in the future, but please-please-please consider my case as well. 
> In my company, we have some DTOs with 200+ fields in total that we 
> encode with Avro and send over the network. About a third of them have type 
> `java.math.BigDecimal`. At some point, we discovered we were sending them 
> with a schema like
> {code:json}
> {
>   "name":"performancePrice",
>   "type":{
>     "type":"string",
>     "java-class":"java.math.BigDecimal"
>   }
> }
> {code}
> That's something of a disaster for us because we have a fairly high load of 
> ~2 million RPS. 
> So we started to think about migrating to something lighter than strings (no 
> blame for choosing it as a default, I know BigDecimal has a lot of pitfalls, 
> and string is the easiest way for encoding/decoding).
> A single standard precision for all such fields was fine for us, so we found 
> `Conversions.DecimalConversion` and decided to use this logical type with a 
> recommended schema like
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.BYTES);
>         LogicalTypes.Decimal decimalType = LogicalTypes.decimal(
>                 MathContext.DECIMAL32.getPrecision(),
>                 DecimalUtils.MONEY_ROUNDING_SCALE);
>         decimalType.addToSchema(schema);
>         return schema;
>     }
> {code}
> (we use `org.apache.avro.reflect.ReflectData`)
> It all looks good and promising, but the question is how to migrate to such 
> a schema? 
> As I said, we have a lot of such fields, and migrating all of them by 
> duplicating each field and removing the old one later would be painful and 
> would add considerable overhead.
> I ran some tests and found that if two applications register the same 
> `BigDecimalConversion`, where `getRecommendedSchema()` for one application 
> is the method above and `getRecommendedSchema()` for the other application 
> is
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.STRING);
>         schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
>         return schema;
>     }
> {code}
> then they can easily read each other's messages using the _SERVER_ schema.
> However, when I built two applications and wired them up with 
> `ProtocolRepository`, `ReflectResponder` and all that stuff, I found out it 
> doesn't work, because `org.apache.avro.io.ResolvingDecoder` totally ignores 
> logical types for some reason. 
> So one application explicitly says "I encode this field as a byte array 
> which is supposed to be a logical type 'decimal' with precision N", but the 
> other application just tries to convert those bytes to a string and build a 
> BigDecimal from the resulting string. As a result, we got
> {code:java}
> java.lang.NumberFormatException: Character ' is neither a decimal digit 
> number, decimal point, nor "e" notation exponential mark.
> {code}
> In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect 
> logical types in the _SERVER_ (_ACTUAL_) schema and use the corresponding 
> conversion instance for reading values. In my example, I'd say it might be 
> {code}
> ResolvingDecoder#readString() -> read the actual logical type -> find 
> BigDecimalConversion instance -> 
> conversion.fromBytes(readValueWithActualSchema()) -> 
> conversion.toCharSequence(readValueWithConversion)
> {code}
> I'd love to read your opinion on all of that. 
> Thank you in advance for your time, and sorry for the long issue description. 
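
To make the size argument in the description above concrete, here is a minimal sketch in plain Java (no Avro dependency) that compares the two payloads for a single value. It relies on what the Avro specification says about the `decimal` logical type on `bytes`: the payload is the big-endian two's-complement unscaled value, and the scale is kept in the schema. The class name `DecimalPayloads`, the scale of 2 and the sample value are assumptions made for illustration, and Avro's length prefix is left out on both sides.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.charset.StandardCharsets;

public class DecimalPayloads {

    // Payload for the "string" + "java-class" schema: the UTF-8 text of the number.
    static byte[] asStringPayload(BigDecimal value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Payload for "bytes" + decimal logical type: only the unscaled value is
    // written, as big-endian two's complement. The scale is not on the wire.
    static byte[] asDecimalPayload(BigDecimal value, int scale) {
        // UNNECESSARY makes this throw if the value does not fit the scale exactly.
        BigDecimal scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        return scaled.unscaledValue().toByteArray();
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("1999.99"); // illustrative value
        System.out.println("string payload bytes:  " + asStringPayload(price).length);
        System.out.println("decimal payload bytes: " + asDecimalPayload(price, 2).length);
    }
}
```

For `1999.99` at scale 2 the string payload is 7 bytes and the decimal payload is 3 bytes (unscaled value 199999). The actual saving depends on the values and the chosen scale.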

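The failure and the resolution order proposed in the description can also be sketched without Avro. The sketch below assumes the writer used `bytes` + `decimal` with scale 2 while the reader expects a string-backed `BigDecimal`. `readNaively` mimics treating the decimal bytes as text, and `readViaLogicalType` follows the suggested chain: decode with the writer's logical type first, then convert to a character sequence. All names are hypothetical, and this is not how `ResolvingDecoder` is implemented.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class LogicalTypeResolution {

    // What effectively happens today according to the report: the decimal
    // bytes are interpreted as UTF-8 text and parsed as a number.
    static BigDecimal readNaively(byte[] payload) {
        return new BigDecimal(new String(payload, StandardCharsets.UTF_8));
    }

    // Proposed order: rebuild the BigDecimal from the unscaled bytes and the
    // writer's scale, then hand the reader the textual form it expects.
    static CharSequence readViaLogicalType(byte[] payload, int writerScale) {
        BigDecimal decoded = new BigDecimal(new BigInteger(payload), writerScale);
        return decoded.toString();
    }

    public static void main(String[] args) {
        // Writer side: unscaled value of 1999.99 at scale 2 (illustrative).
        byte[] payload = new BigDecimal("1999.99").unscaledValue().toByteArray();

        try {
            readNaively(payload);
        } catch (NumberFormatException e) {
            System.out.println("naive read failed: " + e.getClass().getSimpleName());
        }

        BigDecimal value = new BigDecimal(readViaLogicalType(payload, 2).toString());
        System.out.println("resolved read: " + value);
    }
}
```

With this sample value the naive read throws `NumberFormatException`, the same exception type as in the report, because the payload bytes are not printable digits. Other values could happen to decode into digit characters and parse into a wrong number without any error, which is arguably worse.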


--
This message was sent by Atlassian Jira
(v8.20.7#820007)
