[ https://issues.apache.org/jira/browse/AVRO-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514982#comment-17514982 ]
Micah Kornfield commented on AVRO-3408:
---------------------------------------

* Is there an argument that this is a valid evolution from the specification perspective? It sounds like this PR wants to make two changes:
1. Add a new representation for Decimal.
2. Change resolution logic to allow interpreting a value previously not annotated with Decimal as a decimal with the read-schema constraints.
* I think having a new representation is fine. Are there any mechanisms to ensure libraries stay relatively up to date with each other so that they accept the new representation?
* What other things have used the hard-coded type logic? Is it expected that other implementations follow Java's convention (i.e. are we saying the Java implementation really is the specification)?

> Schema evolution with logical types
> ------------------------------------
>
>                 Key: AVRO-3408
>                 URL: https://issues.apache.org/jira/browse/AVRO-3408
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: Ivan Zemlyanskiy
>            Assignee: Ivan Zemlyanskiy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Hello!
> First of all, thank you for this project. I love Avro encoding from both a technology and a code-culture point of view. (y)
> I know you recommend migrating a schema by adding a new field and removing the old one later, but please-please-please consider my case as well.
> In my company we have some DTOs, about 200+ fields in total, that we encode with Avro and send over the network. About a third of them have the type `java.math.BigDecimal`. At some point we discovered that we send them with a schema like
> {code:json}
> {
>     "name":"performancePrice",
>     "type":{
>         "type":"string",
>         "java-class":"java.math.BigDecimal"
>     }
> }
> {code}
> That is a kind of disaster for us, because we run under a pretty high load, ~2 million RPS.
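For a sense of the size difference driving this request, here is a minimal standalone sketch (plain JDK, no Avro dependency; the class and method names are my own, not from the issue). Avro's decimal logical type stores the unscaled value as two's-complement big-endian bytes, exactly what `BigInteger.toByteArray()` produces, so a typical money value is considerably smaller than its decimal string; on the wire both representations carry the same kind of length prefix, so comparing the payload bytes is fair:

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

public class DecimalSizes {
    // Decimal logical type payload: the unscaled value at the schema's scale,
    // as two's-complement big-endian bytes (BigInteger.toByteArray()).
    static byte[] toDecimalBytes(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // The representation the reporter is migrating away from:
    // the decimal rendered as UTF-8 text.
    static byte[] toStringBytes(BigDecimal value) {
        return value.toPlainString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("123456.78");
        System.out.println("string payload:  " + toStringBytes(price).length + " bytes");     // 9 bytes
        System.out.println("decimal payload: " + toDecimalBytes(price, 2).length + " bytes"); // 4 bytes
    }
}
```

At ~2 million RPS, cutting a nine-byte text payload down to four bytes per field, across dozens of fields, is a real bandwidth saving, which is presumably why the reporter wants the bytes-backed decimal.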
> So we started thinking about migrating to something lighter than strings (no blame for choosing string as the default; I know BigDecimal has a lot of pitfalls, and string is the easiest way to encode/decode it).
> It was fine to standardize the precision for all such fields, so we found `Conversions.DecimalConversion` and decided that at the end of the day we were going to use this logical type with a recommended schema like
> {code:java}
> @Override
> public Schema getRecommendedSchema() {
>     Schema schema = Schema.create(Schema.Type.BYTES);
>     LogicalTypes.Decimal decimalType =
>         LogicalTypes.decimal(MathContext.DECIMAL32.getPrecision(), DecimalUtils.MONEY_ROUNDING_SCALE);
>     decimalType.addToSchema(schema);
>     return schema;
> }
> {code}
> (we use `org.apache.avro.reflect.ReflectData`)
> It all looks good and promising, but the question is how to migrate to such a schema.
> As I said, we have a lot of such fields, and migrating all of them by duplicating fields and removing the old ones later would be painful and would cost us considerable overhead.
> I ran some tests and found that if two applications register the same `BigDecimalConversion`, but for one application `getRecommendedSchema()` is the method above, while for the other application `getRecommendedSchema()` is
> {code:java}
> @Override
> public Schema getRecommendedSchema() {
>     Schema schema = Schema.create(Schema.Type.STRING);
>     schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
>     return schema;
> }
> {code}
> then they can easily read each other's messages using the _SERVER_ schema.
> But when I made two applications and wired them up with `ProtocolRepository`, `ReflectResponder` and all that stuff, I found out it doesn't work, because `org.apache.avro.io.ResolvingDecoder` totally ignores logical types for some reason.
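The failure mode described below can be reproduced without Avro at all (a minimal sketch; the class name is mine, and this only imitates what the issue reports, not Avro's actual decoder code): the writer produces the two's-complement unscaled bytes of the decimal, and a reader that resolves the field as a string ends up handing those raw bytes to `BigDecimal`'s text constructor:

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

public class DecimalMismatch {
    public static void main(String[] args) {
        // Writer side: 12.34 with scale 2 encodes as the unscaled value 1234,
        // i.e. the two's-complement bytes {0x04, 0xD2}.
        byte[] wire = new BigDecimal("12.34").setScale(2).unscaledValue().toByteArray();

        // Reader side (the reported bug): the bytes are treated as string
        // content and parsed as decimal text, which they are not.
        String asText = new String(wire, StandardCharsets.ISO_8859_1);
        try {
            new BigDecimal(asText);
        } catch (NumberFormatException e) {
            // Same exception class as in the issue: the control/extended
            // characters are not digits, a decimal point, or an exponent mark.
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

This is why the garbage characters show up in the exception message quoted below: the bytes were never string data in the first place.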
> So, as a result, one application specifically says "I encode this field as a byte array that is supposed to be a logical type 'decimal' with precision N", but the other application just tries to convert those bytes to a string and build a BigDecimal from the resulting string. As a result, we get
> {code:java}
> java.lang.NumberFormatException: Character ' is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
> {code}
> In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect the logical types in the _SERVER_ (_ACTUAL_) schema and use the corresponding conversion instance for reading values. In my example, it might look like
> {code}
> ResolvingDecoder#readString() -> read the actual logical type -> find the BigDecimalConversion instance ->
> conversion.fromBytes(readValueWithActualSchema()) ->
> conversion.toCharSequence(readValueWithConversion)
> {code}
> I'd love to read your opinion on all of that.
> Thank you in advance for your time, and sorry for the long issue description.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)