[ https://issues.apache.org/jira/browse/AVRO-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508373#comment-17508373 ]
Ryan Skraba commented on AVRO-3408: ----------------------------------- Thanks for the interesting and well described use case! I turned on GitHub actions for your PR, and my apologies for taking so long to getting to it! Before Avro 1.8, the only technique to have an in-memory representation of BigDecimal was to use a [Stringable|https://avro.apache.org/docs/1.11.0/api/java/org/apache/avro/reflect/Stringable.html] (as you've originally used it). As you've discovered, it isn't a great representation, and logical types are much more concise. There's some special handling in Java (and the evidence is the "isStringable" methods you find here and there). The *{{decimal}}* logical type was designed to only applies to the two types of binary data (bytes and fixed). Unfortunately your PR extends this to bytes, fixed and string... I really don't have any strong objection of course, but this should probably be discussed on the mailing list and changed in the [specification|https://avro.apache.org/docs/current/spec.html#Decimal] (which is probably something that shouldn't be included in the next minor release). To be honest, the best approach (in Java) might be to just hard-code the conversion of Stringable BigDecimal to logicalType BigDecimal when it's detected! > Schema evolution with logical types > ------------------------------------ > > Key: AVRO-3408 > URL: https://issues.apache.org/jira/browse/AVRO-3408 > Project: Apache Avro > Issue Type: Improvement > Components: java > Affects Versions: 1.11.0 > Reporter: Ivan Zemlyanskiy > Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > Hello! > First of all, thank you for this project. I love Avro encoding from both > technology and code culture points of view. (y) > I know you recommend migrating schema by adding a new field and removing the > old one in the future, but please-please-please consider my case as well. > In my company, we have some DTOs, and it's about 200+ fields in total that we > encode with Avro and send over the network. About a third of them have type > `java.math.BigDecimal`. At some point, we discovered we send them with a > schema like > {code:json} > { > "name":"performancePrice", > "type":{ > "type":"string", > "java-class":"java.math.BigDecimal" > } > } > {code} > That's a kind of disaster for us cos we have pretty much a high load with ~2 > million RPS. > So we start to think about migrating to something lighter than strings (no > blame for choosing it as a default, I know BigDecimal has a lot of pitfalls, > and string is the easiest way for encoding/decoding). > It was fine to make a standard precision for all such fields, so we found > `Conversions.DecimalConversion` and decided at the end of the day we were > going to use this logical type with a recommended schema like > {code:java} > @Override > public Schema getRecommendedSchema() { > Schema schema = Schema.create(Schema.Type.BYTES); > LogicalTypes.Decimal decimalType = > LogicalTypes.decimal(MathContext.DECIMAL32.getPrecision(), > DecimalUtils.MONEY_ROUNDING_SCALE); > decimalType.addToSchema(schema); > return schema; > } > {code} > (we use `org.apache.avro.reflect.ReflectData`) > It all looks good and promising, but the question is how to migrate to such > schema? > As I said, we have a lot of such fields, and migrating all of them with > duplication fields with future removal might be painful and would cost us a > considerable overhead. > I made some tests and found out if two applications register the same > `BigDecimalConversion` but for one application the `getRecommendedSchema()` > is like the method above and for another application the > `getRecommendedSchema()` is > {code:java} > @Override > public Schema getRecommendedSchema() { > Schema schema = Schema.create(Schema.Type.STRING); > schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName()); > return schema; > } > {code} > so they can easily read each other messages using _SERVER_ schema. > So, I made two applications and wired them up with `ProtocolRepository`, > `ReflectResponder` and all that stuff, I found out it doesn't work. Because > `org.apache.avro.io.ResolvingDecoder` totally ignores logical types for some > reason. > So as a result, one application specifically told "I encode this field as a > byte array which supposed to be a logical type 'decimal' with precision N", > but another application just tries to convert those bytes to a string and > make a BigDecimal based on the result string. As a result, we got > {code:java} > java.lang.NumberFormatException: Character ' is neither a decimal digit > number, decimal point, nor "e" notation exponential mark. > {code} > In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect > logical types in _SERVER_ (_ACTUAL_) schema and use a corresponding > conversion instance for reading values. In my example, I'd say it might be > {code} > ResolvingDecoder#readString() -> read the actual logical type -> find > BigDecimalConversion instance -> > conversion.fromBytes(readValueWithActualSchema()) -> > conversion.toCharSequence(readValueWithConversion) > {code} > I'd love to read your opinion on all of that. > Thank you in advance for your time, and sorry for the long issue description. -- This message was sent by Atlassian Jira (v8.20.1#820001)