I'm mostly happy with your current suggestion (two configs, one for serialization and one for deserialization) and your implementation suggestion. One thing to note:
> We should _always_ be able to deserialize a standard JSON > number as a decimal I was doing some research into decimals and JSON, and I can imagine a compelling reason to require string representations to avoid losing precision and to be certain that whomever is sending the data isn't losing precision (e.g. https://stackoverflow.com/a/38357877/2258040). I'm okay with always allowing numerics, but thought it's worth raising the thought. On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io> wrote: > The way I see it, we need to control two seperate things: > > 1. How do we _deserialize_ a decimal type if we encounter a text node in > the JSON? (We should _always_ be able to deserialize a standard JSON > number as a decimal). > 2. How do we chose how we want decimals to be _serialized_. > > This looks to fits well with your second suggestion of slightly different > configs names for serialization vs deserialization. > a, For deserialization we only care about how to handle text nodes: ` > deserialization.decimal.*text*.format`, which should only have two valid > values BINARY | TEXT. > b. For serialization we need all three: `serialization.decimal.format`, > which should support all three options: BINARY | TEXT | NUMERIC. > > Implementation wise, I think these should be two separate enums, rather > than one shared enum and throwing an error if the deserializer is set to > NUMERIC. Mainly as this means the enums reflect the options available, > rather than this being hidden in config checking code. But that's a minor > implementation detail. > > Personally, I'd be tempted to have the BINARY value named something like > `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move away from > it. > > It's a real shame that both of these settings require a default of BINARY > for backwards compatibility, but I agree that discussions / plans around > switching the defaults should not block this KIP. > > Andy > > > On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io> wrote: > > > Thanks for the replies Andy and Andrew (2x Andy?)! > > > > > Is the text decimal a base16 encoded number, or is it base16 encoded > > binary > > > form of the number? > > > > The conversion happens as decimal.unscaledValue().toByteArray() and then > > the byte array is converted to a hex string, so it's definitely the > binary > > form of the number converted to base16. Whether or not that's the same as > > the base16 encoded number is a good question (toByteArray returns a byte > > array containing a signed, big-endian, two's complement representation of > > the big integer). > > > > > One suggestion I have is to change the proposed new config to only > affect > > > decimals stored as text, i.e. to switch between the current base16 and > > the > > > more common base10. Then add another config to the serializer only > that > > > controls if decimals should be serialized as text or numeric. > > > > I think we need to be able to handle all mappings from serialization > format > > to deserialization format (e.g. read in BINARY and output TEXT), which I > > think would be impossible with the alternative suggestion. I agree that > > automatically deserializing numerics is valuable. I see two other ways to > > get this, both keeping the serialization.format config the same: > > > > - have json.decimal.deserialization.format accept all three formats. if > set > > to BINARY/TEXT, numerics would be automatically supported. If set to > > NUMERIC, then any string coming in would result in deserialization error > > (defaults to BINARY for backwards compatibility) > > - change json.decimal.deserialization.format to > > json.decimal.deserialization.string.format which accepts only BINARY/TEXT > > (defaults to BINARY for backwards compatibility) > > > > > would be a breaking change in that things that previously failed would > > > suddenly start deserializing. This is a price I'm willing to pay. > > > > I agree. I'm willing to pay this price too. > > > > > IMHO, we should then plan to switch the default of decimal > serialization > > to > > > numeric, and text serialization to base 10 in the next major release. > > > > I think that can be a separate discussion, I don't want to block this KIP > > on it. > > > > Thoughts? > > > > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <o...@wikimedia.org> wrote: > > > > > This is a bit orthogonal, but in JsonSchemaConverter I use JSONSchemas > to > > > indicate whether a JSON number should be deserialized as an integer or > a > > > decimal > > > < > > > > > > https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261 > > > >. > > > Not everyone is going to have JSONSchemas available when converting, > but > > if > > > you do, it is an easy way to support JSON numbers as decimals. > > > > > > Carry on! :) > > > > > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <a...@confluent.io> wrote: > > > > > > > Hi Almog, > > > > > > > > Like the KIP - I think being able to support decimals in JSON in the > > same > > > > way most other systems do is a great improvement. > > > > > > > > It's not 100% clear to me from the KIP what the current format is. > Is > > > the > > > > text decimal a base16 encoded number, or is it base16 encoded binary > > form > > > > of the number? (I've not tried to get my head around if these two are > > > even > > > > different!) > > > > > > > > One suggestion I have is to change the proposed new config to only > > affect > > > > decimals stored as text, i.e. to switch between the current base16 > and > > > the > > > > more common base10. Then add another config to the serialzier only > > that > > > > controls if decimals should be serialized as text or numeric. The > > > benefit > > > > of this approach is it allows us to enhance the deserializer to > > > > automatically handle numeric decimals even without any config having > to > > > be > > > > set, i.e. default config in the deserializer would be able to handle > > > > numeric decimals. Of course, this is a two edged sword: this would > > make > > > > the deserializer work out of the box with numeric decimals, (yay!), > but > > > > would be a breaking change in that things that previously failed > would > > > > suddenly start deserializing. This is a price I'm willing to pay. > > > > > > > > IMHO, we should then plan to switch the default of decimal > > serialization > > > to > > > > numeric, and text serialization to base 10 in the next major release. > > > > (With upgrade notes to match). Though I know this is more > contentious, > > I > > > > think it moves us forward in a much more standard way that the > current > > > > encoding of decimals. > > > > > > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <al...@confluent.io> > wrote: > > > > > > > > > Hi Everyone! > > > > > > > > > > Kicking off discussion for a new KIP: > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON > > > > > > > > > > For those who are interested, I have a prototype implementation > that > > > > helped > > > > > guide my design: https://github.com/agavra/kafka/pull/1 > > > > > > > > > > Cheers, > > > > > Almog > > > > > > > > > > > > > > >