Cool!

Couple of nits:

- In public interfaces, typo: *json.decimal.serialization.fromat*
- In public interfaces, you use the term "HEX" instead of "BASE64".



On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <al...@confluent.io> wrote:

> EDIT: everywhere I've been using "HEX" I meant to be using "BASE64". I will
> update the KIP to reflect this.
>
> On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io> wrote:
>
> > Thanks for the feedback Arjun! I'm happy changing the default config to
> > HEX instead of BINARY, no strong feelings there.
> >
> > I'll also clarify the example in the KIP to be clearer:
> >
> > - serialize the decimal field "foo" with value "10.2345" with the HEX
> > setting: {"foo": "D3J5"}
> > - serialize the decimal field "foo" with value "10.2345" with the NUMERIC
> > setting: {"foo": 10.2345}
> >
> > With regards to the precision issue, that was my original concern as well
> > (and why I originally suggested a TEXT format). Many JSON deserializers
> > (e.g. Jackson with DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS),
> > however, have the ability to deserialize decimals correctly so I will
> > configure that as the default for Connect's JsonDeserializer. It's
> probably
> > a good idea to call out that using other deserializers must be done with
> > care - I will add that documentation to the serialization config.
> >
> > Note that there would not be an issue on the _serialization_ side of
> > things as Jackson respects BigDecimal.
> >
> > Almog
> >
> > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <arjun.sat...@gmail.com>
> > wrote:
> >
> >> hey Almog, nice work! couple of thoughts (hope I'm not late since you
> >> started the voting thread already):
> >>
> >> can you please add some examples to show the changes that you are
> >> proposing. makes me think that for a given decimal number, we will have
> >> two
> >> encodings: “asHex” and “asNumber”.
> >>
> >> should we call the default config value “HEX” instead of “BINARY”?
> >>
> >> Should we call out the fact that JS systems might be susceptible to
> double
> >> precision round offs with the new numeric format? here are some people
> >> discussing a similar problem
> >> https://github.com/EventStore/EventStore/issues/1541
> >>
> >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io> wrote:
> >>
> >> > Hello Everyone,
> >> >
> >> > Summarizing an in-person discussion with Randall (this is copied from
> >> the
> >> > KIP):
> >> >
> >> > The original KIP suggested supporting an additional representation -
> >> base10
> >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues because
> >> it
> >> > is impossible to disambiguate between TEXT and BINARY without an
> >> additional
> >> > config - furthermore, this makes the migration from one to the other
> >> nearly
> >> > impossible because it would require that all consumers stop consuming
> >> and
> >> > producers stop producing and atomically updating the config on all of
> >> them
> >> > after deploying the new code, or waiting for the full retention period
> >> to
> >> > pass - neither option is viable. The suggestion in the KIP is strictly
> >> an
> >> > improvement over the existing behavior, even if it doesn't support all
> >> > combinations.
> >> >
> >> > It seems that since most real-world use cases actually use the numeric
> >> > representation (not string) we can consider this an improvement. With
> >> the
> >> > new suggestion, we don't need a deserialization configuration (only a
> >> > serialization option) and the consumers will be able to always
> >> > automatically determine the serialization format.
> >> >
> >> > Based on this, I'll be opening up the simplified version of the KIP
> to a
> >> > vote.
> >> >
> >> > Almog
> >> >
> >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <al...@confluent.io>
> wrote:
> >> >
> >> > > I'm mostly happy with your current suggestion (two configs, one for
> >> > > serialization and one for deserialization) and your implementation
> >> > > suggestion. One thing to note:
> >> > >
> >> > > > We should _always_ be able to deserialize a standard JSON
> >> > > > number as a decimal
> >> > >
> >> > > I was doing some research into decimals and JSON, and I can imagine
> a
> >> > > compelling reason to require string representations to avoid losing
> >> > > precision and to be certain that whomever is sending the data isn't
> >> > losing
> >> > > precision (e.g. https://stackoverflow.com/a/38357877/2258040).
> >> > >
> >> > > I'm okay with always allowing numerics, but thought it's worth
> raising
> >> > the
> >> > > thought.
> >> > >
> >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io>
> >> wrote:
> >> > >
> >> > >> The way I see it, we need to control two seperate things:
> >> > >>
> >> > >> 1. How do we _deserialize_ a decimal type if we encounter a text
> >> node in
> >> > >> the JSON?    (We should _always_ be able to deserialize a standard
> >> JSON
> >> > >> number as a decimal).
> >> > >> 2. How do we chose how we want decimals to be _serialized_.
> >> > >>
> >> > >> This looks to fits well with your second suggestion of slightly
> >> > different
> >> > >> configs names for serialization vs deserialization.
> >> > >> a, For deserialization we only care about how to handle text
> nodes: `
> >> > >> deserialization.decimal.*text*.format`, which should only have two
> >> valid
> >> > >> values BINARY | TEXT.
> >> > >> b. For serialization we need all three:
> >> `serialization.decimal.format`,
> >> > >> which should support all three options: BINARY | TEXT | NUMERIC.
> >> > >>
> >> > >> Implementation wise, I think these should be two separate enums,
> >> rather
> >> > >> than one shared enum and throwing an error if the deserializer is
> >> set to
> >> > >> NUMERIC.  Mainly as this means the enums reflect the options
> >> available,
> >> > >> rather than this being hidden in config checking code.  But that's
> a
> >> > minor
> >> > >> implementation detail.
> >> > >>
> >> > >> Personally, I'd be tempted to have the BINARY value named something
> >> like
> >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move
> >> away
> >> > >> from
> >> > >> it.
> >> > >>
> >> > >> It's a real shame that both of these settings require a default of
> >> > BINARY
> >> > >> for backwards compatibility, but I agree that discussions / plans
> >> around
> >> > >> switching the defaults should not block this KIP.
> >> > >>
> >> > >> Andy
> >> > >>
> >> > >>
> >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io>
> >> wrote:
> >> > >>
> >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)!
> >> > >> >
> >> > >> > > Is the text decimal a base16 encoded number, or is it base16
> >> encoded
> >> > >> > binary
> >> > >> > > form of the number?
> >> > >> >
> >> > >> > The conversion happens as decimal.unscaledValue().toByteArray()
> and
> >> > then
> >> > >> > the byte array is converted to a hex string, so it's definitely
> the
> >> > >> binary
> >> > >> > form of the number converted to base16. Whether or not that's the
> >> same
> >> > >> as
> >> > >> > the base16 encoded number is a good question (toByteArray
> returns a
> >> > byte
> >> > >> > array containing a signed, big-endian, two's complement
> >> representation
> >> > >> of
> >> > >> > the big integer).
> >> > >> >
> >> > >> > > One suggestion I have is to change the proposed new config to
> >> only
> >> > >> affect
> >> > >> > > decimals stored as text, i.e. to switch between the current
> >> base16
> >> > and
> >> > >> > the
> >> > >> > > more common base10.   Then add another config to the serializer
> >> only
> >> > >> that
> >> > >> > > controls if decimals should be serialized as text or numeric.
> >> > >> >
> >> > >> > I think we need to be able to handle all mappings from
> >> serialization
> >> > >> format
> >> > >> > to deserialization format (e.g. read in BINARY and output TEXT),
> >> > which I
> >> > >> > think would be impossible with the alternative suggestion. I
> agree
> >> > that
> >> > >> > automatically deserializing numerics is valuable. I see two other
> >> ways
> >> > >> to
> >> > >> > get this, both keeping the serialization.format config the same:
> >> > >> >
> >> > >> > - have json.decimal.deserialization.format accept all three
> >> formats.
> >> > if
> >> > >> set
> >> > >> > to BINARY/TEXT, numerics would be automatically supported. If set
> >> to
> >> > >> > NUMERIC, then any string coming in would result in
> deserialization
> >> > error
> >> > >> > (defaults to BINARY for backwards compatibility)
> >> > >> > - change json.decimal.deserialization.format to
> >> > >> > json.decimal.deserialization.string.format which accepts only
> >> > >> BINARY/TEXT
> >> > >> > (defaults to BINARY for backwards compatibility)
> >> > >> >
> >> > >> > > would be a breaking change in that things that previously
> failed
> >> > would
> >> > >> > > suddenly start deserializing.  This is a price I'm willing to
> >> pay.
> >> > >> >
> >> > >> > I agree. I'm willing to pay this price too.
> >> > >> >
> >> > >> > > IMHO, we should then plan to switch the default of decimal
> >> > >> serialization
> >> > >> > to
> >> > >> > > numeric, and text serialization to base 10 in the next major
> >> > release.
> >> > >> >
> >> > >> > I think that can be a separate discussion, I don't want to block
> >> this
> >> > >> KIP
> >> > >> > on it.
> >> > >> >
> >> > >> > Thoughts?
> >> > >> >
> >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <o...@wikimedia.org>
> >> > wrote:
> >> > >> >
> >> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I use
> >> > >> JSONSchemas to
> >> > >> > > indicate whether a JSON number should be deserialized as an
> >> integer
> >> > >> or a
> >> > >> > > decimal
> >> > >> > > <
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> >> > >> > > >.
> >> > >> > > Not everyone is going to have JSONSchemas available when
> >> converting,
> >> > >> but
> >> > >> > if
> >> > >> > > you do, it is an easy way to support JSON numbers as decimals.
> >> > >> > >
> >> > >> > > Carry on! :)
> >> > >> > >
> >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <a...@confluent.io
> >
> >> > >> wrote:
> >> > >> > >
> >> > >> > > > Hi Almog,
> >> > >> > > >
> >> > >> > > > Like the KIP - I think being able to support decimals in JSON
> >> in
> >> > the
> >> > >> > same
> >> > >> > > > way most other systems do is a great improvement.
> >> > >> > > >
> >> > >> > > > It's not 100% clear to me from the KIP what the current
> format
> >> is.
> >> > >> Is
> >> > >> > > the
> >> > >> > > > text decimal a base16 encoded number, or is it base16 encoded
> >> > binary
> >> > >> > form
> >> > >> > > > of the number? (I've not tried to get my head around if these
> >> two
> >> > >> are
> >> > >> > > even
> >> > >> > > > different!)
> >> > >> > > >
> >> > >> > > > One suggestion I have is to change the proposed new config to
> >> only
> >> > >> > affect
> >> > >> > > > decimals stored as text, i.e. to switch between the current
> >> base16
> >> > >> and
> >> > >> > > the
> >> > >> > > > more common base10.   Then add another config to the
> serialzier
> >> > only
> >> > >> > that
> >> > >> > > > controls if decimals should be serialized as text or numeric.
> >> The
> >> > >> > > benefit
> >> > >> > > > of this approach is it allows us to enhance the deserializer
> to
> >> > >> > > > automatically handle numeric decimals even without any config
> >> > >> having to
> >> > >> > > be
> >> > >> > > > set, i.e. default config in the deserializer would be able to
> >> > handle
> >> > >> > > > numeric decimals.  Of course, this is a two edged sword: this
> >> > would
> >> > >> > make
> >> > >> > > > the deserializer work out of the box with numeric decimals,
> >> > (yay!),
> >> > >> but
> >> > >> > > > would be a breaking change in that things that previously
> >> failed
> >> > >> would
> >> > >> > > > suddenly start deserializing.  This is a price I'm willing to
> >> pay.
> >> > >> > > >
> >> > >> > > > IMHO, we should then plan to switch the default of decimal
> >> > >> > serialization
> >> > >> > > to
> >> > >> > > > numeric, and text serialization to base 10 in the next major
> >> > >> release.
> >> > >> > > > (With upgrade notes to match). Though I know this is more
> >> > >> contentious,
> >> > >> > I
> >> > >> > > > think it moves us forward in a much more standard way that
> the
> >> > >> current
> >> > >> > > > encoding of decimals.
> >> > >> > > >
> >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <
> al...@confluent.io>
> >> > >> wrote:
> >> > >> > > >
> >> > >> > > > > Hi Everyone!
> >> > >> > > > >
> >> > >> > > > > Kicking off discussion for a new KIP:
> >> > >> > > > >
> >> > >> > > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> >> > >> > > > >
> >> > >> > > > > For those who are interested, I have a prototype
> >> implementation
> >> > >> that
> >> > >> > > > helped
> >> > >> > > > > guide my design: https://github.com/agavra/kafka/pull/1
> >> > >> > > > >
> >> > >> > > > > Cheers,
> >> > >> > > > > Almog
> >> > >> > > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
>

Reply via email to