Good catches! Fixed :)

On Thu, Aug 8, 2019 at 10:36 PM Arjun Satish <arjun.sat...@gmail.com> wrote:

> Cool!
>
> Couple of nits:
>
> - In public interfaces, typo: *json.decimal.serialization.fromat*
> - In public interfaces, you use the term "HEX" instead of "BASE64".
>
>
>
> On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <al...@confluent.io> wrote:
>
> > EDIT: everywhere I've been using "HEX" I meant to be using "BASE64". I
> will
> > update the KIP to reflect this.
> >
> > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <al...@confluent.io> wrote:
> >
> > > Thanks for the feedback Arjun! I'm happy changing the default config to
> > > HEX instead of BINARY, no strong feelings there.
> > >
> > > I'll also clarify the example in the KIP to be clearer:
> > >
> > > - serialize the decimal field "foo" with value "10.2345" with the HEX
> > > setting: {"foo": "D3J5"}
> > > - serialize the decimal field "foo" with value "10.2345" with the
> NUMERIC
> > > setting: {"foo": 10.2345}
> > >
> > > With regards to the precision issue, that was my original concern as
> well
> > > (and why I originally suggested a TEXT format). Many JSON deserializers
> > > (e.g. Jackson with DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS),
> > > however, have the ability to deserialize decimals correctly so I will
> > > configure that as the default for Connect's JsonDeserializer. It's
> > probably
> > > a good idea to call out that using other deserializers must be done
> with
> > > care - I will add that documentation to the serialization config.
> > >
> > > Note that there would not be an issue on the _serialization_ side of
> > > things as Jackson respects BigDecimal.
> > >
> > > Almog
> > >
> > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <arjun.sat...@gmail.com>
> > > wrote:
> > >
> > >> hey Almog, nice work! couple of thoughts (hope I'm not late since you
> > >> started the voting thread already):
> > >>
> > >> can you please add some examples to show the changes that you are
> > >> proposing. makes me think that for a given decimal number, we will
> have
> > >> two
> > >> encodings: “asHex” and “asNumber”.
> > >>
> > >> should we call the default config value “HEX” instead of “BINARY”?
> > >>
> > >> Should we call out the fact that JS systems might be susceptible to
> > double
> > >> precision round offs with the new numeric format? here are some people
> > >> discussing a similar problem
> > >> https://github.com/EventStore/EventStore/issues/1541
> > >>
> > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <al...@confluent.io>
> wrote:
> > >>
> > >> > Hello Everyone,
> > >> >
> > >> > Summarizing an in-person discussion with Randall (this is copied
> from
> > >> the
> > >> > KIP):
> > >> >
> > >> > The original KIP suggested supporting an additional representation -
> > >> base10
> > >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes issues
> because
> > >> it
> > >> > is impossible to disambiguate between TEXT and BINARY without an
> > >> additional
> > >> > config - furthermore, this makes the migration from one to the other
> > >> nearly
> > >> > impossible because it would require that all consumers stop
> consuming
> > >> and
> > >> > producers stop producing and atomically updating the config on all
> of
> > >> them
> > >> > after deploying the new code, or waiting for the full retention
> period
> > >> to
> > >> > pass - neither option is viable. The suggestion in the KIP is
> strictly
> > >> an
> > >> > improvement over the existing behavior, even if it doesn't support
> all
> > >> > combinations.
> > >> >
> > >> > It seems that since most real-world use cases actually use the
> numeric
> > >> > representation (not string) we can consider this an improvement.
> With
> > >> the
> > >> > new suggestion, we don't need a deserialization configuration (only
> a
> > >> > serialization option) and the consumers will be able to always
> > >> > automatically determine the serialization format.
> > >> >
> > >> > Based on this, I'll be opening up the simplified version of the KIP
> > to a
> > >> > vote.
> > >> >
> > >> > Almog
> > >> >
> > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <al...@confluent.io>
> > wrote:
> > >> >
> > >> > > I'm mostly happy with your current suggestion (two configs, one
> for
> > >> > > serialization and one for deserialization) and your implementation
> > >> > > suggestion. One thing to note:
> > >> > >
> > >> > > > We should _always_ be able to deserialize a standard JSON
> > >> > > > number as a decimal
> > >> > >
> > >> > > I was doing some research into decimals and JSON, and I can
> imagine
> > a
> > >> > > compelling reason to require string representations to avoid
> losing
> > >> > > precision and to be certain that whomever is sending the data
> isn't
> > >> > losing
> > >> > > precision (e.g. https://stackoverflow.com/a/38357877/2258040).
> > >> > >
> > >> > > I'm okay with always allowing numerics, but thought it's worth
> > raising
> > >> > the
> > >> > > thought.
> > >> > >
> > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <a...@confluent.io>
> > >> wrote:
> > >> > >
> > >> > >> The way I see it, we need to control two seperate things:
> > >> > >>
> > >> > >> 1. How do we _deserialize_ a decimal type if we encounter a text
> > >> node in
> > >> > >> the JSON?    (We should _always_ be able to deserialize a
> standard
> > >> JSON
> > >> > >> number as a decimal).
> > >> > >> 2. How do we chose how we want decimals to be _serialized_.
> > >> > >>
> > >> > >> This looks to fits well with your second suggestion of slightly
> > >> > different
> > >> > >> configs names for serialization vs deserialization.
> > >> > >> a, For deserialization we only care about how to handle text
> > nodes: `
> > >> > >> deserialization.decimal.*text*.format`, which should only have
> two
> > >> valid
> > >> > >> values BINARY | TEXT.
> > >> > >> b. For serialization we need all three:
> > >> `serialization.decimal.format`,
> > >> > >> which should support all three options: BINARY | TEXT | NUMERIC.
> > >> > >>
> > >> > >> Implementation wise, I think these should be two separate enums,
> > >> rather
> > >> > >> than one shared enum and throwing an error if the deserializer is
> > >> set to
> > >> > >> NUMERIC.  Mainly as this means the enums reflect the options
> > >> available,
> > >> > >> rather than this being hidden in config checking code.  But
> that's
> > a
> > >> > minor
> > >> > >> implementation detail.
> > >> > >>
> > >> > >> Personally, I'd be tempted to have the BINARY value named
> something
> > >> like
> > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging users to move
> > >> away
> > >> > >> from
> > >> > >> it.
> > >> > >>
> > >> > >> It's a real shame that both of these settings require a default
> of
> > >> > BINARY
> > >> > >> for backwards compatibility, but I agree that discussions / plans
> > >> around
> > >> > >> switching the defaults should not block this KIP.
> > >> > >>
> > >> > >> Andy
> > >> > >>
> > >> > >>
> > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <al...@confluent.io>
> > >> wrote:
> > >> > >>
> > >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)!
> > >> > >> >
> > >> > >> > > Is the text decimal a base16 encoded number, or is it base16
> > >> encoded
> > >> > >> > binary
> > >> > >> > > form of the number?
> > >> > >> >
> > >> > >> > The conversion happens as decimal.unscaledValue().toByteArray()
> > and
> > >> > then
> > >> > >> > the byte array is converted to a hex string, so it's definitely
> > the
> > >> > >> binary
> > >> > >> > form of the number converted to base16. Whether or not that's
> the
> > >> same
> > >> > >> as
> > >> > >> > the base16 encoded number is a good question (toByteArray
> > returns a
> > >> > byte
> > >> > >> > array containing a signed, big-endian, two's complement
> > >> representation
> > >> > >> of
> > >> > >> > the big integer).
> > >> > >> >
> > >> > >> > > One suggestion I have is to change the proposed new config to
> > >> only
> > >> > >> affect
> > >> > >> > > decimals stored as text, i.e. to switch between the current
> > >> base16
> > >> > and
> > >> > >> > the
> > >> > >> > > more common base10.   Then add another config to the
> serializer
> > >> only
> > >> > >> that
> > >> > >> > > controls if decimals should be serialized as text or numeric.
> > >> > >> >
> > >> > >> > I think we need to be able to handle all mappings from
> > >> serialization
> > >> > >> format
> > >> > >> > to deserialization format (e.g. read in BINARY and output
> TEXT),
> > >> > which I
> > >> > >> > think would be impossible with the alternative suggestion. I
> > agree
> > >> > that
> > >> > >> > automatically deserializing numerics is valuable. I see two
> other
> > >> ways
> > >> > >> to
> > >> > >> > get this, both keeping the serialization.format config the
> same:
> > >> > >> >
> > >> > >> > - have json.decimal.deserialization.format accept all three
> > >> formats.
> > >> > if
> > >> > >> set
> > >> > >> > to BINARY/TEXT, numerics would be automatically supported. If
> set
> > >> to
> > >> > >> > NUMERIC, then any string coming in would result in
> > deserialization
> > >> > error
> > >> > >> > (defaults to BINARY for backwards compatibility)
> > >> > >> > - change json.decimal.deserialization.format to
> > >> > >> > json.decimal.deserialization.string.format which accepts only
> > >> > >> BINARY/TEXT
> > >> > >> > (defaults to BINARY for backwards compatibility)
> > >> > >> >
> > >> > >> > > would be a breaking change in that things that previously
> > failed
> > >> > would
> > >> > >> > > suddenly start deserializing.  This is a price I'm willing to
> > >> pay.
> > >> > >> >
> > >> > >> > I agree. I'm willing to pay this price too.
> > >> > >> >
> > >> > >> > > IMHO, we should then plan to switch the default of decimal
> > >> > >> serialization
> > >> > >> > to
> > >> > >> > > numeric, and text serialization to base 10 in the next major
> > >> > release.
> > >> > >> >
> > >> > >> > I think that can be a separate discussion, I don't want to
> block
> > >> this
> > >> > >> KIP
> > >> > >> > on it.
> > >> > >> >
> > >> > >> > Thoughts?
> > >> > >> >
> > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <
> o...@wikimedia.org>
> > >> > wrote:
> > >> > >> >
> > >> > >> > > This is a bit orthogonal, but in JsonSchemaConverter I use
> > >> > >> JSONSchemas to
> > >> > >> > > indicate whether a JSON number should be deserialized as an
> > >> integer
> > >> > >> or a
> > >> > >> > > decimal
> > >> > >> > > <
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> >
> > >>
> >
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> > >> > >> > > >.
> > >> > >> > > Not everyone is going to have JSONSchemas available when
> > >> converting,
> > >> > >> but
> > >> > >> > if
> > >> > >> > > you do, it is an easy way to support JSON numbers as
> decimals.
> > >> > >> > >
> > >> > >> > > Carry on! :)
> > >> > >> > >
> > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <
> a...@confluent.io
> > >
> > >> > >> wrote:
> > >> > >> > >
> > >> > >> > > > Hi Almog,
> > >> > >> > > >
> > >> > >> > > > Like the KIP - I think being able to support decimals in
> JSON
> > >> in
> > >> > the
> > >> > >> > same
> > >> > >> > > > way most other systems do is a great improvement.
> > >> > >> > > >
> > >> > >> > > > It's not 100% clear to me from the KIP what the current
> > format
> > >> is.
> > >> > >> Is
> > >> > >> > > the
> > >> > >> > > > text decimal a base16 encoded number, or is it base16
> encoded
> > >> > binary
> > >> > >> > form
> > >> > >> > > > of the number? (I've not tried to get my head around if
> these
> > >> two
> > >> > >> are
> > >> > >> > > even
> > >> > >> > > > different!)
> > >> > >> > > >
> > >> > >> > > > One suggestion I have is to change the proposed new config
> to
> > >> only
> > >> > >> > affect
> > >> > >> > > > decimals stored as text, i.e. to switch between the current
> > >> base16
> > >> > >> and
> > >> > >> > > the
> > >> > >> > > > more common base10.   Then add another config to the
> > serialzier
> > >> > only
> > >> > >> > that
> > >> > >> > > > controls if decimals should be serialized as text or
> numeric.
> > >> The
> > >> > >> > > benefit
> > >> > >> > > > of this approach is it allows us to enhance the
> deserializer
> > to
> > >> > >> > > > automatically handle numeric decimals even without any
> config
> > >> > >> having to
> > >> > >> > > be
> > >> > >> > > > set, i.e. default config in the deserializer would be able
> to
> > >> > handle
> > >> > >> > > > numeric decimals.  Of course, this is a two edged sword:
> this
> > >> > would
> > >> > >> > make
> > >> > >> > > > the deserializer work out of the box with numeric decimals,
> > >> > (yay!),
> > >> > >> but
> > >> > >> > > > would be a breaking change in that things that previously
> > >> failed
> > >> > >> would
> > >> > >> > > > suddenly start deserializing.  This is a price I'm willing
> to
> > >> pay.
> > >> > >> > > >
> > >> > >> > > > IMHO, we should then plan to switch the default of decimal
> > >> > >> > serialization
> > >> > >> > > to
> > >> > >> > > > numeric, and text serialization to base 10 in the next
> major
> > >> > >> release.
> > >> > >> > > > (With upgrade notes to match). Though I know this is more
> > >> > >> contentious,
> > >> > >> > I
> > >> > >> > > > think it moves us forward in a much more standard way that
> > the
> > >> > >> current
> > >> > >> > > > encoding of decimals.
> > >> > >> > > >
> > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <
> > al...@confluent.io>
> > >> > >> wrote:
> > >> > >> > > >
> > >> > >> > > > > Hi Everyone!
> > >> > >> > > > >
> > >> > >> > > > > Kicking off discussion for a new KIP:
> > >> > >> > > > >
> > >> > >> > > > >
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > >> > >> > > > >
> > >> > >> > > > > For those who are interested, I have a prototype
> > >> implementation
> > >> > >> that
> > >> > >> > > > helped
> > >> > >> > > > > guide my design: https://github.com/agavra/kafka/pull/1
> > >> > >> > > > >
> > >> > >> > > > > Cheers,
> > >> > >> > > > > Almog
> > >> > >> > > > >
> > >> > >> > > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> >
>

Reply via email to