Thanks for all the changes, Almog. The current KIP looks good to me.

Randall

On Fri, Aug 30, 2019 at 11:28 AM Almog Gavra <al...@confluent.io> wrote:

> Thanks again Randall! Here are the changes I made:
>
> - Defaults. The KIP had mentioned that the default would be BASE64 in the
> "Public Interfaces" section. I have also added your suggestion in "Proposed
> Changes".
> - I've added a bullet point to change the internal converter to use
> decimal.format=NUMERIC in Proposed Changes. If I see that this is not
> possible or cumbersome during implementation I will amend the KIP to keep
> this change minimally scoped.
> - Adopted your third suggestion
> - Adopted your migration plan proposal
> - Added rejected alternative for creating a new converter
>
> > calling `NumericNode.decimalValue()` will always return a
> java.math.BigDecimal even
> > if the underlying Number was not a BigDecimal
>
> I'm not sure I understand your comment here. NumericNode#decimalValue will
> always return a BigDecimal, regardless of what the underlying JSON data is
> - that hasn't changed with this KIP. The decimalValue() will only be called
> when converting to a DECIMAL logical type.
>
> What has changed, is that the NumericNode will store a BigDecimal instead
> of a double whenever the JSON value is a number with a decimal point in it
> (this should answer your fourth point, about deserializing a BigDecimal for
> all floating points).
>
> > Shouldn't the `convertToConnect(Schema, JsonNode) instead use the type of
> Number value as
> > parsed and returned by the JsonDeserializer to determine the proper
> schema
> > type
>
> There is no way to infer the "proper schema type" in the JsonDeserializer.
> If my JSON value is {"foo": 1.234} I have no idea whether "foo" is a
> decimal or a double - that's the reason we need a configuration value in
> first place. This means that in order to avoid precision loss, we must
> deserialize any floating point number first as a BigDecimal.
>
> > and then get the proper value type using that schema type's converter?
>
> That's exactly the proposal. I think this will be clear in the code.
>
> Almog
>
> On Fri, Aug 30, 2019 at 8:00 AM Randall Hauch <rha...@gmail.com> wrote:
>
> > Thanks for the updates, Almog. This looks really good, but I have a few
> > more comments (most wording, though one potentially thorny issue):
> >
> > First, the KIP should define the default value for the `decimal.format`
> > property. IIUC, it will be BASE64, and to gain the new behavior users
> will
> > have to explicitly set this to NUMERIC. In fact, I'd recommend changing
> > this bullet item from:
> >
> >    - JsonConverter will accept the decimal.format configuration to
> >    determine the serialization format. If the value is BASE64, the
> behavior
> >    remains unchanged (i.e. it serializes it as a base64 text). If the
> > value is
> >    NUMERIC, the JSON node will be a number representing that decimal
> (e.g.
> >    10.2345 instead of "D3J5").
> >
> > to something like:
> >
> >    - Define a new `decimal.format` configuration property on
> JsonConverter
> >    to specify the serialization format for Connect DECIMAL logical type
> >    values, with two allowed literal values for the configuration
> property:
> >    - `BASE64` specifies the existing behavior of serializing DECIMAL
> >       logical types as base64 encoded binary data (e.g., "D3J5" in the
> > example
> >       above), and will be the default; and
> >       - `NUMERIC` will serialize Connect DECIMAL logical type values in
> >       JSON as a number representing that decimal
> >
> > Second, since the default will be the existing BASE64 representation,
> what
> > do you think about changing the JsonConverter instances used by the
> Connect
> > worker for internal topics to enable `decimal.format=NUMERIC`? I don't
> > think we actually use decimals in the internal messages, but if we do at
> > some point then the converters will store them with the improved natural
> > representation.
> >
> > Third, the following bullet item could be more clear:
> >
> >    - JsonConverter will automatically handle deserialization of either
> >    serialization format given a Decimal logical type schema, i.e. it will
> >    accept both a deserialized BinaryNode and NumericNode. If the value
> is a
> >    BinaryNode, it will construct a java BigDecimal from the binaryValue()
> >    (which is a btye[]). If the value is a NumericNode, it will simply
> pass
> >    through the decimalValue() deserialized by the JsonDeserializer.
> >
> > such as maybe:
> >
> >    - The JsonConverter deserialization method currently expects only a
> >    BinaryNode, but will be changed to also handle NumericNode by calling
> >    NumericNode.decimalValue().
> >
> > This brings up an interesting potential issue: if `schemas.enable=false`,
> > then there will be no schema in the record, and calling
> > `NumericNode.decimalValue()` will always return a java.math.BigDecimal
> even
> > if the underlying Number was not a BigDecimal. Shouldn't the
> > `convertToConnect(Schema, JsonNode) instead use the type of Number value
> as
> > parsed and returned by the JsonDeserializer to determine the proper
> schema
> > type, and then get the proper value type using that schema type's
> > converter?
> >
> > Fourth, I'm not sure I understand the following bullet:
> >
> >    - JsonDeserializer will now default floating point deserialization to
> >    BigDecimal to avoid losing precision. This may impact performance when
> >    deserializing doubles - a JMH microbenchmark on my local MBP, this
> >    estimated about 3x degradation for deserializing JSON floating points.
> > If
> >    the connect schema is not the decimal logical type, the JsonConverter
> > will
> >    convert this BigDecimal value into the corresponding floating point
> java
> >    object.
> >
> > Fifth, I think the migration plan is not quite accurate. Step 3 mentions
> > changing the Connect worker config's key and value converters to use the
> > new setting, and a restart (step 4) is necessary after this step. Perhaps
> > step 3 should be "If the Connect worker uses the JsonConverter for the
> key
> > and/or value converters, optionally set the `decimal.format=NUMERIC` for
> > the key and/or value converter and restart the workers." followed by Step
> > 4: "If desired, update any source connector configs that use the
> > JsonConverter for key and/or value converters to use
> > `decimal.format=NUMERIC`."
> >
> > Finally, should we add a short discussion in the Rejected Alternatives
> > about the option of leaving JsonConverter untouched and creating a
> > different converter implementation?
> >
> > Thanks!
> >
> > Randall
> >
> >
> >
> > On Mon, Aug 26, 2019 at 11:51 AM Almog Gavra <al...@confluent.io> wrote:
> >
> > > Thanks for the feedback Randall! I have updated the KIP with the
> > following
> > > edits:
> > >
> > > * Updated the reference from "producer" to "source" (I had missed that
> > > one!)
> > > * Changed the config from "json.decimal.serialization.format" to
> > > "decimal.format"
> > > * Clarified case sensitivity
> > > * Clarified the proposed changes to note that deserialization is not
> > > affected by the config
> > > * Clarified the changes in JsonConverter to handle deserialization (see
> > my
> > > third bullet below)
> > > * Added a clear migration plan and simplified compatibility
> > >
> > > Here are also some clarifications based on your comments.
> > >
> > > * I think "json" has limited value in the configuration name. If put
> in a
> > > top-level worker config, it clarifies that it only affects connectors
> > using
> > > the JsonConverter. I have opted for your suggestion and dropped it.
> > > * I think "serialization" has limited value in the configuration name.
> If
> > > we ever want to introduce "deserialization" configurations, there will
> be
> > > asymmetry in the configuration names. I have opted for your suggestion
> > and
> > > dropped it.
> > > * The JsonConverter will not "always look for numbers". The converter
> > will
> > > receive from the Jackson Object Mapper either a NumericNode containing
> a
> > > big decimal or a BinaryNode containing a btye[]. Based on the type of
> > this
> > > node, it will convert the value to a BigDecimal appropriately (or any
> > other
> > > Connect java type based on the schema).
> > > * "the ... JsonDeserializer are not affected" is not exactly true, but
> > > semantically correct. See the note in the KIP about defaulting floating
> > > points to BigDecimal to avoid precision loss.
> > > * "The resulting application, however, may need to handle a wider range
> > of
> > > numeric values." Unless I misunderstand what you're saying, I don't
> think
> > > this is correct. The resulting application will still receive exactly
> the
> > > same Connect data object from the JsonConverter as it was before - only
> > the
> > > SerDe layer is affected.
> > >
> > > Cheers,
> > > Almog
> > >
> > > On Sun, Aug 25, 2019 at 4:28 PM Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > Thanks for all the work, Almog.
> > > >
> > > > For the most part, I think this KIP will be a great improvement, and
> > IMO
> > > is
> > > > almost ready to go. However, I do have a few suggestions that affect
> > the
> > > > wording more than the intent.
> > > >
> > > > First, the name of the `json.decimal.serialization.format` property
> is
> > > > pretty long, especially when it is prefixed in the Worker config or
> in
> > a
> > > > connector config as `key.converer.json.decimal.serialization.format`
> or
> > > > `value.converter.json.decimal.serialization.format` . Have you
> > > considered a
> > > > shorter config property name, such as maybe `decimal.format`? Is
> there
> > > any
> > > > benefit to include "json" and "serialization" in the property name?
> > Also,
> > > > we should be clear that the value will not be case sensitive (e.g.,
> > > > "numeric" and "NUMERIC" would be equivalent), to keep in alignment
> with
> > > > other enumeration literals in Connect configurations. The goal should
> > be
> > > > simple
> > > >
> > > > Second, the Motivation section, has the following sentence:
> > > >
> > > > "A new configuration for producers json.decimal.serialization.format
> > will
> > > > be introduced to the JsonConverter configuration to help control
> > whether
> > > > source converters will serialize decimals in numeric or binary
> > formats."
> > > >
> > > >
> > > > I agree with an earlier comment from Konstantine that "producers"
> here
> > is
> > > > distracting and does not mirror the normal definition of "producers"
> > > within
> > > > the Kafka context. I suggest rephrasing this to something like
> > > >
> > > > "Introduce to the JsonConverter a new configuration property named
> > > > json.decimal.serialization.format to control whether source
> converters
> > > will
> > > > serialize decimals in numeric or binary formats."
> > > >
> > > >
> > > > Third, the KIP should be more clear about whether the
> > > > `json.decimal.serialization.format` setting does or does not affect
> > > > deserialization? IIUC, the deserialization logic will always look for
> > > JSON
> > > > numbers, and will always use the Schema to define whether it should
> > > convert
> > > > the value to a different number type. Is that a fair statement?
> > > >
> > > > Fourth, the JsonSerializer and JsonDeserializer are not affected, yet
> > are
> > > > still compatible with the old and new behavior. Because the primary
> > > purpose
> > > > of this new setting is to define how Connect DECIMAL logical type
> > values
> > > > are serialized in JSON documents, the JsonDeserializer will still be
> > able
> > > > to deserialize the JSON document correctly. The resulting
> application,
> > > > however, may need to handle a wider range of numeric values.
> > > >
> > > > Fifth, the Compatibility section seems more complicated than perhaps
> it
> > > > needs to be, maybe because it seems to distinguish between upgrading
> > and
> > > > setting the decimal serialization format. Maybe it would be
> sufficient
> > to
> > > > simply emphasize that all of the sink connectors (or consumer
> > > applications)
> > > > using the JsonConverter with
> > > > the `json.decimal.serialization.format=NUMERIC` setting consuming
> > records
> > > > from a set of topics be upgraded and changed *before* any of the
> source
> > > > connectors (or other producer applications) using the JsonConverter
> to
> > > > serialize records are changed to use
> > > > the `json.decimal.serialization.format=NUMERIC` setting? It may also
> > > > warrant giving more concrete advice on upgrade procedures. For
> example,
> > > how
> > > > does a user upgrade a set of Connect workers to use this new
> property?
> > Do
> > > > they upgrade first and restart to ensure everything runs as-is, and
> > then
> > > > upgrade their source connectors to set
> > > > `json.decimal.serialization.format=NUMERIC`via connector
> configurations
> > > or
> > > > worker configs?
> > > >
> > > > Anyway, great job so far!
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > > > On Thu, Aug 15, 2019 at 12:00 PM Konstantine Karantasis <
> > > > konstant...@confluent.io> wrote:
> > > >
> > > > > Thanks! KIP reads even better for me now.
> > > > > Just voted. +1 non-binding
> > > > >
> > > > > Konstantine
> > > > >
> > > > > On Wed, Aug 14, 2019 at 7:00 PM Almog Gavra <al...@confluent.io>
> > > wrote:
> > > > >
> > > > > > Thanks for the review Konstantine!
> > > > > >
> > > > > > I think the terminology suggestion definitely makes things
> clearer
> > -
> > > I
> > > > > will
> > > > > > update the documentation based on your suggestion (e.g.
> > > s/Consumer/Sink
> > > > > > Converter/g and s/Producer/Source Converter/g).
> > > > > >
> > > > > > Cheers,
> > > > > > Almog
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 8:13 AM Konstantine Karantasis <
> > > > > > konstant...@confluent.io> wrote:
> > > > > >
> > > > > > > Thanks Almog for preparing this KIP!
> > > > > > > I think it will improve usability and troubleshooting with JSON
> > > data
> > > > a
> > > > > > lot.
> > > > > > >
> > > > > > > The finalized plan seems quite concrete now. I also liked that
> > some
> > > > > > > implementation specific implications (such as setting the
> > > > ObjectMapper
> > > > > to
> > > > > > > deserialize floating point as BigDecimal) are highlighted in
> the
> > > KIP.
> > > > > > >
> > > > > > > Still, as I was reading the KIP, the main obstacle I
> encountered
> > > was
> > > > > > around
> > > > > > > terminology. I couldn't get used to reading "producer" and
> > > "consumer"
> > > > > and
> > > > > > > not thinking in terms of Kafka producers and consumers - which
> > are
> > > > not
> > > > > > > relevant to what this KIP proposes. Thus, I'd suggest replacing
> > > > > > > "Producer(s)" with "Source Converter(s)" and "Consumer(s)" with
> > > "Sink
> > > > > > > Converter(s)" (even if "Converter used by Source Connector" or
> > > > > "Converter
> > > > > > > used by Sink Connector" would be even more accurate - maybe
> this
> > > > could
> > > > > be
> > > > > > > an explanation in a footnote). Terminology around converters
> has
> > > been
> > > > > > > tricky in the past and adding producers/consumers in the mix
> > might
> > > > add
> > > > > to
> > > > > > > the confusion.
> > > > > > >
> > > > > > > Another example where I'd apply this different terminology
> would
> > be
> > > > to
> > > > > a
> > > > > > > phrase such as the following:
> > > > > > > "Because of this, users must take care to first ensure that all
> > > > > consumers
> > > > > > > have upgraded to the new code before upgrading producers to
> make
> > > use
> > > > of
> > > > > > the
> > > > > > > NUMERIC serialization format."
> > > > > > > which I'd write
> > > > > > > "Because of this, users must take care to first ensure that all
> > > sink
> > > > > > > connectors have upgraded to the new converter code before
> > upgrading
> > > > > > source
> > > > > > > connectors to make use of the NUMERIC serialization format in
> > > > > > > JsonConverter."
> > > > > > >
> > > > > > > Let me know if you think this suggestion makes the KIP easier
> to
> > > > > follow.
> > > > > > > Otherwise I think it's a solid proposal.
> > > > > > >
> > > > > > > I'm concluding with a couple of nits:
> > > > > > >
> > > > > > > - "Upgraded Producer with BASE64 serialization, Legacy
> Consumer:
> > > this
> > > > > > > scenario is okay as the upgraded ~producer~ consumer will be
> able
> > > to
> > > > > read
> > > > > > > binary as today" (again according to my suggestion above, it
> > could
> > > be
> > > > > as
> > > > > > > the upgraded source converter ...)
> > > > > > >
> > > > > > > - "consumers cannot consumer NUMERIC data. " -> "consumers
> cannot
> > > > read
> > > > > > > NUMERIC data"
> > > > > > >
> > > > > > > Best,
> > > > > > > Konstantine
> > > > > > >
> > > > > > > On Fri, Aug 9, 2019 at 6:37 PM Almog Gavra <al...@confluent.io
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Good catches! Fixed :)
> > > > > > > >
> > > > > > > > On Thu, Aug 8, 2019 at 10:36 PM Arjun Satish <
> > > > arjun.sat...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Cool!
> > > > > > > > >
> > > > > > > > > Couple of nits:
> > > > > > > > >
> > > > > > > > > - In public interfaces, typo:
> > > *json.decimal.serialization.fromat*
> > > > > > > > > - In public interfaces, you use the term "HEX" instead of
> > > > "BASE64".
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <
> > al...@confluent.io
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > EDIT: everywhere I've been using "HEX" I meant to be
> using
> > > > > > "BASE64".
> > > > > > > I
> > > > > > > > > will
> > > > > > > > > > update the KIP to reflect this.
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <
> > > al...@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for the feedback Arjun! I'm happy changing the
> > > default
> > > > > > > config
> > > > > > > > to
> > > > > > > > > > > HEX instead of BINARY, no strong feelings there.
> > > > > > > > > > >
> > > > > > > > > > > I'll also clarify the example in the KIP to be clearer:
> > > > > > > > > > >
> > > > > > > > > > > - serialize the decimal field "foo" with value
> "10.2345"
> > > with
> > > > > the
> > > > > > > HEX
> > > > > > > > > > > setting: {"foo": "D3J5"}
> > > > > > > > > > > - serialize the decimal field "foo" with value
> "10.2345"
> > > with
> > > > > the
> > > > > > > > > NUMERIC
> > > > > > > > > > > setting: {"foo": 10.2345}
> > > > > > > > > > >
> > > > > > > > > > > With regards to the precision issue, that was my
> original
> > > > > concern
> > > > > > > as
> > > > > > > > > well
> > > > > > > > > > > (and why I originally suggested a TEXT format). Many
> JSON
> > > > > > > > deserializers
> > > > > > > > > > > (e.g. Jackson with
> > > > > > > > DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS),
> > > > > > > > > > > however, have the ability to deserialize decimals
> > correctly
> > > > so
> > > > > I
> > > > > > > will
> > > > > > > > > > > configure that as the default for Connect's
> > > JsonDeserializer.
> > > > > > It's
> > > > > > > > > > probably
> > > > > > > > > > > a good idea to call out that using other deserializers
> > must
> > > > be
> > > > > > done
> > > > > > > > > with
> > > > > > > > > > > care - I will add that documentation to the
> serialization
> > > > > config.
> > > > > > > > > > >
> > > > > > > > > > > Note that there would not be an issue on the
> > > _serialization_
> > > > > side
> > > > > > > of
> > > > > > > > > > > things as Jackson respects BigDecimal.
> > > > > > > > > > >
> > > > > > > > > > > Almog
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <
> > > > > > > arjun.sat...@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> hey Almog, nice work! couple of thoughts (hope I'm not
> > > late
> > > > > > since
> > > > > > > > you
> > > > > > > > > > >> started the voting thread already):
> > > > > > > > > > >>
> > > > > > > > > > >> can you please add some examples to show the changes
> > that
> > > > you
> > > > > > are
> > > > > > > > > > >> proposing. makes me think that for a given decimal
> > number,
> > > > we
> > > > > > will
> > > > > > > > > have
> > > > > > > > > > >> two
> > > > > > > > > > >> encodings: “asHex” and “asNumber”.
> > > > > > > > > > >>
> > > > > > > > > > >> should we call the default config value “HEX” instead
> of
> > > > > > “BINARY”?
> > > > > > > > > > >>
> > > > > > > > > > >> Should we call out the fact that JS systems might be
> > > > > susceptible
> > > > > > > to
> > > > > > > > > > double
> > > > > > > > > > >> precision round offs with the new numeric format? here
> > are
> > > > > some
> > > > > > > > people
> > > > > > > > > > >> discussing a similar problem
> > > > > > > > > > >> https://github.com/EventStore/EventStore/issues/1541
> > > > > > > > > > >>
> > > > > > > > > > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <
> > > > > al...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hello Everyone,
> > > > > > > > > > >> >
> > > > > > > > > > >> > Summarizing an in-person discussion with Randall
> (this
> > > is
> > > > > > copied
> > > > > > > > > from
> > > > > > > > > > >> the
> > > > > > > > > > >> > KIP):
> > > > > > > > > > >> >
> > > > > > > > > > >> > The original KIP suggested supporting an additional
> > > > > > > > representation -
> > > > > > > > > > >> base10
> > > > > > > > > > >> > encoded text (e.g. `{"asText":"10.2345"}`). This
> > causes
> > > > > issues
> > > > > > > > > because
> > > > > > > > > > >> it
> > > > > > > > > > >> > is impossible to disambiguate between TEXT and
> BINARY
> > > > > without
> > > > > > an
> > > > > > > > > > >> additional
> > > > > > > > > > >> > config - furthermore, this makes the migration from
> > one
> > > to
> > > > > the
> > > > > > > > other
> > > > > > > > > > >> nearly
> > > > > > > > > > >> > impossible because it would require that all
> consumers
> > > > stop
> > > > > > > > > consuming
> > > > > > > > > > >> and
> > > > > > > > > > >> > producers stop producing and atomically updating the
> > > > config
> > > > > on
> > > > > > > all
> > > > > > > > > of
> > > > > > > > > > >> them
> > > > > > > > > > >> > after deploying the new code, or waiting for the
> full
> > > > > > retention
> > > > > > > > > period
> > > > > > > > > > >> to
> > > > > > > > > > >> > pass - neither option is viable. The suggestion in
> the
> > > KIP
> > > > > is
> > > > > > > > > strictly
> > > > > > > > > > >> an
> > > > > > > > > > >> > improvement over the existing behavior, even if it
> > > doesn't
> > > > > > > support
> > > > > > > > > all
> > > > > > > > > > >> > combinations.
> > > > > > > > > > >> >
> > > > > > > > > > >> > It seems that since most real-world use cases
> actually
> > > use
> > > > > the
> > > > > > > > > numeric
> > > > > > > > > > >> > representation (not string) we can consider this an
> > > > > > improvement.
> > > > > > > > > With
> > > > > > > > > > >> the
> > > > > > > > > > >> > new suggestion, we don't need a deserialization
> > > > > configuration
> > > > > > > > (only
> > > > > > > > > a
> > > > > > > > > > >> > serialization option) and the consumers will be able
> > to
> > > > > always
> > > > > > > > > > >> > automatically determine the serialization format.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Based on this, I'll be opening up the simplified
> > version
> > > > of
> > > > > > the
> > > > > > > > KIP
> > > > > > > > > > to a
> > > > > > > > > > >> > vote.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Almog
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <
> > > > > > al...@confluent.io
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > I'm mostly happy with your current suggestion (two
> > > > > configs,
> > > > > > > one
> > > > > > > > > for
> > > > > > > > > > >> > > serialization and one for deserialization) and
> your
> > > > > > > > implementation
> > > > > > > > > > >> > > suggestion. One thing to note:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > We should _always_ be able to deserialize a
> > standard
> > > > > JSON
> > > > > > > > > > >> > > > number as a decimal
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I was doing some research into decimals and JSON,
> > and
> > > I
> > > > > can
> > > > > > > > > imagine
> > > > > > > > > > a
> > > > > > > > > > >> > > compelling reason to require string
> representations
> > to
> > > > > avoid
> > > > > > > > > losing
> > > > > > > > > > >> > > precision and to be certain that whomever is
> sending
> > > the
> > > > > > data
> > > > > > > > > isn't
> > > > > > > > > > >> > losing
> > > > > > > > > > >> > > precision (e.g.
> > > > > > https://stackoverflow.com/a/38357877/2258040
> > > > > > > ).
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I'm okay with always allowing numerics, but
> thought
> > > it's
> > > > > > worth
> > > > > > > > > > raising
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > thought.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <
> > > > > > > a...@confluent.io>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >> The way I see it, we need to control two seperate
> > > > things:
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> 1. How do we _deserialize_ a decimal type if we
> > > > > encounter a
> > > > > > > > text
> > > > > > > > > > >> node in
> > > > > > > > > > >> > >> the JSON?    (We should _always_ be able to
> > > > deserialize a
> > > > > > > > > standard
> > > > > > > > > > >> JSON
> > > > > > > > > > >> > >> number as a decimal).
> > > > > > > > > > >> > >> 2. How do we chose how we want decimals to be
> > > > > _serialized_.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> This looks to fits well with your second
> suggestion
> > > of
> > > > > > > slightly
> > > > > > > > > > >> > different
> > > > > > > > > > >> > >> configs names for serialization vs
> deserialization.
> > > > > > > > > > >> > >> a, For deserialization we only care about how to
> > > handle
> > > > > > text
> > > > > > > > > > nodes: `
> > > > > > > > > > >> > >> deserialization.decimal.*text*.format`, which
> > should
> > > > only
> > > > > > > have
> > > > > > > > > two
> > > > > > > > > > >> valid
> > > > > > > > > > >> > >> values BINARY | TEXT.
> > > > > > > > > > >> > >> b. For serialization we need all three:
> > > > > > > > > > >> `serialization.decimal.format`,
> > > > > > > > > > >> > >> which should support all three options: BINARY |
> > > TEXT |
> > > > > > > > NUMERIC.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Implementation wise, I think these should be two
> > > > separate
> > > > > > > > enums,
> > > > > > > > > > >> rather
> > > > > > > > > > >> > >> than one shared enum and throwing an error if the
> > > > > > > deserializer
> > > > > > > > is
> > > > > > > > > > >> set to
> > > > > > > > > > >> > >> NUMERIC.  Mainly as this means the enums reflect
> > the
> > > > > > options
> > > > > > > > > > >> available,
> > > > > > > > > > >> > >> rather than this being hidden in config checking
> > > code.
> > > > > But
> > > > > > > > > that's
> > > > > > > > > > a
> > > > > > > > > > >> > minor
> > > > > > > > > > >> > >> implementation detail.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Personally, I'd be tempted to have the BINARY
> value
> > > > named
> > > > > > > > > something
> > > > > > > > > > >> like
> > > > > > > > > > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of
> encouraging
> > > > users
> > > > > > to
> > > > > > > > move
> > > > > > > > > > >> away
> > > > > > > > > > >> > >> from
> > > > > > > > > > >> > >> it.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> It's a real shame that both of these settings
> > > require a
> > > > > > > default
> > > > > > > > > of
> > > > > > > > > > >> > BINARY
> > > > > > > > > > >> > >> for backwards compatibility, but I agree that
> > > > > discussions /
> > > > > > > > plans
> > > > > > > > > > >> around
> > > > > > > > > > >> > >> switching the defaults should not block this KIP.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Andy
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <
> > > > > > > al...@confluent.io>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> > Thanks for the replies Andy and Andrew (2x
> > Andy?)!
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > > Is the text decimal a base16 encoded number,
> or
> > > is
> > > > it
> > > > > > > > base16
> > > > > > > > > > >> encoded
> > > > > > > > > > >> > >> > binary
> > > > > > > > > > >> > >> > > form of the number?
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > The conversion happens as
> > > > > > > > decimal.unscaledValue().toByteArray()
> > > > > > > > > > and
> > > > > > > > > > >> > then
> > > > > > > > > > >> > >> > the byte array is converted to a hex string, so
> > > it's
> > > > > > > > definitely
> > > > > > > > > > the
> > > > > > > > > > >> > >> binary
> > > > > > > > > > >> > >> > form of the number converted to base16. Whether
> > or
> > > > not
> > > > > > > that's
> > > > > > > > > the
> > > > > > > > > > >> same
> > > > > > > > > > >> > >> as
> > > > > > > > > > >> > >> > the base16 encoded number is a good question
> > > > > (toByteArray
> > > > > > > > > > returns a
> > > > > > > > > > >> > byte
> > > > > > > > > > >> > >> > array containing a signed, big-endian, two's
> > > > complement
> > > > > > > > > > >> representation
> > > > > > > > > > >> > >> of
> > > > > > > > > > >> > >> > the big integer).
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > > One suggestion I have is to change the
> proposed
> > > new
> > > > > > > config
> > > > > > > > to
> > > > > > > > > > >> only
> > > > > > > > > > >> > >> affect
> > > > > > > > > > >> > >> > > decimals stored as text, i.e. to switch
> between
> > > the
> > > > > > > current
> > > > > > > > > > >> base16
> > > > > > > > > > >> > and
> > > > > > > > > > >> > >> > the
> > > > > > > > > > >> > >> > > more common base10.   Then add another config
> > to
> > > > the
> > > > > > > > > serializer
> > > > > > > > > > >> only
> > > > > > > > > > >> > >> that
> > > > > > > > > > >> > >> > > controls if decimals should be serialized as
> > text
> > > > or
> > > > > > > > numeric.
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > I think we need to be able to handle all
> mappings
> > > > from
> > > > > > > > > > >> serialization
> > > > > > > > > > >> > >> format
> > > > > > > > > > >> > >> > to deserialization format (e.g. read in BINARY
> > and
> > > > > output
> > > > > > > > > TEXT),
> > > > > > > > > > >> > which I
> > > > > > > > > > >> > >> > think would be impossible with the alternative
> > > > > > suggestion.
> > > > > > > I
> > > > > > > > > > agree
> > > > > > > > > > >> > that
> > > > > > > > > > >> > >> > automatically deserializing numerics is
> > valuable. I
> > > > see
> > > > > > two
> > > > > > > > > other
> > > > > > > > > > >> ways
> > > > > > > > > > >> > >> to
> > > > > > > > > > >> > >> > get this, both keeping the serialization.format
> > > > config
> > > > > > the
> > > > > > > > > same:
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > - have json.decimal.deserialization.format
> accept
> > > all
> > > > > > three
> > > > > > > > > > >> formats.
> > > > > > > > > > >> > if
> > > > > > > > > > >> > >> set
> > > > > > > > > > >> > >> > to BINARY/TEXT, numerics would be automatically
> > > > > > supported.
> > > > > > > If
> > > > > > > > > set
> > > > > > > > > > >> to
> > > > > > > > > > >> > >> > NUMERIC, then any string coming in would result
> > in
> > > > > > > > > > deserialization
> > > > > > > > > > >> > error
> > > > > > > > > > >> > >> > (defaults to BINARY for backwards
> compatibility)
> > > > > > > > > > >> > >> > - change json.decimal.deserialization.format to
> > > > > > > > > > >> > >> > json.decimal.deserialization.string.format
> which
> > > > > accepts
> > > > > > > only
> > > > > > > > > > >> > >> BINARY/TEXT
> > > > > > > > > > >> > >> > (defaults to BINARY for backwards
> compatibility)
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > > would be a breaking change in that things
> that
> > > > > > previously
> > > > > > > > > > failed
> > > > > > > > > > >> > would
> > > > > > > > > > >> > >> > > suddenly start deserializing.  This is a
> price
> > > I'm
> > > > > > > willing
> > > > > > > > to
> > > > > > > > > > >> pay.
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > I agree. I'm willing to pay this price too.
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > > IMHO, we should then plan to switch the
> default
> > > of
> > > > > > > decimal
> > > > > > > > > > >> > >> serialization
> > > > > > > > > > >> > >> > to
> > > > > > > > > > >> > >> > > numeric, and text serialization to base 10 in
> > the
> > > > > next
> > > > > > > > major
> > > > > > > > > > >> > release.
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > I think that can be a separate discussion, I
> > don't
> > > > want
> > > > > > to
> > > > > > > > > block
> > > > > > > > > > >> this
> > > > > > > > > > >> > >> KIP
> > > > > > > > > > >> > >> > on it.
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > Thoughts?
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <
> > > > > > > > > o...@wikimedia.org>
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > > This is a bit orthogonal, but in
> > > > JsonSchemaConverter
> > > > > I
> > > > > > > use
> > > > > > > > > > >> > >> JSONSchemas to
> > > > > > > > > > >> > >> > > indicate whether a JSON number should be
> > > > deserialized
> > > > > > as
> > > > > > > an
> > > > > > > > > > >> integer
> > > > > > > > > > >> > >> or a
> > > > > > > > > > >> > >> > > decimal
> > > > > > > > > > >> > >> > > <
> > > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >>
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> > > > > > > > > > >> > >> > > >.
> > > > > > > > > > >> > >> > > Not everyone is going to have JSONSchemas
> > > available
> > > > > > when
> > > > > > > > > > >> converting,
> > > > > > > > > > >> > >> but
> > > > > > > > > > >> > >> > if
> > > > > > > > > > >> > >> > > you do, it is an easy way to support JSON
> > numbers
> > > > as
> > > > > > > > > decimals.
> > > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> > > Carry on! :)
> > > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <
> > > > > > > > > a...@confluent.io
> > > > > > > > > > >
> > > > > > > > > > >> > >> wrote:
> > > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> > > > Hi Almog,
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > > > Like the KIP - I think being able to
> support
> > > > > decimals
> > > > > > > in
> > > > > > > > > JSON
> > > > > > > > > > >> in
> > > > > > > > > > >> > the
> > > > > > > > > > >> > >> > same
> > > > > > > > > > >> > >> > > > way most other systems do is a great
> > > improvement.
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > > > It's not 100% clear to me from the KIP what
> > the
> > > > > > current
> > > > > > > > > > format
> > > > > > > > > > >> is.
> > > > > > > > > > >> > >> Is
> > > > > > > > > > >> > >> > > the
> > > > > > > > > > >> > >> > > > text decimal a base16 encoded number, or is
> > it
> > > > > base16
> > > > > > > > > encoded
> > > > > > > > > > >> > binary
> > > > > > > > > > >> > >> > form
> > > > > > > > > > >> > >> > > > of the number? (I've not tried to get my
> head
> > > > > around
> > > > > > if
> > > > > > > > > these
> > > > > > > > > > >> two
> > > > > > > > > > >> > >> are
> > > > > > > > > > >> > >> > > even
> > > > > > > > > > >> > >> > > > different!)
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > > > One suggestion I have is to change the
> > proposed
> > > > new
> > > > > > > > config
> > > > > > > > > to
> > > > > > > > > > >> only
> > > > > > > > > > >> > >> > affect
> > > > > > > > > > >> > >> > > > decimals stored as text, i.e. to switch
> > between
> > > > the
> > > > > > > > current
> > > > > > > > > > >> base16
> > > > > > > > > > >> > >> and
> > > > > > > > > > >> > >> > > the
> > > > > > > > > > >> > >> > > > more common base10.   Then add another
> config
> > > to
> > > > > the
> > > > > > > > > > serialzier
> > > > > > > > > > >> > only
> > > > > > > > > > >> > >> > that
> > > > > > > > > > >> > >> > > > controls if decimals should be serialized
> as
> > > text
> > > > > or
> > > > > > > > > numeric.
> > > > > > > > > > >> The
> > > > > > > > > > >> > >> > > benefit
> > > > > > > > > > >> > >> > > > of this approach is it allows us to enhance
> > the
> > > > > > > > > deserializer
> > > > > > > > > > to
> > > > > > > > > > >> > >> > > > automatically handle numeric decimals even
> > > > without
> > > > > > any
> > > > > > > > > config
> > > > > > > > > > >> > >> having to
> > > > > > > > > > >> > >> > > be
> > > > > > > > > > >> > >> > > > set, i.e. default config in the
> deserializer
> > > > would
> > > > > be
> > > > > > > > able
> > > > > > > > > to
> > > > > > > > > > >> > handle
> > > > > > > > > > >> > >> > > > numeric decimals.  Of course, this is a two
> > > edged
> > > > > > > sword:
> > > > > > > > > this
> > > > > > > > > > >> > would
> > > > > > > > > > >> > >> > make
> > > > > > > > > > >> > >> > > > the deserializer work out of the box with
> > > numeric
> > > > > > > > decimals,
> > > > > > > > > > >> > (yay!),
> > > > > > > > > > >> > >> but
> > > > > > > > > > >> > >> > > > would be a breaking change in that things
> > that
> > > > > > > previously
> > > > > > > > > > >> failed
> > > > > > > > > > >> > >> would
> > > > > > > > > > >> > >> > > > suddenly start deserializing.  This is a
> > price
> > > > I'm
> > > > > > > > willing
> > > > > > > > > to
> > > > > > > > > > >> pay.
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > > > IMHO, we should then plan to switch the
> > default
> > > > of
> > > > > > > > decimal
> > > > > > > > > > >> > >> > serialization
> > > > > > > > > > >> > >> > > to
> > > > > > > > > > >> > >> > > > numeric, and text serialization to base 10
> in
> > > the
> > > > > > next
> > > > > > > > > major
> > > > > > > > > > >> > >> release.
> > > > > > > > > > >> > >> > > > (With upgrade notes to match). Though I
> know
> > > this
> > > > > is
> > > > > > > more
> > > > > > > > > > >> > >> contentious,
> > > > > > > > > > >> > >> > I
> > > > > > > > > > >> > >> > > > think it moves us forward in a much more
> > > standard
> > > > > way
> > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > >> > >> current
> > > > > > > > > > >> > >> > > > encoding of decimals.
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <
> > > > > > > > > > al...@confluent.io>
> > > > > > > > > > >> > >> wrote:
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > > > > Hi Everyone!
> > > > > > > > > > >> > >> > > > >
> > > > > > > > > > >> > >> > > > > Kicking off discussion for a new KIP:
> > > > > > > > > > >> > >> > > > >
> > > > > > > > > > >> > >> > > > >
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >>
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > > > > > > > > > >> > >> > > > >
> > > > > > > > > > >> > >> > > > > For those who are interested, I have a
> > > > prototype
> > > > > > > > > > >> implementation
> > > > > > > > > > >> > >> that
> > > > > > > > > > >> > >> > > > helped
> > > > > > > > > > >> > >> > > > > guide my design:
> > > > > > > > https://github.com/agavra/kafka/pull/1
> > > > > > > > > > >> > >> > > > >
> > > > > > > > > > >> > >> > > > > Cheers,
> > > > > > > > > > >> > >> > > > > Almog
> > > > > > > > > > >> > >> > > > >
> > > > > > > > > > >> > >> > > >
> > > > > > > > > > >> > >> > >
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to