Thanks for the updates, Almog. This looks really good, but I have a few
more comments (most wording, though one potentially thorny issue):

First, the KIP should define the default value for the `decimal.format`
property. IIUC, it will be BASE64, and to gain the new behavior users will
have to explicitly set this to NUMERIC. In fact, I'd recommend changing
this bullet item from:

   - JsonConverter will accept the decimal.format configuration to
   determine the serialization format. If the value is BASE64, the behavior
   remains unchanged (i.e. it serializes it as a base64 text). If the value is
   NUMERIC, the JSON node will be a number representing that decimal (e.g.
   10.2345 instead of "D3J5").

to something like:

   - Define a new `decimal.format` configuration property on JsonConverter
   to specify the serialization format for Connect DECIMAL logical type
   values, with two allowed literal values for the configuration property:
   - `BASE64` specifies the existing behavior of serializing DECIMAL
      logical types as base64 encoded binary data (e.g., "D3J5" in the example
      above), and will be the default; and
      - `NUMERIC` will serialize Connect DECIMAL logical type values in
      JSON as a number representing that decimal

Second, since the default will be the existing BASE64 representation, what
do you think about changing the JsonConverter instances used by the Connect
worker for internal topics to enable `decimal.format=NUMERIC`? I don't
think we actually use decimals in the internal messages, but if we do at
some point then the converters will store them with the improved natural
representation.

Third, the following bullet item could be more clear:

   - JsonConverter will automatically handle deserialization of either
   serialization format given a Decimal logical type schema, i.e. it will
   accept both a deserialized BinaryNode and NumericNode. If the value is a
   BinaryNode, it will construct a java BigDecimal from the binaryValue()
   (which is a btye[]). If the value is a NumericNode, it will simply pass
   through the decimalValue() deserialized by the JsonDeserializer.

such as maybe:

   - The JsonConverter deserialization method currently expects only a
   BinaryNode, but will be changed to also handle NumericNode by calling
   NumericNode.decimalValue().

This brings up an interesting potential issue: if `schemas.enable=false`,
then there will be no schema in the record, and calling
`NumericNode.decimalValue()` will always return a java.math.BigDecimal even
if the underlying Number was not a BigDecimal. Shouldn't the
`convertToConnect(Schema, JsonNode) instead use the type of Number value as
parsed and returned by the JsonDeserializer to determine the proper schema
type, and then get the proper value type using that schema type's converter?

Fourth, I'm not sure I understand the following bullet:

   - JsonDeserializer will now default floating point deserialization to
   BigDecimal to avoid losing precision. This may impact performance when
   deserializing doubles - a JMH microbenchmark on my local MBP, this
   estimated about 3x degradation for deserializing JSON floating points. If
   the connect schema is not the decimal logical type, the JsonConverter will
   convert this BigDecimal value into the corresponding floating point java
   object.

Fifth, I think the migration plan is not quite accurate. Step 3 mentions
changing the Connect worker config's key and value converters to use the
new setting, and a restart (step 4) is necessary after this step. Perhaps
step 3 should be "If the Connect worker uses the JsonConverter for the key
and/or value converters, optionally set the `decimal.format=NUMERIC` for
the key and/or value converter and restart the workers." followed by Step
4: "If desired, update any source connector configs that use the
JsonConverter for key and/or value converters to use
`decimal.format=NUMERIC`."

Finally, should we add a short discussion in the Rejected Alternatives
about the option of leaving JsonConverter untouched and creating a
different converter implementation?

Thanks!

Randall



On Mon, Aug 26, 2019 at 11:51 AM Almog Gavra <al...@confluent.io> wrote:

> Thanks for the feedback Randall! I have updated the KIP with the following
> edits:
>
> * Updated the reference from "producer" to "source" (I had missed that
> one!)
> * Changed the config from "json.decimal.serialization.format" to
> "decimal.format"
> * Clarified case sensitivity
> * Clarified the proposed changes to note that deserialization is not
> affected by the config
> * Clarified the changes in JsonConverter to handle deserialization (see my
> third bullet below)
> * Added a clear migration plan and simplified compatibility
>
> Here are also some clarifications based on your comments.
>
> * I think "json" has limited value in the configuration name. If put in a
> top-level worker config, it clarifies that it only affects connectors using
> the JsonConverter. I have opted for your suggestion and dropped it.
> * I think "serialization" has limited value in the configuration name. If
> we ever want to introduce "deserialization" configurations, there will be
> asymmetry in the configuration names. I have opted for your suggestion and
> dropped it.
> * The JsonConverter will not "always look for numbers". The converter will
> receive from the Jackson Object Mapper either a NumericNode containing a
> big decimal or a BinaryNode containing a btye[]. Based on the type of this
> node, it will convert the value to a BigDecimal appropriately (or any other
> Connect java type based on the schema).
> * "the ... JsonDeserializer are not affected" is not exactly true, but
> semantically correct. See the note in the KIP about defaulting floating
> points to BigDecimal to avoid precision loss.
> * "The resulting application, however, may need to handle a wider range of
> numeric values." Unless I misunderstand what you're saying, I don't think
> this is correct. The resulting application will still receive exactly the
> same Connect data object from the JsonConverter as it was before - only the
> SerDe layer is affected.
>
> Cheers,
> Almog
>
> On Sun, Aug 25, 2019 at 4:28 PM Randall Hauch <rha...@gmail.com> wrote:
>
> > Thanks for all the work, Almog.
> >
> > For the most part, I think this KIP will be a great improvement, and IMO
> is
> > almost ready to go. However, I do have a few suggestions that affect the
> > wording more than the intent.
> >
> > First, the name of the `json.decimal.serialization.format` property is
> > pretty long, especially when it is prefixed in the Worker config or in a
> > connector config as `key.converer.json.decimal.serialization.format` or
> > `value.converter.json.decimal.serialization.format` . Have you
> considered a
> > shorter config property name, such as maybe `decimal.format`? Is there
> any
> > benefit to include "json" and "serialization" in the property name? Also,
> > we should be clear that the value will not be case sensitive (e.g.,
> > "numeric" and "NUMERIC" would be equivalent), to keep in alignment with
> > other enumeration literals in Connect configurations. The goal should be
> > simple
> >
> > Second, the Motivation section, has the following sentence:
> >
> > "A new configuration for producers json.decimal.serialization.format will
> > be introduced to the JsonConverter configuration to help control whether
> > source converters will serialize decimals in numeric or binary formats."
> >
> >
> > I agree with an earlier comment from Konstantine that "producers" here is
> > distracting and does not mirror the normal definition of "producers"
> within
> > the Kafka context. I suggest rephrasing this to something like
> >
> > "Introduce to the JsonConverter a new configuration property named
> > json.decimal.serialization.format to control whether source converters
> will
> > serialize decimals in numeric or binary formats."
> >
> >
> > Third, the KIP should be more clear about whether the
> > `json.decimal.serialization.format` setting does or does not affect
> > deserialization? IIUC, the deserialization logic will always look for
> JSON
> > numbers, and will always use the Schema to define whether it should
> convert
> > the value to a different number type. Is that a fair statement?
> >
> > Fourth, the JsonSerializer and JsonDeserializer are not affected, yet are
> > still compatible with the old and new behavior. Because the primary
> purpose
> > of this new setting is to define how Connect DECIMAL logical type values
> > are serialized in JSON documents, the JsonDeserializer will still be able
> > to deserialize the JSON document correctly. The resulting application,
> > however, may need to handle a wider range of numeric values.
> >
> > Fifth, the Compatibility section seems more complicated than perhaps it
> > needs to be, maybe because it seems to distinguish between upgrading and
> > setting the decimal serialization format. Maybe it would be sufficient to
> > simply emphasize that all of the sink connectors (or consumer
> applications)
> > using the JsonConverter with
> > the `json.decimal.serialization.format=NUMERIC` setting consuming records
> > from a set of topics be upgraded and changed *before* any of the source
> > connectors (or other producer applications) using the JsonConverter to
> > serialize records are changed to use
> > the `json.decimal.serialization.format=NUMERIC` setting? It may also
> > warrant giving more concrete advice on upgrade procedures. For example,
> how
> > does a user upgrade a set of Connect workers to use this new property? Do
> > they upgrade first and restart to ensure everything runs as-is, and then
> > upgrade their source connectors to set
> > `json.decimal.serialization.format=NUMERIC`via connector configurations
> or
> > worker configs?
> >
> > Anyway, great job so far!
> >
> > Best regards,
> >
> > Randall
> >
> > On Thu, Aug 15, 2019 at 12:00 PM Konstantine Karantasis <
> > konstant...@confluent.io> wrote:
> >
> > > Thanks! KIP reads even better for me now.
> > > Just voted. +1 non-binding
> > >
> > > Konstantine
> > >
> > > On Wed, Aug 14, 2019 at 7:00 PM Almog Gavra <al...@confluent.io>
> wrote:
> > >
> > > > Thanks for the review Konstantine!
> > > >
> > > > I think the terminology suggestion definitely makes things clearer -
> I
> > > will
> > > > update the documentation based on your suggestion (e.g.
> s/Consumer/Sink
> > > > Converter/g and s/Producer/Source Converter/g).
> > > >
> > > > Cheers,
> > > > Almog
> > > >
> > > > On Wed, Aug 14, 2019 at 8:13 AM Konstantine Karantasis <
> > > > konstant...@confluent.io> wrote:
> > > >
> > > > > Thanks Almog for preparing this KIP!
> > > > > I think it will improve usability and troubleshooting with JSON
> data
> > a
> > > > lot.
> > > > >
> > > > > The finalized plan seems quite concrete now. I also liked that some
> > > > > implementation specific implications (such as setting the
> > ObjectMapper
> > > to
> > > > > deserialize floating point as BigDecimal) are highlighted in the
> KIP.
> > > > >
> > > > > Still, as I was reading the KIP, the main obstacle I encountered
> was
> > > > around
> > > > > terminology. I couldn't get used to reading "producer" and
> "consumer"
> > > and
> > > > > not thinking in terms of Kafka producers and consumers - which are
> > not
> > > > > relevant to what this KIP proposes. Thus, I'd suggest replacing
> > > > > "Producer(s)" with "Source Converter(s)" and "Consumer(s)" with
> "Sink
> > > > > Converter(s)" (even if "Converter used by Source Connector" or
> > > "Converter
> > > > > used by Sink Connector" would be even more accurate - maybe this
> > could
> > > be
> > > > > an explanation in a footnote). Terminology around converters has
> been
> > > > > tricky in the past and adding producers/consumers in the mix might
> > add
> > > to
> > > > > the confusion.
> > > > >
> > > > > Another example where I'd apply this different terminology would be
> > to
> > > a
> > > > > phrase such as the following:
> > > > > "Because of this, users must take care to first ensure that all
> > > consumers
> > > > > have upgraded to the new code before upgrading producers to make
> use
> > of
> > > > the
> > > > > NUMERIC serialization format."
> > > > > which I'd write
> > > > > "Because of this, users must take care to first ensure that all
> sink
> > > > > connectors have upgraded to the new converter code before upgrading
> > > > source
> > > > > connectors to make use of the NUMERIC serialization format in
> > > > > JsonConverter."
> > > > >
> > > > > Let me know if you think this suggestion makes the KIP easier to
> > > follow.
> > > > > Otherwise I think it's a solid proposal.
> > > > >
> > > > > I'm concluding with a couple of nits:
> > > > >
> > > > > - "Upgraded Producer with BASE64 serialization, Legacy Consumer:
> this
> > > > > scenario is okay as the upgraded ~producer~ consumer will be able
> to
> > > read
> > > > > binary as today" (again according to my suggestion above, it could
> be
> > > as
> > > > > the upgraded source converter ...)
> > > > >
> > > > > - "consumers cannot consumer NUMERIC data. " -> "consumers cannot
> > read
> > > > > NUMERIC data"
> > > > >
> > > > > Best,
> > > > > Konstantine
> > > > >
> > > > > On Fri, Aug 9, 2019 at 6:37 PM Almog Gavra <al...@confluent.io>
> > wrote:
> > > > >
> > > > > > Good catches! Fixed :)
> > > > > >
> > > > > > On Thu, Aug 8, 2019 at 10:36 PM Arjun Satish <
> > arjun.sat...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Cool!
> > > > > > >
> > > > > > > Couple of nits:
> > > > > > >
> > > > > > > - In public interfaces, typo:
> *json.decimal.serialization.fromat*
> > > > > > > - In public interfaces, you use the term "HEX" instead of
> > "BASE64".
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 7, 2019 at 9:51 AM Almog Gavra <al...@confluent.io
> >
> > > > wrote:
> > > > > > >
> > > > > > > > EDIT: everywhere I've been using "HEX" I meant to be using
> > > > "BASE64".
> > > > > I
> > > > > > > will
> > > > > > > > update the KIP to reflect this.
> > > > > > > >
> > > > > > > > On Wed, Aug 7, 2019 at 9:44 AM Almog Gavra <
> al...@confluent.io
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the feedback Arjun! I'm happy changing the
> default
> > > > > config
> > > > > > to
> > > > > > > > > HEX instead of BINARY, no strong feelings there.
> > > > > > > > >
> > > > > > > > > I'll also clarify the example in the KIP to be clearer:
> > > > > > > > >
> > > > > > > > > - serialize the decimal field "foo" with value "10.2345"
> with
> > > the
> > > > > HEX
> > > > > > > > > setting: {"foo": "D3J5"}
> > > > > > > > > - serialize the decimal field "foo" with value "10.2345"
> with
> > > the
> > > > > > > NUMERIC
> > > > > > > > > setting: {"foo": 10.2345}
> > > > > > > > >
> > > > > > > > > With regards to the precision issue, that was my original
> > > concern
> > > > > as
> > > > > > > well
> > > > > > > > > (and why I originally suggested a TEXT format). Many JSON
> > > > > > deserializers
> > > > > > > > > (e.g. Jackson with
> > > > > > DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS),
> > > > > > > > > however, have the ability to deserialize decimals correctly
> > so
> > > I
> > > > > will
> > > > > > > > > configure that as the default for Connect's
> JsonDeserializer.
> > > > It's
> > > > > > > > probably
> > > > > > > > > a good idea to call out that using other deserializers must
> > be
> > > > done
> > > > > > > with
> > > > > > > > > care - I will add that documentation to the serialization
> > > config.
> > > > > > > > >
> > > > > > > > > Note that there would not be an issue on the
> _serialization_
> > > side
> > > > > of
> > > > > > > > > things as Jackson respects BigDecimal.
> > > > > > > > >
> > > > > > > > > Almog
> > > > > > > > >
> > > > > > > > > On Tue, Aug 6, 2019 at 11:23 PM Arjun Satish <
> > > > > arjun.sat...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> hey Almog, nice work! couple of thoughts (hope I'm not
> late
> > > > since
> > > > > > you
> > > > > > > > >> started the voting thread already):
> > > > > > > > >>
> > > > > > > > >> can you please add some examples to show the changes that
> > you
> > > > are
> > > > > > > > >> proposing. makes me think that for a given decimal number,
> > we
> > > > will
> > > > > > > have
> > > > > > > > >> two
> > > > > > > > >> encodings: “asHex” and “asNumber”.
> > > > > > > > >>
> > > > > > > > >> should we call the default config value “HEX” instead of
> > > > “BINARY”?
> > > > > > > > >>
> > > > > > > > >> Should we call out the fact that JS systems might be
> > > susceptible
> > > > > to
> > > > > > > > double
> > > > > > > > >> precision round offs with the new numeric format? here are
> > > some
> > > > > > people
> > > > > > > > >> discussing a similar problem
> > > > > > > > >> https://github.com/EventStore/EventStore/issues/1541
> > > > > > > > >>
> > > > > > > > >> On Tue, Aug 6, 2019 at 1:40 PM Almog Gavra <
> > > al...@confluent.io>
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> > Hello Everyone,
> > > > > > > > >> >
> > > > > > > > >> > Summarizing an in-person discussion with Randall (this
> is
> > > > copied
> > > > > > > from
> > > > > > > > >> the
> > > > > > > > >> > KIP):
> > > > > > > > >> >
> > > > > > > > >> > The original KIP suggested supporting an additional
> > > > > > representation -
> > > > > > > > >> base10
> > > > > > > > >> > encoded text (e.g. `{"asText":"10.2345"}`). This causes
> > > issues
> > > > > > > because
> > > > > > > > >> it
> > > > > > > > >> > is impossible to disambiguate between TEXT and BINARY
> > > without
> > > > an
> > > > > > > > >> additional
> > > > > > > > >> > config - furthermore, this makes the migration from one
> to
> > > the
> > > > > > other
> > > > > > > > >> nearly
> > > > > > > > >> > impossible because it would require that all consumers
> > stop
> > > > > > > consuming
> > > > > > > > >> and
> > > > > > > > >> > producers stop producing and atomically updating the
> > config
> > > on
> > > > > all
> > > > > > > of
> > > > > > > > >> them
> > > > > > > > >> > after deploying the new code, or waiting for the full
> > > > retention
> > > > > > > period
> > > > > > > > >> to
> > > > > > > > >> > pass - neither option is viable. The suggestion in the
> KIP
> > > is
> > > > > > > strictly
> > > > > > > > >> an
> > > > > > > > >> > improvement over the existing behavior, even if it
> doesn't
> > > > > support
> > > > > > > all
> > > > > > > > >> > combinations.
> > > > > > > > >> >
> > > > > > > > >> > It seems that since most real-world use cases actually
> use
> > > the
> > > > > > > numeric
> > > > > > > > >> > representation (not string) we can consider this an
> > > > improvement.
> > > > > > > With
> > > > > > > > >> the
> > > > > > > > >> > new suggestion, we don't need a deserialization
> > > configuration
> > > > > > (only
> > > > > > > a
> > > > > > > > >> > serialization option) and the consumers will be able to
> > > always
> > > > > > > > >> > automatically determine the serialization format.
> > > > > > > > >> >
> > > > > > > > >> > Based on this, I'll be opening up the simplified version
> > of
> > > > the
> > > > > > KIP
> > > > > > > > to a
> > > > > > > > >> > vote.
> > > > > > > > >> >
> > > > > > > > >> > Almog
> > > > > > > > >> >
> > > > > > > > >> > On Mon, Jul 29, 2019 at 9:29 AM Almog Gavra <
> > > > al...@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > I'm mostly happy with your current suggestion (two
> > > configs,
> > > > > one
> > > > > > > for
> > > > > > > > >> > > serialization and one for deserialization) and your
> > > > > > implementation
> > > > > > > > >> > > suggestion. One thing to note:
> > > > > > > > >> > >
> > > > > > > > >> > > > We should _always_ be able to deserialize a standard
> > > JSON
> > > > > > > > >> > > > number as a decimal
> > > > > > > > >> > >
> > > > > > > > >> > > I was doing some research into decimals and JSON, and
> I
> > > can
> > > > > > > imagine
> > > > > > > > a
> > > > > > > > >> > > compelling reason to require string representations to
> > > avoid
> > > > > > > losing
> > > > > > > > >> > > precision and to be certain that whomever is sending
> the
> > > > data
> > > > > > > isn't
> > > > > > > > >> > losing
> > > > > > > > >> > > precision (e.g.
> > > > https://stackoverflow.com/a/38357877/2258040
> > > > > ).
> > > > > > > > >> > >
> > > > > > > > >> > > I'm okay with always allowing numerics, but thought
> it's
> > > > worth
> > > > > > > > raising
> > > > > > > > >> > the
> > > > > > > > >> > > thought.
> > > > > > > > >> > >
> > > > > > > > >> > > On Mon, Jul 29, 2019 at 4:57 AM Andy Coates <
> > > > > a...@confluent.io>
> > > > > > > > >> wrote:
> > > > > > > > >> > >
> > > > > > > > >> > >> The way I see it, we need to control two seperate
> > things:
> > > > > > > > >> > >>
> > > > > > > > >> > >> 1. How do we _deserialize_ a decimal type if we
> > > encounter a
> > > > > > text
> > > > > > > > >> node in
> > > > > > > > >> > >> the JSON?    (We should _always_ be able to
> > deserialize a
> > > > > > > standard
> > > > > > > > >> JSON
> > > > > > > > >> > >> number as a decimal).
> > > > > > > > >> > >> 2. How do we chose how we want decimals to be
> > > _serialized_.
> > > > > > > > >> > >>
> > > > > > > > >> > >> This looks to fits well with your second suggestion
> of
> > > > > slightly
> > > > > > > > >> > different
> > > > > > > > >> > >> configs names for serialization vs deserialization.
> > > > > > > > >> > >> a, For deserialization we only care about how to
> handle
> > > > text
> > > > > > > > nodes: `
> > > > > > > > >> > >> deserialization.decimal.*text*.format`, which should
> > only
> > > > > have
> > > > > > > two
> > > > > > > > >> valid
> > > > > > > > >> > >> values BINARY | TEXT.
> > > > > > > > >> > >> b. For serialization we need all three:
> > > > > > > > >> `serialization.decimal.format`,
> > > > > > > > >> > >> which should support all three options: BINARY |
> TEXT |
> > > > > > NUMERIC.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Implementation wise, I think these should be two
> > separate
> > > > > > enums,
> > > > > > > > >> rather
> > > > > > > > >> > >> than one shared enum and throwing an error if the
> > > > > deserializer
> > > > > > is
> > > > > > > > >> set to
> > > > > > > > >> > >> NUMERIC.  Mainly as this means the enums reflect the
> > > > options
> > > > > > > > >> available,
> > > > > > > > >> > >> rather than this being hidden in config checking
> code.
> > > But
> > > > > > > that's
> > > > > > > > a
> > > > > > > > >> > minor
> > > > > > > > >> > >> implementation detail.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Personally, I'd be tempted to have the BINARY value
> > named
> > > > > > > something
> > > > > > > > >> like
> > > > > > > > >> > >> `LEGACY` or `LEGACY_BINARY` as a way of encouraging
> > users
> > > > to
> > > > > > move
> > > > > > > > >> away
> > > > > > > > >> > >> from
> > > > > > > > >> > >> it.
> > > > > > > > >> > >>
> > > > > > > > >> > >> It's a real shame that both of these settings
> require a
> > > > > default
> > > > > > > of
> > > > > > > > >> > BINARY
> > > > > > > > >> > >> for backwards compatibility, but I agree that
> > > discussions /
> > > > > > plans
> > > > > > > > >> around
> > > > > > > > >> > >> switching the defaults should not block this KIP.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Andy
> > > > > > > > >> > >>
> > > > > > > > >> > >>
> > > > > > > > >> > >> On Thu, 25 Jul 2019 at 18:26, Almog Gavra <
> > > > > al...@confluent.io>
> > > > > > > > >> wrote:
> > > > > > > > >> > >>
> > > > > > > > >> > >> > Thanks for the replies Andy and Andrew (2x Andy?)!
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > > Is the text decimal a base16 encoded number, or
> is
> > it
> > > > > > base16
> > > > > > > > >> encoded
> > > > > > > > >> > >> > binary
> > > > > > > > >> > >> > > form of the number?
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > The conversion happens as
> > > > > > decimal.unscaledValue().toByteArray()
> > > > > > > > and
> > > > > > > > >> > then
> > > > > > > > >> > >> > the byte array is converted to a hex string, so
> it's
> > > > > > definitely
> > > > > > > > the
> > > > > > > > >> > >> binary
> > > > > > > > >> > >> > form of the number converted to base16. Whether or
> > not
> > > > > that's
> > > > > > > the
> > > > > > > > >> same
> > > > > > > > >> > >> as
> > > > > > > > >> > >> > the base16 encoded number is a good question
> > > (toByteArray
> > > > > > > > returns a
> > > > > > > > >> > byte
> > > > > > > > >> > >> > array containing a signed, big-endian, two's
> > complement
> > > > > > > > >> representation
> > > > > > > > >> > >> of
> > > > > > > > >> > >> > the big integer).
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > > One suggestion I have is to change the proposed
> new
> > > > > config
> > > > > > to
> > > > > > > > >> only
> > > > > > > > >> > >> affect
> > > > > > > > >> > >> > > decimals stored as text, i.e. to switch between
> the
> > > > > current
> > > > > > > > >> base16
> > > > > > > > >> > and
> > > > > > > > >> > >> > the
> > > > > > > > >> > >> > > more common base10.   Then add another config to
> > the
> > > > > > > serializer
> > > > > > > > >> only
> > > > > > > > >> > >> that
> > > > > > > > >> > >> > > controls if decimals should be serialized as text
> > or
> > > > > > numeric.
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > I think we need to be able to handle all mappings
> > from
> > > > > > > > >> serialization
> > > > > > > > >> > >> format
> > > > > > > > >> > >> > to deserialization format (e.g. read in BINARY and
> > > output
> > > > > > > TEXT),
> > > > > > > > >> > which I
> > > > > > > > >> > >> > think would be impossible with the alternative
> > > > suggestion.
> > > > > I
> > > > > > > > agree
> > > > > > > > >> > that
> > > > > > > > >> > >> > automatically deserializing numerics is valuable. I
> > see
> > > > two
> > > > > > > other
> > > > > > > > >> ways
> > > > > > > > >> > >> to
> > > > > > > > >> > >> > get this, both keeping the serialization.format
> > config
> > > > the
> > > > > > > same:
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > - have json.decimal.deserialization.format accept
> all
> > > > three
> > > > > > > > >> formats.
> > > > > > > > >> > if
> > > > > > > > >> > >> set
> > > > > > > > >> > >> > to BINARY/TEXT, numerics would be automatically
> > > > supported.
> > > > > If
> > > > > > > set
> > > > > > > > >> to
> > > > > > > > >> > >> > NUMERIC, then any string coming in would result in
> > > > > > > > deserialization
> > > > > > > > >> > error
> > > > > > > > >> > >> > (defaults to BINARY for backwards compatibility)
> > > > > > > > >> > >> > - change json.decimal.deserialization.format to
> > > > > > > > >> > >> > json.decimal.deserialization.string.format which
> > > accepts
> > > > > only
> > > > > > > > >> > >> BINARY/TEXT
> > > > > > > > >> > >> > (defaults to BINARY for backwards compatibility)
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > > would be a breaking change in that things that
> > > > previously
> > > > > > > > failed
> > > > > > > > >> > would
> > > > > > > > >> > >> > > suddenly start deserializing.  This is a price
> I'm
> > > > > willing
> > > > > > to
> > > > > > > > >> pay.
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > I agree. I'm willing to pay this price too.
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > > IMHO, we should then plan to switch the default
> of
> > > > > decimal
> > > > > > > > >> > >> serialization
> > > > > > > > >> > >> > to
> > > > > > > > >> > >> > > numeric, and text serialization to base 10 in the
> > > next
> > > > > > major
> > > > > > > > >> > release.
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > I think that can be a separate discussion, I don't
> > want
> > > > to
> > > > > > > block
> > > > > > > > >> this
> > > > > > > > >> > >> KIP
> > > > > > > > >> > >> > on it.
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > Thoughts?
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > On Thu, Jul 25, 2019 at 6:35 AM Andrew Otto <
> > > > > > > o...@wikimedia.org>
> > > > > > > > >> > wrote:
> > > > > > > > >> > >> >
> > > > > > > > >> > >> > > This is a bit orthogonal, but in
> > JsonSchemaConverter
> > > I
> > > > > use
> > > > > > > > >> > >> JSONSchemas to
> > > > > > > > >> > >> > > indicate whether a JSON number should be
> > deserialized
> > > > as
> > > > > an
> > > > > > > > >> integer
> > > > > > > > >> > >> or a
> > > > > > > > >> > >> > > decimal
> > > > > > > > >> > >> > > <
> > > > > > > > >> > >> > >
> > > > > > > > >> > >> >
> > > > > > > > >> > >>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L251-L261
> > > > > > > > >> > >> > > >.
> > > > > > > > >> > >> > > Not everyone is going to have JSONSchemas
> available
> > > > when
> > > > > > > > >> converting,
> > > > > > > > >> > >> but
> > > > > > > > >> > >> > if
> > > > > > > > >> > >> > > you do, it is an easy way to support JSON numbers
> > as
> > > > > > > decimals.
> > > > > > > > >> > >> > >
> > > > > > > > >> > >> > > Carry on! :)
> > > > > > > > >> > >> > >
> > > > > > > > >> > >> > > On Thu, Jul 25, 2019 at 9:12 AM Andy Coates <
> > > > > > > a...@confluent.io
> > > > > > > > >
> > > > > > > > >> > >> wrote:
> > > > > > > > >> > >> > >
> > > > > > > > >> > >> > > > Hi Almog,
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > > > Like the KIP - I think being able to support
> > > decimals
> > > > > in
> > > > > > > JSON
> > > > > > > > >> in
> > > > > > > > >> > the
> > > > > > > > >> > >> > same
> > > > > > > > >> > >> > > > way most other systems do is a great
> improvement.
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > > > It's not 100% clear to me from the KIP what the
> > > > current
> > > > > > > > format
> > > > > > > > >> is.
> > > > > > > > >> > >> Is
> > > > > > > > >> > >> > > the
> > > > > > > > >> > >> > > > text decimal a base16 encoded number, or is it
> > > base16
> > > > > > > encoded
> > > > > > > > >> > binary
> > > > > > > > >> > >> > form
> > > > > > > > >> > >> > > > of the number? (I've not tried to get my head
> > > around
> > > > if
> > > > > > > these
> > > > > > > > >> two
> > > > > > > > >> > >> are
> > > > > > > > >> > >> > > even
> > > > > > > > >> > >> > > > different!)
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > > > One suggestion I have is to change the proposed
> > new
> > > > > > config
> > > > > > > to
> > > > > > > > >> only
> > > > > > > > >> > >> > affect
> > > > > > > > >> > >> > > > decimals stored as text, i.e. to switch between
> > the
> > > > > > current
> > > > > > > > >> base16
> > > > > > > > >> > >> and
> > > > > > > > >> > >> > > the
> > > > > > > > >> > >> > > > more common base10.   Then add another config
> to
> > > the
> > > > > > > > serialzier
> > > > > > > > >> > only
> > > > > > > > >> > >> > that
> > > > > > > > >> > >> > > > controls if decimals should be serialized as
> text
> > > or
> > > > > > > numeric.
> > > > > > > > >> The
> > > > > > > > >> > >> > > benefit
> > > > > > > > >> > >> > > > of this approach is it allows us to enhance the
> > > > > > > deserializer
> > > > > > > > to
> > > > > > > > >> > >> > > > automatically handle numeric decimals even
> > without
> > > > any
> > > > > > > config
> > > > > > > > >> > >> having to
> > > > > > > > >> > >> > > be
> > > > > > > > >> > >> > > > set, i.e. default config in the deserializer
> > would
> > > be
> > > > > > able
> > > > > > > to
> > > > > > > > >> > handle
> > > > > > > > >> > >> > > > numeric decimals.  Of course, this is a two
> edged
> > > > > sword:
> > > > > > > this
> > > > > > > > >> > would
> > > > > > > > >> > >> > make
> > > > > > > > >> > >> > > > the deserializer work out of the box with
> numeric
> > > > > > decimals,
> > > > > > > > >> > (yay!),
> > > > > > > > >> > >> but
> > > > > > > > >> > >> > > > would be a breaking change in that things that
> > > > > previously
> > > > > > > > >> failed
> > > > > > > > >> > >> would
> > > > > > > > >> > >> > > > suddenly start deserializing.  This is a price
> > I'm
> > > > > > willing
> > > > > > > to
> > > > > > > > >> pay.
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > > > IMHO, we should then plan to switch the default
> > of
> > > > > > decimal
> > > > > > > > >> > >> > serialization
> > > > > > > > >> > >> > > to
> > > > > > > > >> > >> > > > numeric, and text serialization to base 10 in
> the
> > > > next
> > > > > > > major
> > > > > > > > >> > >> release.
> > > > > > > > >> > >> > > > (With upgrade notes to match). Though I know
> this
> > > is
> > > > > more
> > > > > > > > >> > >> contentious,
> > > > > > > > >> > >> > I
> > > > > > > > >> > >> > > > think it moves us forward in a much more
> standard
> > > way
> > > > > > that
> > > > > > > > the
> > > > > > > > >> > >> current
> > > > > > > > >> > >> > > > encoding of decimals.
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > > > On Tue, 25 Jun 2019 at 01:03, Almog Gavra <
> > > > > > > > al...@confluent.io>
> > > > > > > > >> > >> wrote:
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > > > > Hi Everyone!
> > > > > > > > >> > >> > > > >
> > > > > > > > >> > >> > > > > Kicking off discussion for a new KIP:
> > > > > > > > >> > >> > > > >
> > > > > > > > >> > >> > > > >
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > >
> > > > > > > > >> > >> >
> > > > > > > > >> > >>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-481%3A+SerDe+Improvements+for+Connect+Decimal+type+in+JSON
> > > > > > > > >> > >> > > > >
> > > > > > > > >> > >> > > > > For those who are interested, I have a
> > prototype
> > > > > > > > >> implementation
> > > > > > > > >> > >> that
> > > > > > > > >> > >> > > > helped
> > > > > > > > >> > >> > > > > guide my design:
> > > > > > https://github.com/agavra/kafka/pull/1
> > > > > > > > >> > >> > > > >
> > > > > > > > >> > >> > > > > Cheers,
> > > > > > > > >> > >> > > > > Almog
> > > > > > > > >> > >> > > > >
> > > > > > > > >> > >> > > >
> > > > > > > > >> > >> > >
> > > > > > > > >> > >> >
> > > > > > > > >> > >>
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to