On Thu, Aug 11, 2022 at 8:50 AM Yi Hu via dev <dev@beam.apache.org> wrote:

> Hi,
>
> tl;dr There are two (or three) different representations used for the URN
> "beam:logical_type:datetime:v1" in Beam Java SDK. Clarification or clean up
> is needed.
>
> I recently try to resolve a long-time issue
> https://github.com/apache/beam/issues/19817 for the datetime logical type
> cross-language support. However I notice that currently in Java SDK this
> URN is referred in two places:
>
> (1)
> https://github.com/apache/beam/blob/cf9ea1f442636f781b9f449e953016bb39622781/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/DateTime.java#L49
> where it has a representation of Row {Date: INT64, Time: INT64}
>
> (2)
> https://github.com/apache/beam/blob/cf9ea1f442636f781b9f449e953016bb39622781/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java#L67
> where it is represented by a single INT64
>
> Moreover, there is a third, "actual" representation that is used when one
> encodes a joda.DateTime or joda.Instant object in a Row, which is an int64
> type encoded with fixed-size big endian. Note that this is different from
> (2) because INT64 will be decoded using a VarInt coder which causes data
> corruption when other sdk (e.g. python) pulls rows from java ptransform.
>
> To resolve this I try to implement the "actual" representation of logical
> type "beam:logical_type:datetime:v1" (
> https://github.com/apache/beam/pull/22561 for contexts). It then becomes
> necessary to resolve this inconsistency because we have reached a point of
> adding a known logical type in schemas.proto. I am consider some solutions:
>
> i. Rename the logical type of case (2) as "beam:logical_type:instant:v1"
> as suggested by Brian in an earlier review.
>

+1


>
> ii. "beam:logical_type:instant:v1" is still backed by INT64, but in
> implementation it will use BigEndianLongCoder to encode/decode the stream.
>
>
Is this to be compatible with the current Java implementation ? And we have
to update other SDKs to use big endian coders when encoding/decoding the
"beam:logical_type:instant:v1" logical type ?



> For the second step ii, the problem is that there is a primitive type
> backed by a fixed length integer coder. Currently INT8, INT16, INT32,
> INT64... are all backed by VarInt (and there is ongoing work to use fixed
> size big endian to encode INT8, INT16 (
> https://github.com/apache/beam/issues/19815)). Ideally I would think
> (INT8, INT16, INT32, INT64) are all fixed and having a generic (INT)
> primitive type is backed by VarInt. But this may be a more substantial
> change for the current code base.
>

I'm a bit confused by this. Did you mean that there's *no* primitive type
backed by a fixed length integer coder ? Also, by primitive, I'm assuming
you mean Beam Schema types here.

Thanks,
Cham


>
> I would like to have opinions from the community. Thanks for your
> attention!
>
> Regards,
> Yi
>
> --
>
> Yi Hu, (he/him/his)
>
> Software Engineer
>
>
>

Reply via email to