Hi, tl;dr There are two (or three) different representations used for the URN "beam:logical_type:datetime:v1" in Beam Java SDK. Clarification or clean up is needed.
I recently try to resolve a long-time issue https://github.com/apache/beam/issues/19817 for the datetime logical type cross-language support. However I notice that currently in Java SDK this URN is referred in two places: (1) https://github.com/apache/beam/blob/cf9ea1f442636f781b9f449e953016bb39622781/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/DateTime.java#L49 where it has a representation of Row {Date: INT64, Time: INT64} (2) https://github.com/apache/beam/blob/cf9ea1f442636f781b9f449e953016bb39622781/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java#L67 where it is represented by a single INT64 Moreover, there is a third, "actual" representation that is used when one encodes a joda.DateTime or joda.Instant object in a Row, which is an int64 type encoded with fixed-size big endian. Note that this is different from (2) because INT64 will be decoded using a VarInt coder which causes data corruption when other sdk (e.g. python) pulls rows from java ptransform. To resolve this I try to implement the "actual" representation of logical type "beam:logical_type:datetime:v1" ( https://github.com/apache/beam/pull/22561 for contexts). It then becomes necessary to resolve this inconsistency because we have reached a point of adding a known logical type in schemas.proto. I am consider some solutions: i. Rename the logical type of case (2) as "beam:logical_type:instant:v1" as suggested by Brian in an earlier review. ii. "beam:logical_type:instant:v1" is still backed by INT64, but in implementation it will use BigEndianLongCoder to encode/decode the stream. For the second step ii, the problem is that there is a primitive type backed by a fixed length integer coder. Currently INT8, INT16, INT32, INT64... are all backed by VarInt (and there is ongoing work to use fixed size big endian to encode INT8, INT16 ( https://github.com/apache/beam/issues/19815)). Ideally I would think (INT8, INT16, INT32, INT64) are all fixed and having a generic (INT) primitive type is backed by VarInt. But this may be a more substantial change for the current code base. I would like to have opinions from the community. Thanks for your attention! Regards, Yi -- Yi Hu, (he/him/his) Software Engineer