[
https://issues.apache.org/jira/browse/CALCITE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086253#comment-18086253
]
Ruiqi Dong commented on CALCITE-7553:
-------------------------------------
Thanks. I did an initial audit, and I agree that the two notions are entangled.
The call sites I found that need attention are RexSimplify, RexInterpreter,
RexBuilder's Sarg construction for IN/BETWEEN, RexSimplify's SargCollector,
RelMdPredicates, RelMdCollation, RelJson, and the RexLiteral/RexBuilder paths
that print or recreate Sarg endpoints.
My proposed fix is to keep these two semantics separate:
1. Make TimestampWithTimeZoneString.compareTo consistent with equals by
comparing the instant first and using the canonical string `v` as a tie-breaker.
2. Route SQL comparisons through an explicit SQL-value comparison helper. For
TIMESTAMP_TZ, that helper compares by instant; for other types, it keeps the
existing behavior.
3. Use that helper in simplification, interpretation, Sarg construction,
predicate/collation metadata, and Sarg serialization/printing paths, so SQL
value semantics are preserved.
I prototyped this locally with regression tests for both sides:
- TreeSet/TreeMap-style natural ordering keeps distinct values.
- SQL comparison/folding still treats two TIMESTAMP_TZ literals representing
the same instant as equal.
- SEARCH/Sarg matching still uses instant semantics.
If this direction looks reasonable, I can open a PR with the audited call-site
changes and focused tests.
> TimestampWithTimeZoneString.compareTo() makes TreeSet/TreeMap drop distinct
> timestamp-with-zone values
> ------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7553
> URL: https://issues.apache.org/jira/browse/CALCITE-7553
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Ruiqi Dong
> Priority: Minor
>
> *Summary*
> TimestampWithTimeZoneString exposes a natural ordering that can silently drop
> distinct values from TreeSet and TreeMap. equals() and hashCode() use the
> canonical string form, including the local timestamp text and the zone ID.
> compareTo(), however, compares only the parsed Calendar instant. As a result,
> two distinct values such as:
> * 1969-07-21 02:56:15 GMT-08:00
> * 1969-07-21 10:56:15 GMT
> are not equal as objects, but compare as equal in the natural ordering. Any
> sorted collection keyed on this class can therefore silently discard one of
> them.
>
> *Affected code*
> File:
> core/src/main/java/org/apache/calcite/util/TimestampWithTimeZoneString.java
> {code:java}
> @Override public boolean equals(@Nullable Object o) {
> return o == this
> || o instanceof TimestampWithTimeZoneString
> && ((TimestampWithTimeZoneString) o).v.equals(v);
> }
> @Override public int hashCode() {
> return v.hashCode();
> }
> @Override public int compareTo(TimestampWithTimeZoneString o) {
> return this.pt.getCalendar().compareTo(o.pt.getCalendar());
> } {code}
> Reproducer
> Add the following test to:
> core/src/test/java/org/apache/calcite/rex/RexBuilderTest.java
> {code:java}
> @Test void
> testTimestampWithTimeZoneStringNaturalOrderingKeepsDistinctValues() {
> final TimestampWithTimeZoneString pst =
> new TimestampWithTimeZoneString("1969-07-21 02:56:15 GMT-08:00");
> final TimestampWithTimeZoneString gmt =
> new TimestampWithTimeZoneString("1969-07-21 10:56:15 GMT");
> assertFalse(pst.equals(gmt));
> final TreeSet<TimestampWithTimeZoneString> values = new TreeSet<>();
> values.add(pst);
> values.add(gmt);
> assertThat(values, hasSize(2));
> } {code}
> Run:
> {code:java}
> ./gradlew :core:test \
> --tests
> org.apache.calcite.rex.RexBuilderTest.testTimestampWithTimeZoneStringNaturalOrderingKeepsDistinctValues{code}
> Observed behavior:
> The test fails because the second distinct value is dropped by the natural
> ordering
> {code:java}
> Expected: a collection with size <2>
> but: collection size was <1> {code}
> Expected behavior:
> If two TimestampWithTimeZoneString instances are not equal as values, the
> natural ordering should not collapse them into a single sorted-set or
> sorted-map key.
>
> Even if ordering by instant is intentional, exposing it as the class's
> natural ordering while equals() preserves zone-qualified text is unsafe. In
> practice, sorted collections keyed on this class deduplicate distinct
> timestamp-with-zone literals that happen to denote the same instant.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)