[
https://issues.apache.org/jira/browse/FLINK-38555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032657#comment-18032657
]
yuanfenghu commented on FLINK-38555:
------------------------------------
After careful analysis, I believe this is a serious bug. When Debezium
serializes a binlog event, a TIMESTAMP field is converted to a Long, so
compareObjects ends up comparing a Long with a LocalDateTime. The two values
have different classes and are not both numeric, so they fall through to the
toString() comparison. This is obviously wrong.
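
To make the fall-through concrete, here is a minimal sketch that traces which
branch of the quoted compareObjects() a pair of values reaches. It is not the
Flink CDC code: the class name, branchFor(), compareChronologically() and the
isNumericObject() stand-in are hypothetical, and it assumes the Long holds
epoch milliseconds in UTC, which may not match the actual Debezium encoding.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class CompareObjectsTrace {

    /** Reports which branch of the quoted compareObjects() a pair of values reaches. */
    static String branchFor(Object o1, Object o2) {
        if (o1 instanceof Comparable && o1.getClass().equals(o2.getClass())) {
            return "COMPARABLE";
        } else if (isNumericObject(o1) && isNumericObject(o2)) {
            return "NUMERIC";
        } else {
            return "TO_STRING";
        }
    }

    /** Stand-in (assumption): the real isNumericObject() may accept a different set of types. */
    static boolean isNumericObject(Object o) {
        return o instanceof Number;
    }

    /** The fallback branch as quoted: a lexicographic comparison of the two strings. */
    static int compareByToString(Object o1, Object o2) {
        return o1.toString().compareTo(o2.toString());
    }

    /** Hypothetical normalization; assumes the Long holds epoch milliseconds in UTC. */
    static int compareChronologically(long epochMillis, LocalDateTime boundary) {
        LocalDateTime value =
                LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
        return value.compareTo(boundary);
    }

    public static void main(String[] args) {
        LocalDateTime boundary = LocalDateTime.of(2025, 10, 24, 10, 15, 18);
        long binlogValue = 2_000_000_000_000L; // 2033-05-18T03:33:20 UTC

        // Two LocalDateTime values share a runtime class, so they use compareTo().
        System.out.println(branchFor(boundary, boundary.plusDays(1))); // COMPARABLE
        // A Long and a LocalDateTime reach the string fallback.
        System.out.println(branchFor(binlogValue, boundary)); // TO_STRING
        // "2000000000000" sorts before "2025-10-24T10:15:18" although 2033 is later.
        System.out.println(compareByToString(binlogValue, boundary) < 0); // true
        System.out.println(compareChronologically(binlogValue, boundary) > 0); // true
    }
}
```

With these inputs the string fallback orders a 2033 value before a 2025
boundary, so a range check built on it could misclassify the record. Converting
both sides to one temporal type before comparing is only one possible
direction, not necessarily the fix that was applied.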
> Optimize performance of `RecordUtils.compareObjects()` method by avoiding
> unnecessary `toString()` calls for temporal types (LocalDateTime, LocalDate,
> Instant, etc.).
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-38555
> URL: https://issues.apache.org/jira/browse/FLINK-38555
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.5.0
> Reporter: yuanfenghu
> Priority: Critical
> Fix For: cdc-3.6.0
>
> Attachments: image-2025-10-24-10-15-18-027.png,
> image-2025-10-24-10-15-37-328.png
>
>
> h2. Background
> While analyzing flame graphs of a Flink CDC MySQL source job, I identified
> that `RecordUtils.splitKeyRangeContains()` was a performance bottleneck.
> Further investigation revealed that `compareObjects()` was using `toString()`
> to compare temporal objects, which is significantly slower than direct
> comparison.
>
> h3. Root Cause
> In the current implementation:
> {code:java}
> private static int compareObjects(Object o1, Object o2) {
>     if (o1 instanceof Comparable && o1.getClass().equals(o2.getClass())) {
>         return ((Comparable) o1).compareTo(o2);
>     } else if (isNumericObject(o1) && isNumericObject(o2)) {
>         return toBigDecimal(o1).compareTo(toBigDecimal(o2));
>     } else {
>         return o1.toString().compareTo(o2.toString());
>     }
> }
> {code}
> When comparing `LocalDateTime` objects, the first condition fails if the
> objects are cast to `Object`, falling through to the `toString()` comparison
> path.
> h3. Impact
> This method is called extensively during the snapshot phase when evaluating
> whether binlog records fall within completed split ranges. For tables with:
> - Temporal types (DATETIME, TIMESTAMP, DATE, TIME) as chunk keys
> - High binlog throughput during snapshot phase
> - Many splits (large tables with small chunk size)
> The performance impact can be significant (80% CPU in some cases).
> !image-2025-10-24-10-15-37-328.png!
>