tsreaper commented on code in PR #4330:
URL: https://github.com/apache/paimon/pull/4330#discussion_r1814372196
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/source/DataTableSource.java:
##########
@@ -113,8 +120,22 @@ public TableStats reportStatistics() {
         if (streaming) {
             return TableStats.UNKNOWN;
         }
-        scanSplitsForInference();
+        Optional<Statistics> optionStatistics = table.statistics();
+        if (optionStatistics.isPresent()) {
+            Statistics statistics = optionStatistics.get();
+            if (statistics.mergedRecordCount().isPresent()) {
+                Map<String, ColumnStats> flinkColStats =
+                        statistics.colStats().entrySet().stream()
+                                .map(
+                                        entry ->
+                                                new AbstractMap.SimpleEntry<>(
+                                                        entry.getKey(),
+                                                        toFlinkColumnStats(entry.getValue())))
+                                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+                return new TableStats(statistics.mergedRecordCount().getAsLong(), flinkColStats);
+            }
+        }
         return new TableStats(splitStatistics.totalRowCount());

Review Comment:

```suggestion
        Optional<Statistics> optionStatistics = table.statistics();
        if (optionStatistics.isPresent()) {
            Statistics statistics = optionStatistics.get();
            if (statistics.mergedRecordCount().isPresent()) {
                Map<String, ColumnStats> flinkColStats =
                        statistics.colStats().entrySet().stream()
                                .map(
                                        entry ->
                                                new AbstractMap.SimpleEntry<>(
                                                        entry.getKey(),
                                                        toFlinkColumnStats(entry.getValue())))
                                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
                return new TableStats(statistics.mergedRecordCount().getAsLong(), flinkColStats);
            }
        }
        scanSplitsForInference();
        return new TableStats(splitStatistics.totalRowCount());
```

There is no need to scan splits for statistics if we can read them directly from the snapshot.
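The suggestion is about ordering. Snapshot statistics are consulted first, and the split scan is paid for only when they are absent or carry no merged record count. Below is a minimal, self-contained Java sketch of that ordering. `SnapshotStats`, `PlannerStats`, `scanSplitsForRowCount` and the `scanCount` counter are hypothetical stand-ins, not Paimon or Flink APIs. The per-column conversion (`toFlinkColumnStats` in the diff) is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;
import java.util.function.Supplier;

/** Sketch of "snapshot statistics first, split scan as fallback"; all types are hypothetical. */
public class StatsFallbackSketch {

    /** Hypothetical snapshot statistics: optional merged row count plus per-column stats. */
    static final class SnapshotStats {
        final OptionalLong mergedRecordCount;
        final Map<String, Long> colStats;

        SnapshotStats(OptionalLong mergedRecordCount, Map<String, Long> colStats) {
            this.mergedRecordCount = mergedRecordCount;
            this.colStats = colStats;
        }
    }

    /** Hypothetical planner-facing result, standing in for Flink's TableStats. */
    static final class PlannerStats {
        final long rowCount;
        final Map<String, Long> colStats;

        PlannerStats(long rowCount, Map<String, Long> colStats) {
            this.rowCount = rowCount;
            this.colStats = colStats;
        }
    }

    /** Stands in for TableStats.UNKNOWN. */
    static final PlannerStats UNKNOWN =
            new PlannerStats(-1L, Collections.<String, Long>emptyMap());

    /** Counts split scans so callers can observe whether the fallback ran. */
    static int scanCount = 0;

    /** Hypothetical stand-in for scanSplitsForInference() followed by totalRowCount(). */
    static long scanSplitsForRowCount() {
        scanCount++;
        return 1_000L; // placeholder row count
    }

    static PlannerStats reportStatistics(
            boolean streaming, Supplier<Optional<SnapshotStats>> statisticsSource) {
        if (streaming) {
            return UNKNOWN;
        }
        // Cheap path first: statistics already stored with the snapshot.
        Optional<SnapshotStats> optionStatistics = statisticsSource.get();
        if (optionStatistics.isPresent()) {
            SnapshotStats statistics = optionStatistics.get();
            if (statistics.mergedRecordCount.isPresent()) {
                return new PlannerStats(
                        statistics.mergedRecordCount.getAsLong(),
                        new HashMap<>(statistics.colStats));
            }
        }
        // Expensive path only when no usable snapshot statistics exist.
        return new PlannerStats(scanSplitsForRowCount(), Collections.<String, Long>emptyMap());
    }

    public static void main(String[] args) {
        Map<String, Long> cols = new HashMap<>();
        cols.put("id", 0L);
        PlannerStats fromSnapshot =
                reportStatistics(
                        false, () -> Optional.of(new SnapshotStats(OptionalLong.of(42L), cols)));
        System.out.println(fromSnapshot.rowCount + " rows, scans=" + scanCount);
        PlannerStats fromScan = reportStatistics(false, Optional::empty);
        System.out.println(fromScan.rowCount + " rows, scans=" + scanCount);
    }
}
```

Running `main` prints `42 rows, scans=0` and then `1000 rows, scans=1`. The first call returns before the scan, and the second call falls back to it. The same ordering is what the suggested change produces in `reportStatistics()`.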