Re: [PR] [flink] Support Column Statistics for Flink [paimon]

via GitHub Wed, 23 Oct 2024 23:51:51 -0700


tsreaper commented on code in PR #4330:
URL: https://github.com/apache/paimon/pull/4330#discussion_r1814372196



##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/source/DataTableSource.java:
##########
@@ -113,8 +120,22 @@ public TableStats reportStatistics() {
         if (streaming) {
             return TableStats.UNKNOWN;
         }
-
         scanSplitsForInference();
+        Optional<Statistics> optionStatistics = table.statistics();
+        if (optionStatistics.isPresent()) {
+            Statistics statistics = optionStatistics.get();
+            if (statistics.mergedRecordCount().isPresent()) {
+                Map<String, ColumnStats> flinkColStats =
+                        statistics.colStats().entrySet().stream()
+                                .map(
+                                        entry ->
+                                                new AbstractMap.SimpleEntry<>(
+                                                        entry.getKey(),
+                                                        toFlinkColumnStats(entry.getValue())))
+                                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
+                return new TableStats(statistics.mergedRecordCount().getAsLong(), flinkColStats);
+            }
+        }
         return new TableStats(splitStatistics.totalRowCount());

Review Comment:
   ```suggestion
           Optional<Statistics> optionStatistics = table.statistics();
           if (optionStatistics.isPresent()) {
               Statistics statistics = optionStatistics.get();
               if (statistics.mergedRecordCount().isPresent()) {
                   Map<String, ColumnStats> flinkColStats =
                           statistics.colStats().entrySet().stream()
                                   .map(
                                           entry ->
                                                   new AbstractMap.SimpleEntry<>(
                                                           entry.getKey(),
                                                           toFlinkColumnStats(entry.getValue())))
                                   .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
                   return new TableStats(statistics.mergedRecordCount().getAsLong(), flinkColStats);
               }
           }
   
           scanSplitsForInference();
           return new TableStats(splitStatistics.totalRowCount());
   ```
   
   There is no need to scan splits for statistics if we can read them directly
   from the snapshot, so `scanSplitsForInference()` should run only on the
   fallback path.
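
To illustrate the ordering only, here is a minimal, self-contained sketch. It uses stand-in types rather than the real Paimon and Flink classes: `StatsFastPath`, `StoredStats`, `Source`, `scanSplits`, and `scanCalls` are hypothetical names invented for this example, and the row counts are made-up values.

```java
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical stand-ins; these are NOT the Paimon or Flink classes from the PR.
class StatsFastPath {

    // Plays the role of statistics stored with a snapshot.
    static final class StoredStats {
        final OptionalLong rowCount;

        StoredStats(OptionalLong rowCount) {
            this.rowCount = rowCount;
        }
    }

    // Plays the role of the table source that reports a row count.
    static final class Source {
        private final Optional<StoredStats> stored;
        private final long scannedRowCount;
        int scanCalls = 0; // counts how often the expensive path ran

        Source(Optional<StoredStats> stored, long scannedRowCount) {
            this.stored = stored;
            this.scannedRowCount = scannedRowCount;
        }

        // Stand-in for the expensive split scan.
        private void scanSplits() {
            scanCalls++;
        }

        long reportRowCount() {
            // Fast path: return stored statistics without touching the splits.
            if (stored.isPresent() && stored.get().rowCount.isPresent()) {
                return stored.get().rowCount.getAsLong();
            }
            // Fallback: scan only when no usable stored statistics exist.
            scanSplits();
            return scannedRowCount;
        }
    }

    public static void main(String[] args) {
        Source withStats =
                new Source(Optional.of(new StoredStats(OptionalLong.of(7L))), 42L);
        Source withoutStats = new Source(Optional.empty(), 42L);

        System.out.println(withStats.reportRowCount() + " scans=" + withStats.scanCalls);
        System.out.println(withoutStats.reportRowCount() + " scans=" + withoutStats.scanCalls);
    }
}
```

In this sketch the scan counter stays at zero whenever a stored row count is available, which is the behavior the suggestion is after.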


