github-actions[bot] commented on code in PR #64648:
URL: https://github.com/apache/doris/pull/64648#discussion_r3496623129
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -1150,20 +1150,46 @@ public long getCountFromSnapshot() throws UserException
{
return 0;
}
- Map<String, String> summary = snapshot.summary();
- if (!summary.get(IcebergUtils.TOTAL_EQUALITY_DELETES).equals("0")) {
+ return getCountFromSummary(snapshot.summary(),
sessionVariable.ignoreIcebergDanglingDelete);
+ }
+
+ /**
+ * Decide whether the row count can be pushed down from an Iceberg snapshot
+ * summary, returning the count, or -1 if it cannot be pushed down (the
+ * caller then falls back to a normal scan).
+ *
+ * <p>A snapshot summary is NOT guaranteed to carry the {@code total-*}
+ * counters: snapshots produced by compaction/replace (and some writers)
+ * may omit {@code total-equality-deletes} / {@code total-position-deletes}
+ * / {@code total-records}. Previously this method called
+ * {@code summary.get(...).equals(...)} / {@code Long.parseLong(...)}
+ * directly on the map values, which threw a {@link NullPointerException}
+ * (e.g. "Cannot invoke String.equals because Map.get(...) is null") for
+ * {@code SELECT COUNT(*)} on such a table while {@code SELECT *} worked.
+ * When any required counter is absent we now fall back to a scan.
+ */
+ @VisibleForTesting
+ public static long getCountFromSummary(Map<String, String> summary,
boolean ignoreDanglingDelete) {
+ String equalityDeletes =
summary.get(IcebergUtils.TOTAL_EQUALITY_DELETES);
+ String positionDeletes =
summary.get(IcebergUtils.TOTAL_POSITION_DELETES);
+ String totalRecords = summary.get(IcebergUtils.TOTAL_RECORDS);
+ if (equalityDeletes == null || positionDeletes == null || totalRecords
== null) {
+ // summary is missing a required counter, can not push down count
+ return -1;
+ }
+ if (!equalityDeletes.equals("0")) {
// has equality delete files, can not push down count
return -1;
}
- long deleteCount =
Long.parseLong(summary.get(IcebergUtils.TOTAL_POSITION_DELETES));
+ long deleteCount = Long.parseLong(positionDeletes);
if (deleteCount == 0) {
// no delete files, can push down count directly
- return Long.parseLong(summary.get(IcebergUtils.TOTAL_RECORDS));
+ return Long.parseLong(totalRecords);
}
- if (sessionVariable.ignoreIcebergDanglingDelete) {
+ if (ignoreDanglingDelete) {
// has position delete files, if we ignore dangling delete, can
push down count
- return Long.parseLong(summary.get(IcebergUtils.TOTAL_RECORDS)) -
deleteCount;
+ return Long.parseLong(totalRecords) - deleteCount;
Review Comment:
When this subtraction returns `0` (for example `total-records ==
total-position-deletes` with `ignore_iceberg_dangling_delete=true`), FE still
marks the scan as table-level COUNT pushdown because `isBatchMode()` accepts
any `countFromSnapshot >= 0` and `assignCountToSplits()` sends a split row
count of `0`. On BE, though, `FileScanner` treats `table_level_row_count >= 0`
as table-level count, while `IcebergReaderMixin::_init_row_filters()` only
preserves the COUNT fast path for `table_level_row_count > 0`; with zero and
delete descriptors it initializes the delete readers and resets the reader
pushdown op to `NONE`, so the range never becomes the intended `CountReader(0)`
path. Please either make the BE guard match the `>= 0` table-level count
contract, or have this FE path fall back instead of returning zero until BE
handles zero-count table-level pushdown consistently. A unit case for
`summary("0", "N", "N")` with `ignoreDanglingDelete=true` would cover this
boundary.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]