Norio Akagi created SPARK-57064:
-----------------------------------
Summary: Bucketing rules should match on FileSourceScanLike trait
instead of FileSourceScanExec
Key: SPARK-57064
URL: https://issues.apache.org/jira/browse/SPARK-57064
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 5.0.0
Reporter: Norio Akagi
**Description:**
### What changes were proposed in this pull request?
`DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` (through its
`ExtractJoinWithBuckets` extractor) pattern-match on the concrete class
`FileSourceScanExec` at several read-only match sites that access only
trait-level fields (`bucketedScan`, `relation`,
`optionalNumCoalescedBuckets`). The `FileSourceScanLike` trait already declares
all of these fields, so those matches can be widened without changing what they
read.
This PR changes three match sites from `FileSourceScanExec` to `FileSourceScanLike`:
- `DisableUnnecessaryBucketedScan.apply`: the `hasBucketedScan` existence check
- `ExtractJoinWithBuckets.hasScanOperation`: the bucket spec existence check
- `ExtractJoinWithBuckets.getBucketSpec`: the bucket spec extraction
Two match sites that call `.copy()` (a case-class-specific method) are
intentionally left on `FileSourceScanExec`.
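
The distinction between the two kinds of match sites can be sketched as follows. This is a minimal illustration, not the actual Spark code: `SparkPlan`, `FileSourceScanLike`, `FileSourceScanExec`, and `ProjectExec` here are heavily simplified stand-ins, and `PluginScanExec`, `hasBucketSpec`, and `withBucketedScanDisabled` are hypothetical names introduced only for the example.

```scala
object BucketingMatchSketch {
  // Simplified stand-ins for Spark's physical plan types (assumption:
  // the real classes carry many more members than shown here).
  trait SparkPlan { def children: Seq[SparkPlan] }

  trait FileSourceScanLike extends SparkPlan {
    def bucketedScan: Boolean
    def optionalNumCoalescedBuckets: Option[Int]
    def children: Seq[SparkPlan] = Nil
  }

  final case class FileSourceScanExec(
      hasBucketSpec: Boolean,
      disableBucketedScan: Boolean = false,
      optionalNumCoalescedBuckets: Option[Int] = None) extends FileSourceScanLike {
    def bucketedScan: Boolean = hasBucketSpec && !disableBucketedScan
  }

  // Hypothetical plugin operator: extends the trait but is not a case class,
  // so it has no compiler-generated copy().
  final class PluginScanExec(
      val bucketedScan: Boolean,
      val optionalNumCoalescedBuckets: Option[Int]) extends FileSourceScanLike

  final case class ProjectExec(child: SparkPlan) extends SparkPlan {
    def children: Seq[SparkPlan] = Seq(child)
  }

  // Read-only site: only trait-level members are read, so the match can
  // be on the trait and will also see plugin scans.
  def hasBucketedScan(plan: SparkPlan): Boolean = plan match {
    case scan: FileSourceScanLike => scan.bucketedScan
    case other => other.children.exists(hasBucketedScan)
  }

  // Rewriting site: copy() is available only on the case class, so this
  // match has to stay on FileSourceScanExec; other scans pass through as-is.
  def withBucketedScanDisabled(plan: SparkPlan): SparkPlan = plan match {
    case scan: FileSourceScanExec => scan.copy(disableBucketedScan = true)
    case ProjectExec(child) => ProjectExec(withBucketedScanDisabled(child))
    case other => other
  }
}
```

In this sketch the read-only check sees both the vanilla and the plugin scan, while the rewriting function changes only the vanilla scan, since the trait alone offers no way to rebuild the node.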
### Why are the changes needed?
Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace
`FileSourceScanExec` with their own scan operators that extend
`FileSourceScanLike`. With the current concrete-class matches, these plugins'
scan operators are invisible to the bucketing rules —
`DisableUnnecessaryBucketedScan` never finds them and `ExtractJoinWithBuckets`
never extracts their bucket specs.
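
A rough sketch of the symptom, under simplifying assumptions: the types below are reduced stand-ins rather than Spark's real classes, `bucketSpec` is placed directly on the trait instead of being reached through `relation`, and `PluginScanExec`, `getBucketSpecConcrete`, and `getBucketSpecWidened` are hypothetical names used only to contrast the two match styles.

```scala
object PluginVisibilitySketch {
  // Simplified stand-ins (assumption: not the real Spark definitions).
  final case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])

  trait SparkPlan

  trait FileSourceScanLike extends SparkPlan {
    def bucketSpec: Option[BucketSpec]
    def optionalNumCoalescedBuckets: Option[Int]
  }

  final case class FileSourceScanExec(
      bucketSpec: Option[BucketSpec],
      optionalNumCoalescedBuckets: Option[Int] = None) extends FileSourceScanLike

  // Hypothetical plugin scan operator that extends the trait.
  final class PluginScanExec(
      val bucketSpec: Option[BucketSpec],
      val optionalNumCoalescedBuckets: Option[Int] = None) extends FileSourceScanLike

  // Concrete-class match: a plugin scan falls through to the default case.
  def getBucketSpecConcrete(plan: SparkPlan): Option[BucketSpec] = plan match {
    case f: FileSourceScanExec if f.optionalNumCoalescedBuckets.isEmpty => f.bucketSpec
    case _ => None
  }

  // Trait match: any scan that extends FileSourceScanLike is recognized.
  def getBucketSpecWidened(plan: SparkPlan): Option[BucketSpec] = plan match {
    case f: FileSourceScanLike if f.optionalNumCoalescedBuckets.isEmpty => f.bucketSpec
    case _ => None
  }
}
```

Both versions return the same result for `FileSourceScanExec`; they differ only for operators that extend the trait without being the concrete class.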
This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE
hardcoding concrete classes instead of traits), but it occurs in the bucketing
physical rules, which were not covered by those fixes.
### Does this PR introduce _any_ user-facing change?
No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is
unchanged for vanilla Spark. Plugins that extend `FileSourceScanLike` will now
be recognized by the bucketing rules.