Norio Akagi created SPARK-57064:
-----------------------------------

             Summary: Bucketing rules should match on FileSourceScanLike trait 
instead of FileSourceScanExec
                 Key: SPARK-57064
                 URL: https://issues.apache.org/jira/browse/SPARK-57064
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 5.0.0
            Reporter: Norio Akagi


**Summary:** `Bucketing rules should match on FileSourceScanLike trait instead of FileSourceScanExec`

**Description:**

### What changes were proposed in this pull request?

`DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` pattern-match on 
the concrete class `FileSourceScanExec` in several read-only match sites where 
only trait-level fields (`bucketedScan`, `relation`, 
`optionalNumCoalescedBuckets`) are accessed. The `FileSourceScanLike` trait 
already declares all of these fields, so the matches can safely be widened.

This PR changes three match sites from `FileSourceScanExec` to `FileSourceScanLike`:

- `DisableUnnecessaryBucketedScan.apply` — the `hasBucketedScan` existence check
- `ExtractJoinWithBuckets.hasScanOperation` — the bucket spec existence check
- `ExtractJoinWithBuckets.getBucketSpec` — the bucket spec extraction
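
To illustrate why a read-only match can be typed against the trait, here is a minimal sketch using toy stand-ins rather than Spark's real classes. `Plan`, `ScanLike`, `ScanExec`, `Relation`, and the shape of `getBucketSpec` below are simplified assumptions for illustration and do not reproduce the actual Spark rule code.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's API.
object TraitMatchSketch {
  final case class BucketSpec(numBuckets: Int)
  final case class Relation(bucketSpec: Option[BucketSpec])

  trait Plan { def children: Seq[Plan] }

  // Plays the role of FileSourceScanLike: declares the read-only fields.
  trait ScanLike extends Plan {
    def relation: Relation
    def bucketedScan: Boolean
    def optionalNumCoalescedBuckets: Option[Int]
    def children: Seq[Plan] = Nil
  }

  // Plays the role of FileSourceScanExec: one concrete implementation.
  final case class ScanExec(
      relation: Relation,
      bucketedScan: Boolean,
      optionalNumCoalescedBuckets: Option[Int] = None) extends ScanLike

  final case class Filter(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Only trait-level fields are read, so the pattern is typed on the trait.
  def getBucketSpec(plan: Plan): Option[BucketSpec] = plan match {
    case s: ScanLike if s.bucketedScan && s.optionalNumCoalescedBuckets.isEmpty =>
      s.relation.bucketSpec
    case other =>
      // Depth-first search; stops at the first child that yields a spec.
      other.children.iterator.map(getBucketSpec).collectFirst { case Some(b) => b }
  }
}
```

Because `ScanExec` extends `ScanLike`, every value that matched the concrete pattern still matches the trait pattern, so widening the type test does not change the result for existing plans.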

Two match sites that call `.copy()` (a case-class-specific method) are 
intentionally left on `FileSourceScanExec`.
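
The reason the `.copy()` sites cannot be widened can be shown with a small sketch. The types below are toy stand-ins (assumptions for illustration, not Spark's classes): the compiler generates `copy` on the case class, while the trait declares no such method, so a rewriting match has to stay on the concrete type.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's API.
object CopySiteSketch {
  trait ScanLike {
    def bucketedScan: Boolean
  }

  // Case class: copy(...) is generated here, not on the trait.
  final case class ScanExec(path: String, bucketedScan: Boolean) extends ScanLike

  // Another implementation of the trait: a plain class with no copy method.
  final class OtherScan(val bucketedScan: Boolean) extends ScanLike

  // Rewriting site: it needs copy, so the pattern stays on the concrete class.
  // Writing `case s: ScanLike => s.copy(...)` would not compile.
  def disableBucketedScan(scan: ScanLike): ScanLike = scan match {
    case s: ScanExec if s.bucketedScan => s.copy(bucketedScan = false)
    case other => other // left untouched
  }
}
```

In this sketch, other implementations of the trait pass through the rewriting site unchanged, which mirrors the decision to leave those two sites as they are.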

### Why are the changes needed?

Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
`FileSourceScanExec` with their own scan operators that extend 
`FileSourceScanLike`. With the current concrete-class matches, these plugins' 
scan operators are invisible to the bucketing rules: 
`DisableUnnecessaryBucketedScan` never finds them and `ExtractJoinWithBuckets` 
never extracts their bucket specs.
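
The invisibility described above can be reproduced in a few lines. This is a sketch under stated assumptions: `PluginScan` is a hypothetical plugin operator, and the two `hasBucketedScan*` helpers are simplified illustrations of an existence check, not the code of the actual rules.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's API.
object PluginVisibilitySketch {
  trait ScanLike { def bucketedScan: Boolean }

  // Stand-in for the vanilla scan operator.
  final case class ScanExec(bucketedScan: Boolean) extends ScanLike

  // Hypothetical plugin-provided scan that extends the same trait.
  final class PluginScan(val bucketedScan: Boolean) extends ScanLike

  // Concrete-class match: only ScanExec instances are ever inspected.
  def hasBucketedScanConcrete(nodes: Seq[AnyRef]): Boolean = nodes.exists {
    case s: ScanExec => s.bucketedScan
    case _ => false
  }

  // Trait match: any implementation of ScanLike is inspected.
  def hasBucketedScanTrait(nodes: Seq[AnyRef]): Boolean = nodes.exists {
    case s: ScanLike => s.bucketedScan
    case _ => false
  }
}
```

For a plan containing only a bucketed `PluginScan`, the concrete-class check reports no bucketed scan while the trait check finds it. For a plan containing a `ScanExec`, both checks agree.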

This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules, which were not covered by those fixes.

### Does this PR introduce _any_ user-facing change?

No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is 
unchanged for vanilla Spark. Plugins that extend `FileSourceScanLike` will now 
be recognized by the bucketing rules.


