[ https://issues.apache.org/jira/browse/SPARK-57064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norio Akagi updated SPARK-57064:
--------------------------------
    Description: 
### What changes were proposed in this pull request?

`DisableUnnecessaryBucketedScan` and `CoalesceBucketsInJoin` (through its `ExtractJoinWithBuckets` extractor) pattern-match on the concrete class `FileSourceScanExec` at several read-only match sites. These sites access only trait-level fields (`bucketedScan`, `relation`, `optionalNumCoalescedBuckets`). The `FileSourceScanLike` trait already declares all of these fields, so these matches can safely be widened to the trait.
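Here is a minimal sketch of why widening is safe. It uses a simplified, hypothetical model rather than Spark's real classes:

- `ScanLike` stands in for `FileSourceScanLike`.
- The case class `VanillaScan` stands in for `FileSourceScanExec`.
- `PluginScan` stands in for a plugin operator that extends only the trait.

The existence check below is similar in shape to a `hasBucketedScan` check, but it is illustrative, not the actual rule code.

```scala
// Simplified, hypothetical plan model; these are NOT Spark's real classes.
trait PlanNode { def children: Seq[PlanNode] }

// Stand-in for FileSourceScanLike: declares the field the read-only check needs.
trait ScanLike extends PlanNode {
  def bucketedScan: Boolean
  def children: Seq[PlanNode] = Nil
}

// Stand-in for FileSourceScanExec, which is a case class.
case class VanillaScan(bucketedScan: Boolean) extends ScanLike

// Stand-in for a plugin scan that extends the trait but not the concrete class.
class PluginScan(val bucketedScan: Boolean) extends ScanLike

case class Join(left: PlanNode, right: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(left, right)
}

object BucketCheckSketch {
  // True if any node in the tree satisfies p.
  private def exists(node: PlanNode)(p: PlanNode => Boolean): Boolean =
    p(node) || node.children.exists(c => exists(c)(p))

  // Before: matches only the concrete class, so PluginScan is never seen.
  def hasBucketedScanConcrete(plan: PlanNode): Boolean = exists(plan) {
    case scan: VanillaScan => scan.bucketedScan
    case _                 => false
  }

  // After: matches the trait, so every ScanLike implementation is seen.
  def hasBucketedScanTrait(plan: PlanNode): Boolean = exists(plan) {
    case scan: ScanLike => scan.bucketedScan
    case _              => false
  }
}
```

The match body reads nothing beyond `bucketedScan`, so widening the type pattern cannot change the result for `VanillaScan`. It only adds matches for other trait implementations.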

This PR changes 3 match sites from `FileSourceScanExec` to `FileSourceScanLike`:

- `DisableUnnecessaryBucketedScan.apply`: the `hasBucketedScan` existence check
- `ExtractJoinWithBuckets.hasScanOperation`: the bucket spec existence check
- `ExtractJoinWithBuckets.getBucketSpec`: the bucket spec extraction

The two match sites that call `.copy()` are intentionally left on `FileSourceScanExec`. The compiler generates `copy()` only for case classes, and `FileSourceScanLike` does not declare it.
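The reason the `.copy()` sites must stay on the concrete class can be sketched with the same kind of stand-ins. `ScanLike`, `VanillaScan`, `PluginScan`, and the `disableBucketedScan` field are hypothetical, simplified names for illustration, not Spark's API. Reads compile against the trait, but a rewrite that needs `copy(...)` can only target the case class.

```scala
// Hypothetical, simplified stand-ins; NOT Spark's actual classes.
trait ScanLike {
  def bucketedScan: Boolean
  def disableBucketedScan: Boolean
}

// Case class: the compiler generates copy(...) for it.
case class VanillaScan(bucketedScan: Boolean, disableBucketedScan: Boolean = false)
    extends ScanLike

// Plugin operator: extends the trait, but has no generated copy(...).
class PluginScan(val bucketedScan: Boolean, val disableBucketedScan: Boolean = false)
    extends ScanLike

object CopySketch {
  // Read-only access: compiles against the trait, so it can be widened.
  def isBucketed(scan: ScanLike): Boolean =
    scan.bucketedScan && !scan.disableBucketedScan

  // Rewriting needs copy(...), which exists only on the case class, so this match
  // stays on the concrete type. Other ScanLike nodes are returned unchanged.
  // A `case s: ScanLike => s.copy(...)` arm would not compile: ScanLike has no copy.
  def disableBucketing(scan: ScanLike): ScanLike = scan match {
    case v: VanillaScan => v.copy(disableBucketedScan = true)
    case other          => other
  }
}
```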

### Why are the changes needed?

Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace `FileSourceScanExec` with their own scan operators that extend `FileSourceScanLike`. With the current concrete-class matches, these plugins' scan operators are invisible to the bucketing rules. `DisableUnnecessaryBucketedScan` never finds them, and `ExtractJoinWithBuckets` never extracts their bucket specs.
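The lost bucket spec can be sketched in the same simplified, hypothetical model. `BucketSpec`, `Relation`, `ScanLike`, `VanillaScan`, and `PluginScan` are illustrative stand-ins, and the guard only approximates the real `getBucketSpec` logic. For a plugin scan over a bucketed relation, the concrete-class match returns `None`, while the trait match returns the spec.

```scala
// Hypothetical, simplified stand-ins; NOT Spark's actual classes or signatures.
case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])

// Stand-in for the relation a scan reads; bucketSpec is None for non-bucketed tables.
case class Relation(bucketSpec: Option[BucketSpec])

trait PlanNode

trait ScanLike extends PlanNode {
  def relation: Relation
  def optionalNumCoalescedBuckets: Option[Int]
}

case class VanillaScan(relation: Relation, optionalNumCoalescedBuckets: Option[Int] = None)
    extends ScanLike

class PluginScan(val relation: Relation, val optionalNumCoalescedBuckets: Option[Int] = None)
    extends ScanLike

object BucketSpecSketch {
  // Before: only the concrete class can yield a spec.
  def getBucketSpecConcrete(node: PlanNode): Option[BucketSpec] = node match {
    case s: VanillaScan
        if s.relation.bucketSpec.nonEmpty && s.optionalNumCoalescedBuckets.isEmpty =>
      s.relation.bucketSpec
    case _ => None
  }

  // After: any trait implementation can yield a spec; only trait fields are read.
  def getBucketSpecTrait(node: PlanNode): Option[BucketSpec] = node match {
    case s: ScanLike
        if s.relation.bucketSpec.nonEmpty && s.optionalNumCoalescedBuckets.isEmpty =>
      s.relation.bucketSpec
    case _ => None
  }
}
```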

This is the same class of issue that SPARK-32332 and SPARK-32430 addressed (AQE hardcoding concrete classes instead of traits). Here it occurs in the bucketing physical rules, which those fixes did not cover.

### Does this PR introduce _any_ user-facing change?

No. `FileSourceScanExec` already extends `FileSourceScanLike`, so behavior is unchanged for vanilla Spark. Plugin scan operators that extend `FileSourceScanLike` will now be recognized by the bucketing rules.

 

  was:
  h3. What changes were proposed in this pull request?

  \{{DisableUnnecessaryBucketedScan}} and \{{CoalesceBucketsInJoin}} 
pattern-match on the concrete class \{{FileSourceScanExec}} in several 
read-only match sites where only trait-level fields (\{{bucketedScan}}, 
\{{relation}}, \{{optionalNumCoalescedBuckets}}) are accessed. The 
\{{FileSourceScanLike}} trait
  already declares all of these fields, so the matches can safely be widened.

  This PR changes 3 match sites from \{{FileSourceScanExec}} to 
\{{FileSourceScanLike}}:

  - \{{DisableUnnecessaryBucketedScan.apply}} — the \{{hasBucketedScan}} 
existence check
  - \{{ExtractJoinWithBuckets.hasScanOperation}} — the bucket spec existence 
check
  - \{{ExtractJoinWithBuckets.getBucketSpec}} — the bucket spec extraction

  Two match sites that call \{{.copy()}} (a case-class-specific method) are 
intentionally left on \{{FileSourceScanExec}}.

  h3. Why are the changes needed?

  Third-party columnar execution plugins (Gluten, Comet, RAPIDS) replace 
\{{FileSourceScanExec}} with their own scan operators that extend 
\{{FileSourceScanLike}}. With the current concrete-class matches, these 
plugins' scan operators are invisible to the bucketing rules —
  \{{DisableUnnecessaryBucketedScan}} never finds them and 
\{{ExtractJoinWithBuckets}} never extracts their bucket specs.

  This is the same class of issue addressed by SPARK-32332 and SPARK-32430 (AQE 
hardcoding concrete classes instead of traits), but in the bucketing physical 
rules which were not covered by those fixes.

  h3. Does this PR introduce any user-facing change?

  No. \{{FileSourceScanExec}} already extends \{{FileSourceScanLike}}, so 
behavior is unchanged for vanilla Spark. Plugins that extend 
\{{FileSourceScanLike}} will now be recognized by the bucketing rules.


> Bucketing rules should match on FileSourceScanLike trait instead of FileSourceScanExec
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-57064
>                 URL: https://issues.apache.org/jira/browse/SPARK-57064
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Norio Akagi
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

