mbutrovich opened a new issue, #3510:
URL: https://github.com/apache/datafusion-comet/issues/3510
### What is the problem the feature request solves?
I opened #3446 initially to apply a bunch of changes from #3349 to
CometNativeScan. The PR got away from me though, mainly because of dynamic
partition pruning (DPP). I am opening this issue to track the things I'd like
to tackle in smaller chunks rather than one giant PR:
- [ ] Per-partition serde, so that we no longer send the `SparkFilePartition`
for every task to every partition. This simply reduces serde overhead for large
scans.
- [ ] DPP (non-AQE) for V1 operator. These runtime filters are created by
Spark's `PlanDynamicPruningFilters` and are easier for Comet to support since
this rule runs before Comet's rules.
- [ ] DPP (AQE) for V1 operator. These runtime filters are created by
Spark's `PlanAdaptiveDynamicPruningFilters` and are difficult for Comet to
support since this rule runs after Comet's rules. Here is what I learned from
#3446:
  - Comet's rules replace things like `BroadcastHashJoin` with
    `CometBroadcastHashJoin`, which `PlanAdaptiveDynamicPruningFilters` does not
    recognize.
  - We can't modify Spark rules, so one option is to wait until after
    `PlanAdaptiveDynamicPruningFilters` runs. This requires registering new Comet
    rules after the point where they currently run. I tried to create a simple
    rule to defer just the `BroadcastHashJoin` replacement until later, but this
    became too complicated with multiple scan implementations. I think once we
    pare down our scan implementations, we can revisit a broader redesign of
    Comet's rules in a way that works better with AQE. We will need this for
    stronger Spark 4.0 support.
- [ ] CometNativeBatchScan operator. See #3481.
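
To illustrate why the per-partition serde item matters, here is a rough toy model, written in Python for brevity. It is not Comet code: `make_partitions`, `serialize_all_for_each`, and `serialize_per_partition` are hypothetical names, the file paths are made up, and `pickle` only stands in for whatever serialization the scan actually uses. It compares the total bytes produced when every partition receives the full file list against the case where each partition receives only its own files.

```python
import pickle


def make_partitions(num_partitions, files_per_partition):
    """Build hypothetical per-partition file lists (stand-ins for file partitions)."""
    return [
        [f"part-{p}/file-{i}.parquet" for i in range(files_per_partition)]
        for p in range(num_partitions)
    ]


def serialize_all_for_each(partitions):
    """Every partition gets the full list, as described in the first item above."""
    payload = pickle.dumps(partitions)
    return [payload for _ in partitions]


def serialize_per_partition(partitions):
    """Each partition gets only its own file list (the proposed change)."""
    return [pickle.dumps(files) for files in partitions]


partitions = make_partitions(num_partitions=100, files_per_partition=10)
total_all = sum(len(b) for b in serialize_all_for_each(partitions))
total_per = sum(len(b) for b in serialize_per_partition(partitions))
print(f"all-to-all: {total_all} bytes, per-partition: {total_per} bytes")
```

In this model the all-to-all payload grows roughly quadratically with the number of partitions, while the per-partition payload grows linearly, which is where the savings for large scans would come from.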
### Describe the potential solution
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]