GitHub user vikrantmehta123 created a discussion: DataFusion-Federation: Union
Flattening Across Executors
I was experimenting with union pushdown behavior in DataFusion Federation and
noticed something I wanted to clarify.
Setup:
* Two independent SQL executors (separate DB containers)
* Executor 1 exposes tables `t1`, `t2`
* Executor 2 exposes table `t3`
The initial logical plan looks like:
```text
Projection
Union
Union
TableScan(t1)
TableScan(t2)
TableScan(t3)
```
With `optimize_unions` enabled, this is flattened into:
```text
Projection
Union
TableScan(t1)
TableScan(t2)
TableScan(t3)
```
This results in three independent SELECTs being executed, with the UNION
performed in-memory.
If I disable `optimize_unions` optimizer rule, Executor 1 instead receives a
single pushed-down SQL query:
```sql
SELECT ... FROM t1
UNION ALL
SELECT ... FROM t2
```
while Executor 2 executes its own SELECT, followed by one final in-memory UNION.
---
Is flattening unions across executor boundaries here an intentional optimizer
choice? Would pushing down UNIONs be better?
GitHub link: https://github.com/apache/datafusion/discussions/22168
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]