GitHub user vikrantmehta123 created a discussion: DataFusion-Federation: Union 
Flattening Across Executors

I was experimenting with union pushdown behavior in DataFusion Federation and 
noticed something I wanted to clarify.

Setup:

* Two independent SQL executors (separate DB containers)
* Executor 1 exposes tables `t1`, `t2`
* Executor 2 exposes table `t3`

The initial logical plan looks like:

```text
Projection
  Union
    Union
      TableScan(t1)
      TableScan(t2)
    TableScan(t3)
```

With `optimize_unions` enabled, this is flattened into:

```text
Projection
  Union
    TableScan(t1)
    TableScan(t2)
    TableScan(t3)
```

This results in three independent SELECTs being executed (two against 
Executor 1 and one against Executor 2), with the UNION performed in memory.
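
To make the flattening step concrete, here is a toy Python model of it. This is only a sketch of the idea, not DataFusion's actual rule: the `TableScan` and `Union` classes, the `executor` tag, and `flatten_union` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableScan:
    table: str
    executor: str  # hypothetical tag: which federated executor owns the table


@dataclass
class Union:
    inputs: List[object]


def flatten_union(node):
    """Inline the inputs of nested Union nodes into their parent Union."""
    if not isinstance(node, Union):
        return node
    flat = []
    for child in node.inputs:
        child = flatten_union(child)
        if isinstance(child, Union):
            # A nested Union contributes its inputs directly to the parent.
            flat.extend(child.inputs)
        else:
            flat.append(child)
    return Union(flat)


# The nested plan from the setup above (executor labels are assumptions).
plan = Union([
    Union([TableScan("t1", "executor1"), TableScan("t2", "executor1")]),
    TableScan("t3", "executor2"),
])

flat = flatten_union(plan)
print([scan.table for scan in flat.inputs])
```

After flattening, the inner `Union` node that grouped `t1` and `t2` no longer exists in the tree, so nothing in the plan marks those two scans as one unit belonging to the same executor.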

If I disable the `optimize_unions` optimizer rule, Executor 1 instead receives 
a single pushed-down SQL query:

```sql
SELECT ... FROM t1
UNION ALL
SELECT ... FROM t2
```

while Executor 2 executes its own SELECT, followed by one final in-memory UNION.
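
The per-executor grouping I'm describing can be sketched like this. It is a hypothetical illustration in Python, not how DataFusion Federation generates SQL: the `pushdown_queries` helper and the executor labels are assumptions, and the `...` column list is left elided as in the query above.

```python
from itertools import groupby

# (table, executor) pairs for the union inputs, in plan order.
# The executor labels are made up for this example.
inputs = [("t1", "executor1"), ("t2", "executor1"), ("t3", "executor2")]


def pushdown_queries(inputs):
    """Build one SQL string per run of adjacent inputs sharing an executor."""
    queries = []
    # groupby only merges adjacent items, so the input order is preserved.
    for executor, group in groupby(inputs, key=lambda pair: pair[1]):
        selects = [f"SELECT ... FROM {table}" for table, _ in group]
        queries.append((executor, "\nUNION ALL\n".join(selects)))
    return queries


queries = pushdown_queries(inputs)
for executor, sql in queries:
    print(f"-- {executor}\n{sql}\n")
```

With this grouping, two queries go out instead of three, and only the final UNION across the two executors is left to do in memory.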

---

Is flattening unions across executor boundaries an intentional optimizer 
choice here? Would it be better to keep unions whose inputs all belong to the 
same executor, so they can be pushed down as a single query?


GitHub link: https://github.com/apache/datafusion/discussions/22168

