jayzhan211 opened a new issue, #12905:
URL: https://github.com/apache/datafusion/issues/12905
### Describe the bug
When we create table inside slt, it is created with MemoryExec (Memory
table). But it seems like the partitions is always one. If we want to create
table with multiple partitions, we need to create multiple table and union with
them.
### To Reproduce
```
statement ok
create table t1(a int, b varchar) as values
(1, 'a'),
(1, 'a'),
(1, 'a');
statement ok
create table t2(a int, b varchar) as values
(2, 'b'),
(2, 'b'),
(2, 'b');
query TI
With T as
(select * from t1 UNION ALL select * from t2)
select b, sum(DISTINCT a) from T group by b;
----
a 1
b 2
query TT
explain With T as
(select * from t1 UNION ALL select * from t2)
select b, sum(DISTINCT a) from T group by b;
----
logical_plan
01)Projection: t.b, sum(alias1) AS sum(DISTINCT t.a)
02)--Aggregate: groupBy=[[t.b]], aggr=[[sum(alias1)]]
03)----Aggregate: groupBy=[[t.b, CAST(t.a AS Int64) AS alias1]], aggr=[[]]
04)------SubqueryAlias: t
05)--------Union
06)----------TableScan: t1 projection=[a, b]
07)----------TableScan: t2 projection=[a, b]
physical_plan
01)ProjectionExec: expr=[b@0 as b, sum(alias1)@1 as sum(DISTINCT t.a)]
02)--AggregateExec: mode=FinalPartitioned, gby=[b@0 as b], aggr=[sum(alias1)]
03)----CoalesceBatchesExec: target_batch_size=8192
04)------RepartitionExec: partitioning=Hash([b@0], 4), input_partitions=4
05)--------AggregateExec: mode=Partial, gby=[b@0 as b], aggr=[sum(alias1)]
06)----------AggregateExec: mode=FinalPartitioned, gby=[b@0 as b, alias1@1
as alias1], aggr=[]
07)------------CoalesceBatchesExec: target_batch_size=8192
08)--------------RepartitionExec: partitioning=Hash([b@0, alias1@1], 4),
input_partitions=4
09)----------------RepartitionExec: partitioning=RoundRobinBatch(4),
input_partitions=2
10)------------------AggregateExec: mode=Partial, gby=[b@1 as b, CAST(a@0 AS
Int64) as alias1], aggr=[]
11)--------------------UnionExec
12)----------------------MemoryExec: partitions=1, partition_sizes=[1]
13)----------------------MemoryExec: partitions=1, partition_sizes=[1]
statement ok
set datafusion.execution.target_partitions = 2
statement ok
create table t(a int, b varchar) as values
(1, 'a'),
(1, 'a'),
(1, 'a'),
(2, 'b'),
(2, 'b'),
(2, 'b');
query TI
select b, sum(DISTINCT a) from T group by b;
----
b 2
a 1
query TT
explain select b, sum(DISTINCT a) from T group by b;
----
logical_plan
01)Projection: t.b, sum(alias1) AS sum(DISTINCT t.a)
02)--Aggregate: groupBy=[[t.b]], aggr=[[sum(alias1)]]
03)----Aggregate: groupBy=[[t.b, CAST(t.a AS Int64) AS alias1]], aggr=[[]]
04)------TableScan: t projection=[a, b]
physical_plan
01)ProjectionExec: expr=[b@0 as b, sum(alias1)@1 as sum(DISTINCT t.a)]
02)--AggregateExec: mode=FinalPartitioned, gby=[b@0 as b], aggr=[sum(alias1)]
03)----CoalesceBatchesExec: target_batch_size=8192
04)------RepartitionExec: partitioning=Hash([b@0], 2), input_partitions=2
05)--------AggregateExec: mode=Partial, gby=[b@0 as b], aggr=[sum(alias1)]
06)----------AggregateExec: mode=FinalPartitioned, gby=[b@0 as b, alias1@1
as alias1], aggr=[]
07)------------CoalesceBatchesExec: target_batch_size=8192
08)--------------RepartitionExec: partitioning=Hash([b@0, alias1@1], 2),
input_partitions=2
09)----------------RepartitionExec: partitioning=RoundRobinBatch(2),
input_partitions=1
10)------------------AggregateExec: mode=Partial, gby=[b@1 as b, CAST(a@0 AS
Int64) as alias1], aggr=[]
11)--------------------MemoryExec: partitions=1, partition_sizes=[1] <- It
is still 1
```
### Expected behavior
I hope we can create arbitrary partitions with setting instead of creating
the test with union.
### Additional context
I think batch size doesn't not change the MemoryExec too, but I'm not sure
about the intended behaviour
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]