Hi hackers,
I wrote a patch to support parallel distinct, union, and aggregate using
batch sort. It works as follows:
 1. Generate a hash value from the group clause values, and save each tuple
to the batch selected by that hash value modulo the number of batches.
 2. At the end of the outer plan, wait for all other workers to finish
writing to the batches.
 3. Each worker gets a unique batch number and calls tuplesort_performsort()
to finish sorting that batch.
 4. Return the rows of this batch.
 5. If not all batches have been processed, go to step 3.
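
The five steps above can be sketched in Python roughly as follows. This is
only an illustration of the idea, not code from the patch: threads stand in
for parallel workers, sorted() stands in for tuplesort_performsort(), and
the names batch_of, batch_sort, and NUM_BATCHES are made up for the example.

```python
import threading
import zlib

NUM_BATCHES = 4  # hypothetical batch count, not taken from the patch


def batch_of(key):
    # Step 1: hash the group clause value and take it modulo the batch count.
    # zlib.crc32 is used only because it is deterministic across runs.
    return zlib.crc32(repr(key).encode()) % NUM_BATCHES


def batch_sort(worker_inputs):
    """Return a list of (batch_no, sorted_rows), one entry per batch."""
    batches = [[] for _ in range(NUM_BATCHES)]
    batch_locks = [threading.Lock() for _ in range(NUM_BATCHES)]
    barrier = threading.Barrier(len(worker_inputs))
    next_batch = [0]
    claim_lock = threading.Lock()
    results = []
    results_lock = threading.Lock()

    def worker(rows):
        # Step 1: write every input row to its batch.
        for row in rows:
            b = batch_of(row)
            with batch_locks[b]:
                batches[b].append(row)
        # Step 2: wait until all workers have finished writing.
        barrier.wait()
        while True:
            # Step 3: claim a unique batch number.
            with claim_lock:
                b = next_batch[0]
                next_batch[0] += 1
            if b >= NUM_BATCHES:
                break  # Step 5: no batches left.
            # Step 3 (continued): sort the claimed batch.
            sorted_rows = sorted(batches[b])
            # Step 4: return the rows of this batch.
            with results_lock:
                results.append((b, sorted_rows))

    threads = [threading.Thread(target=worker, args=(rows,))
               for rows in worker_inputs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling batch_sort([[3, 1, 2, 3], [2, 5, 1], [5, 5, 4]]) returns one sorted
list per batch, and all copies of a given value end up in the same batch.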

The BatchSort plan makes sure that tuples with the same group clause values
are returned within the same range (the same batch), so a Unique (or
GroupAggregate) plan on top of it can work.
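
To illustrate why this is enough for Unique or GroupAggregate, here is a
small Python sketch. It is not taken from the patch; unique, group_count,
and the sample batches are hypothetical. Both operators only compare
adjacent rows, so they do not need a globally sorted input. They only need
equal keys to arrive next to each other, which holds when each batch is
sorted and no key spans two batches.

```python
from itertools import chain, groupby


def unique(rows):
    # Like a Unique node: emit a row only if it differs from the previous one.
    previous = object()  # sentinel that compares unequal to every row
    for row in rows:
        if row != previous:
            yield row
        previous = row


def group_count(rows):
    # Like a GroupAggregate node computing count(*): aggregate runs of
    # adjacent equal keys.
    return [(key, sum(1 for _ in run)) for key, run in groupby(rows)]


# Hypothetical output of three batches: each batch is sorted, and every key
# appears in exactly one batch, but the concatenation is not globally sorted.
batches = [[4, 4, 8], [1, 1, 1, 5], [2, 6, 6]]
stream = list(chain.from_iterable(batches))
```

Here list(unique(stream)) yields each distinct value exactly once, even
though the stream as a whole is not in sorted order.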

Patch 2 adds parallel aggregate support; it is only a simple use of
BatchSort. However, the regression tests fail for partitionwise aggregation
because the plan changes
from GatherMerge->Sort->Append->...
to   Sort->Gather->Append->...
I have no idea how to fix that.

Using the same idea, I wrote a batch shared tuple store for HashAgg in our
PG version. I will send a patch for PG14 when I finish it.
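
As a rough guess at how the same idea could carry over to hash aggregation
(the actual design is not shown in this mail), the sketch below partitions
rows into shared batches by hash and then aggregates each batch with its
own hash table. The function batched_hash_count and the constant
NUM_BATCHES are hypothetical, and the example runs single-threaded for
brevity.

```python
from collections import Counter
import zlib

NUM_BATCHES = 4  # hypothetical batch count


def batched_hash_count(worker_inputs):
    # Phase 1: every worker appends its rows to the shared batch chosen by
    # the hash of the grouping key.
    batches = [[] for _ in range(NUM_BATCHES)]
    for rows in worker_inputs:
        for key in rows:
            batches[zlib.crc32(repr(key).encode()) % NUM_BATCHES].append(key)
    # Phase 2: each batch is aggregated on its own with a hash table. In a
    # parallel setting each batch would be claimed by exactly one worker.
    result = {}
    for batch in batches:
        # Key sets of different batches are disjoint, so no merge is needed.
        result.update(Counter(batch))
    return result
```

Because a key can only land in one batch, the per-batch results can simply
be concatenated without a final combine step.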

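The same description, translated from the Chinese original:
 1. First compute a hash value from the group clause, and put each row into
a batch according to that value taken modulo.
 2. After the lower plan has returned all rows, wait for all other worker
processes to finish.
 3. Each worker process claims one unique batch and calls
tuplesort_performsort() to complete the final sort.
 4. Return all rows of this batch.
 5. If not all batches have been read, go back to step 3.
The BatchSort plan guarantees that identical data (by grouping expression)
is returned within the same cycle, so the plans for deduplication and
grouping can work correctly.
This patch makes the partitionwise aggregation test in regress fail,
because the original execution plan changes.
The patch only implements one simple way of using the BatchSort plan; other
uses may still need to be added.

Using the same idea, I wrote a HashAgg that uses a shared tuple store in our
AntDB version (the latest version is not open source yet). I will post it
after adapting it to PG14.
A small advertisement: you are welcome to follow AntDB, the PG-based
distributed database product from our company, AsiaInfo. The open source
repository is at https://github.com/ADBSQL/AntDB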


Attachment: 0001-Parallel-distinct-and-union-support.patch

Attachment: 0002-Parallel-aggregate-support-using-batch-sort.patch
