On 7/4/23 23:53, Matthias van de Meent wrote:
> On Thu, 8 Jun 2023 at 14:55, Tomas Vondra <tomas.von...@enterprisedb.com> 
> wrote:
>>
>> Hi,
>>
>> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
>> infrastructure (starting workers etc.) is "inspired" by the BTREE code
>> (i.e. copied from that and massaged a bit to call brin stuff).
> 
> Nice work.
> 
>> In both cases _brin_end_parallel then reads the summaries from worker
>> files, and adds them into the index. In 0001 this is fairly simple,
>> although we could do one more improvement and sort the ranges by range
>> start to make the index nicer (and possibly a bit more efficient). This
>> should be simple, because the per-worker results are already sorted like
>> that (so a merge sort in _brin_end_parallel would be enough).
> 
> I see that you manually built the passing and sorting of tuples
> between workers, but can't we use the parallel tuplesort
> infrastructure for that? It already has similar features in place,
> and reusing it would improve code commonality.
> 

Maybe. I wasn't that familiar with what parallel tuplesort can and can't
do, and what little I knew I've managed to forget since writing this
patch. Which similar features do you have in mind?

The workers produce the results in "start_block" order, so if they pass
the results to the leader in that order, it can probably do the usual
merge sort.
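
FWIW the merge itself should be trivial. Here's a standalone toy sketch
of the idea (plain C, not the actual _brin_end_parallel code -- the
struct and field names are made up for illustration):

#include <stdio.h>

/*
 * Toy illustration only, not brin code: each worker emits summaries
 * sorted by range start, and the leader repeatedly picks the stream
 * whose next summary starts earliest.  With a handful of workers a
 * linear scan is fine; for many workers a binary heap would do.
 */
typedef struct Summary
{
    unsigned    start_block;    /* plus the per-range stats, elided */
} Summary;

typedef struct WorkerStream
{
    Summary    *items;
    int         nitems;
    int         pos;
} WorkerStream;

static void
merge_worker_results(WorkerStream *workers, int nworkers)
{
    for (;;)
    {
        int         best = -1;

        for (int i = 0; i < nworkers; i++)
        {
            if (workers[i].pos >= workers[i].nitems)
                continue;
            if (best < 0 ||
                workers[i].items[workers[i].pos].start_block <
                workers[best].items[workers[best].pos].start_block)
                best = i;
        }

        if (best < 0)
            break;              /* all streams exhausted */

        /* here the leader would add the summary to the index */
        printf("range starting at block %u\n",
               workers[best].items[workers[best].pos].start_block);
        workers[best].pos++;
    }
}

int
main(void)
{
    Summary     a[] = {{0}, {256}, {512}};
    Summary     b[] = {{128}, {384}};
    WorkerStream w[] = {{a, 3, 0}, {b, 2, 0}};

    merge_worker_results(w, 2);
    return 0;
}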

>> For 0002 it's a bit more complicated, because with a single parallel
>> scan brinbuildCallbackParallel can't decide whether a range is assigned
>> to a different worker or is empty. And we want to generate summaries
>> for empty ranges in the index. We could either skip such ranges during
>> the index build, and then add empty summaries in _brin_end_parallel
>> (if needed), or add them and then merge them using "union".
>>
>>
>> I just realized there's a third option to do this - we could just do a
>> regular parallel scan (with no particular regard to pagesPerRange), and
>> then do "union" when merging results from workers. It doesn't require
>> the sequence of TID scans, and the union would also handle the empty
>> ranges. The per-worker results might be much larger, though, because
>> each worker might produce up to the "full" BRIN index.
> 
> Would it be too much effort to add a 'min_chunk_size' argument to
> table_beginscan_parallel (or ParallelTableScanDesc) that defines the
> minimum granularity of block ranges to be assigned to each process? I
> think that would be the most elegant solution, and one that requires
> relatively little effort: table_block_parallelscan_nextpage already
> does parallel management of multiple chunk sizes, and I think this
> modification would fit quite well in that code.
> 

I'm confused. Isn't that pretty much exactly what 0002 does? I mean, it
passes pagesPerRange to table_parallelscan_initialize(), so that each
chunk of pagesPerRange blocks is assigned to a single worker.

The trouble I described above is that the scan returns tuples, but the
consumer has no idea what the chunk size was or how many other workers
there are. Imagine you get a tuple from block 1, and then a tuple from
block 1000. Does that mean the blocks in between are empty, or that
they were processed by some other worker?
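
To make the ambiguity concrete, here's a standalone toy (plain C,
nothing to do with the actual brin callbacks; the pagesPerRange value
is made up):

#include <stdio.h>

#define PAGES_PER_RANGE 128     /* toy value */

int
main(void)
{
    /* blocks this worker's tuples happened to come from */
    unsigned    blocks_seen[] = {1, 5, 1000};
    unsigned    range_start = 0;
    int         got_tuples = 0;

    for (int i = 0; i < 3; i++)
    {
        unsigned    b = blocks_seen[i];

        /* the block number jumped past the end of the current range */
        while (b > range_start + PAGES_PER_RANGE - 1)
        {
            if (got_tuples)
                printf("summarize range at block %u\n", range_start);
            else
                printf("range at block %u: empty, or another worker's chunk?\n",
                       range_start);
            got_tuples = 0;
            range_start += PAGES_PER_RANGE;
        }
        got_tuples = 1;
    }
    return 0;
}

With a serial scan the "empty, or another worker's chunk?" case simply
can't arise, because the scan visits every block in order -- which is
why the serial build can emit empty summaries on the fly.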


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

