On 5/13/24 10:19, Andy Fan wrote:
>
> Tomas Vondra <tomas.von...@enterprisedb.com> writes:
>
>> ...
>>
>> I don't understand the question. The blocks are distributed to workers
>> by the parallel table scan, and it certainly does not do that block by
>> block. But even if it did, that's not a problem for this code.
>
> OK, I get ParallelBlockTableScanWorkerData.phsw_chunk_size is designed
> for this.
>
>> The problem is that if the scan wraps around, then one of the TID lists
>> for a given worker will have the min TID and max TID, so it will overlap
>> with every other TID list for the same key in that worker. And when the
>> worker does the merging, this list will force a "full" merge sort for
>> all TID lists (for that key), which is very expensive.
>
> OK.
>
> Thanks for all the answers, they are pretty instructive!
>
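
To make the overlap argument concrete, here is a minimal Python sketch. It
is not the gininsert.c code, only a toy model: it assumes a TID can be
represented as a (block, offset) tuple and that each TID list is kept
sorted. The helper names (make_tids, overlaps, overlapping_partners) and
the block numbers are made up for illustration.

```python
# Toy model only: a TID is a (block, offset) tuple, a TID list is sorted.
# All names and block numbers here are hypothetical, not PostgreSQL's.

def make_tids(blocks):
    """Build a sorted TID list with one item per block."""
    return sorted((blk, 1) for blk in blocks)


def overlaps(a, b):
    """True if the [min, max] ranges of two sorted TID lists intersect."""
    return a[0] <= b[-1] and b[0] <= a[-1]


def overlapping_partners(lists, i):
    """Indexes of the lists whose TID range overlaps with lists[i]."""
    return [j for j in range(len(lists))
            if j != i and overlaps(lists[i], lists[j])]


# Chunks handed out in order: the ranges are disjoint, so the lists could
# simply be appended to each other in range order.
lists_in_order = [
    make_tids(range(10, 20)),
    make_tids(range(20, 30)),
    make_tids(range(30, 40)),
]

# A scan that wrapped around (end of the table, then the start) produces a
# single list holding both the lowest and the highest TIDs.
wrapped = make_tids(list(range(95, 100)) + list(range(0, 5)))
lists_with_wrap = lists_in_order + [wrapped]

print(overlapping_partners(lists_in_order, 0))
print(overlapping_partners(lists_with_wrap, 3))
```

In this model the in-order lists have no overlapping partners, while the
wrapped list overlaps with every other list, which is the case where a
cheap append is not possible and a real merge is needed.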

Thanks for the questions, they force me to articulate the arguments more
clearly. I guess it'd be good to put some of this into a README, or at
least into a comment at the beginning of gininsert.c or somewhere close.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company