On Tue, 9 Jul 2024 at 03:18, Andy Fan <zhihuifan1...@163.com> wrote:
>> and later we called 'tuplesort_performsort(state->bs_sortstate);'. Even
>> we have some CTID merges activity in '....(1)', the tuples are still
>> ordered, so the sort (in both tuplesort_putgintuple and
>> tuplesort_performsort) are not necessary, what's more, in the each of
>> 'flush-memory-to-disk' in tuplesort, it create a 'sorted-run', and in
>> this case, actually we only need 1 run only since all the input tuples
>> in the worker is sorted. The reduction of 'sort-runs' in worker will be
>> helpful to leader's final mergeruns. the 'sorted-run' benefit doesn't
>> exist for the case-1 (RBTree -> worker_state).
>>
>> If Matthias's proposal is adopted, my optimization will not be useful
>> anymore and Matthias's proposal looks like a more natural and efficient
>> way.

I think they might be complementary. I don't think it's reasonable to
expect GIN's BuildAccumulator to buffer all the index tuples at the
same time (as I mentioned upthread: we are, or should be, limited by
work memory), but the BuildAccumulator will do a much better job at
combining tuples than the in-memory sort + merge-write done by
Tuplesort (because BA will use (much?) less memory for the same number
of stored values).

So, the idea of making BuildAccumulator responsible for providing the
initial sorted runs does resonate with me, and could also be worth
pursuing. I think it would indeed save time otherwise spent on
comparisons if tuples can be merged before they're first spilled to
disk, when we already know which tuples form a sorted run. Afterwards,
only the phases where we merge sorted runs from disk would require my
buffered write approach that merges GIN tuples.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)