On Thu, 4 Apr 2024 at 13:08, Melih Mutlu <m.melihmu...@gmail.com> wrote: > I changed internal_flush() to an inline function, results look better this > way.
It seems you also change internal_flush_buffer to be inline (but only in the actual function definition, not declaration at the top). I don't think inlining internal_flush_buffer should be necessary to avoid the perf regressions, i.e. internal_flush is adding extra indirection compared to master and is only a single line, so that one makes sense to inline. Other than that the code looks good to me. The new results look great. One thing that is quite interesting about these results is that increasing the buffer size results in even better performance (by quite a bit). I don't think we can easily choose a perfect number, as it seems to be a trade-off between memory usage and perf. But allowing people to configure it through a GUC like in your second patchset would be quite useful I think, especially because larger buffers could be configured for connections that would benefit most for it (e.g. replication connections or big COPYs). I think your add-pq_send_buffer_size-GUC.patch is essentially what we would need there but it would need some extra changes to actually be merge-able: 1. needs docs 2. rename PQ_SEND_BUFFER_SIZE (at least make it not UPPER_CASE, but maybe also remove the PQ_ prefix) 3. It's marked as PGC_USERSET, but there's no logic to grow/shrink it after initial allocation