On Tue, Jun 2, 2020 at 9:56 PM Andres Freund <and...@anarazel.de> wrote:
> The biggest problem after that is that we waste a lot of time memcpying
> stuff around repeatedly. There is:
> 1) send function: datum -> per datum stringinfo
> 2) printtup: per datum stringinfo -> per row stringinfo
> 3) socket_putmessage: per row stringinfo -> PqSendBuffer
> 4) send(): PqSendBuffer -> kernel buffer
>
> It's obviously hard to avoid 1) and 4) in the common case, but the
> number of other copies seem pretty clearly excessive.

I too have seen recent benchmarking data where this was a big problem.
Basically, you need a workload where the server doesn't have much or
any actual query processing to do, but is just returning a lot of
stuff to a really fast client - e.g. a locally connected client.
That's not necessarily the most common case but, if you have it, all
this extra copying is really pretty expensive.

My first thought was to wonder about changing all of our send/output
functions to write into a buffer passed as an argument rather than
returning something which we then have to copy into a different
buffer, but that would be a somewhat painful change, so it is probably
better to first pursue the idea of getting rid of some of the other
copies that happen in more centralized places (e.g. printtup).

I wonder if we could replace the whole
pq_beginmessage...()/pq_send....()/pq_endmessage...() system with
something a bit better-designed. For instance, suppose we get rid of
the idea that the caller supplies the buffer, and we move the
responsibility for error recovery into the pqcomm layer. So you do
something like:

    my_message = xyz_beginmessage('D');
    xyz_sendint32(my_message, 42);
    xyz_endmessage(my_message);

Maybe what happens here under the hood is we keep a pool of free
message buffers sitting around, and you just grab one and put your
data into it. When you end the message we add it to a list of used
message buffers that are waiting to be sent, and once we send the data
it goes back on the free list. If an error occurs after
xyz_beginmessage() and before xyz_endmessage(), we put the buffer back
on the free list. That would allow us to merge (2) and (3) into a
single copy.

To go further, we could allow send/output functions to opt in to
receiving a message buffer rather than returning a value, and then we
could get rid of (1) for types that participate. (4) seems unavoidable
AFAIK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
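
For illustration, the buffer-pool scheme described in the message above could be sketched roughly as follows, in plain C with no PostgreSQL headers. Only xyz_beginmessage(), xyz_sendint32() and xyz_endmessage() are named in the message; XyzMessage, xyz_init(), xyz_abort(), xyz_flush(), xyz_free_count(), the pool size and the buffer capacity are all hypothetical names and values invented for this sketch. It assumes the usual framing of a type byte followed by a 4-byte big-endian length that counts itself, and it stands in a caller-supplied array for the socket. This is one possible shape under those assumptions, not a proposed patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define XYZ_POOL_SIZE 4      /* hypothetical: number of pooled buffers */
#define XYZ_BUF_CAP   8192   /* hypothetical: bytes per buffer */

typedef struct XyzMessage {
    struct XyzMessage *next; /* link in the free list or the pending list */
    size_t len;              /* bytes used in data[] */
    char data[XYZ_BUF_CAP];  /* type byte, 4-byte length, then payload */
} XyzMessage;

static XyzMessage xyz_pool[XYZ_POOL_SIZE];
static XyzMessage *xyz_free_list, *xyz_pending_head, *xyz_pending_tail;
static XyzMessage *xyz_open; /* begun but not yet ended, for error recovery */

static void put_be32(char *dst, uint32_t v)
{
    dst[0] = (char) (v >> 24); dst[1] = (char) (v >> 16);
    dst[2] = (char) (v >> 8);  dst[3] = (char) v;
}

/* Reset all lists and put every pooled buffer on the free list. */
void xyz_init(void)
{
    xyz_free_list = xyz_pending_head = xyz_pending_tail = xyz_open = NULL;
    for (int i = 0; i < XYZ_POOL_SIZE; i++) {
        xyz_pool[i].next = xyz_free_list;
        xyz_free_list = &xyz_pool[i];
    }
}

/* Grab a free buffer; returns NULL if the pool is exhausted. */
XyzMessage *xyz_beginmessage(char type)
{
    XyzMessage *m = xyz_free_list;
    if (m == NULL)
        return NULL;         /* a real design would flush or grow here */
    assert(xyz_open == NULL);
    xyz_free_list = m->next;
    m->next = NULL;
    m->data[0] = type;
    m->len = 5;              /* type byte plus reserved length word */
    xyz_open = m;
    return m;
}

/* Append a 32-bit integer in network byte order, directly into the buffer. */
void xyz_sendint32(XyzMessage *m, int32_t v)
{
    assert(m->len + 4 <= XYZ_BUF_CAP);
    put_be32(m->data + m->len, (uint32_t) v);
    m->len += 4;
}

/* Fill in the length word and queue the buffer for sending. */
void xyz_endmessage(XyzMessage *m)
{
    put_be32(m->data + 1, (uint32_t) (m->len - 1));
    if (xyz_pending_tail != NULL)
        xyz_pending_tail->next = m;
    else
        xyz_pending_head = m;
    xyz_pending_tail = m;
    xyz_open = NULL;
}

/* Error path: return a half-built message to the free list. */
void xyz_abort(void)
{
    if (xyz_open != NULL) {
        xyz_open->next = xyz_free_list;
        xyz_free_list = xyz_open;
        xyz_open = NULL;
    }
}

/* Stand-in for send(): copy whole pending messages into out, recycle them. */
size_t xyz_flush(char *out, size_t cap)
{
    size_t total = 0;
    while (xyz_pending_head != NULL && total + xyz_pending_head->len <= cap) {
        XyzMessage *m = xyz_pending_head;
        memcpy(out + total, m->data, m->len);
        total += m->len;
        xyz_pending_head = m->next;
        m->next = xyz_free_list;
        xyz_free_list = m;
    }
    if (xyz_pending_head == NULL)
        xyz_pending_tail = NULL;
    return total;
}

/* Helper for inspection: number of buffers currently on the free list. */
int xyz_free_count(void)
{
    int n = 0;
    for (XyzMessage *m = xyz_free_list; m != NULL; m = m->next)
        n++;
    return n;
}
```

In this sketch the row data is written once into the pooled buffer and then copied once more by xyz_flush(), which plays the role of copy (4); the separate per-row stringinfo and PqSendBuffer steps, copies (2) and (3), collapse into the single write into the message buffer. A caller would run xyz_init() once, build messages with the three calls from the message above, and call xyz_abort() from whatever error-cleanup hook the pqcomm layer ends up using.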