On Tue, Nov 24, 2020 at 7:33 AM Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > New version of the patch is attached.
I read over the comments from Andres (and Peter) suggesting that this ought to be on-the-fly configurable. Here are some thoughts on making that work with the wire protocol: If the client potentially wants to use compression at some point it should include _pq_.compression in the startup message. The value associated with _pq_.compression should be a comma-separated list of compression methods which the client understands. If the server responds with a NegotiateProtocolVersion message, then it either includes _pq_.compression (in which case the server does not support compression) or it does not (in which case the server does support compression). If no NegotiateProtocolVersion message is returned, then the server is from before November 2017 (ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed) and compression is not supported. If the client requests compression and the server supports it, it should return a new SupportedCompressionTypes message following NegotiateProtocolMessage response. That should be a list of compression methods which the server understands. At this point, the clent and the server each know what methods the other understands. Each should now feel free to select a compression method the other side understands, and to switch methods whenever desired, as long as they only select from methods the other side has said that they understand. The patch seems to think that the compression method has to be the same in both directions and that it can never change, but there's no real reason for that. Let each side start out uncompressed and then let it issue a new SetCompressionMethod protocol message to switch the compression method whenever it wants. After sending that message it begins using the new compression type. The other side doesn't have to agree. That way, you don't have to worry about synchronizing the two directions. Each side is just telling the other what is choosing to do, from among the options the other side said it could understand. It's an interesting question whether it's best to "wrap" the compressed messages in some way. For example, imagine that instead of just compressing the bytes and sending them out, you send a message of some new type CompressedMessage whose payload is another protocol message. Then a piece of middleware could decide to decompress each message just enough to see whether it wants to do anything with the message and if not just forward it. It could also choose to inject its own messages into the message stream which wouldn't necessarily need to be compressed, because the wrapper allows mixing of compressed and uncompressed messages. The big disadvantage of this approach is that in many cases it will be advantageous to compress consecutive protocol messages as a unit. For example, when the extend query protocol is in use, the client will commonly send P-B-D-E-S maybe even in one network packet. It will compress better if all of those messages are compressed as one rather than separately. That consideration argues for the approach the patch actually takes (though the documentation isn't really very clear about what the patch is actually doing here so I might be misunderstanding). However, note that when the client or server does a "flush" this requires "flushing" at the compression layer also, so that all pending data can be sent. zlib can certainly do that; I assume other algorithms can too, but I don't really know. If there are algorithms that don't have that built in, this approach might be an issue. Another thing to think about is whether it's really beneficial to compress stuff like P-B-D-E-S. I would guess that the benefits here come mostly from compressing DataRow and CopyData messages, and that compressing messages that don't contain much payload data may just be a waste of CPU cycles. On the other hand, it's not impossible for it to win: the query might be long, and query text is probably highly compressible. Designing something that is specific to certain message types is probably a bridge too far, but at least I think this is a strong argument that the compression method shouldn't have to be the same in both directions. -- Robert Haas EDB: http://www.enterprisedb.com