On Thu, Nov 26, 2020 at 8:15 AM Daniil Zakhlystov <usernam...@yandex-team.ru> wrote:
> However, I don’t mean by this that we shouldn’t support switchable
> compression. I propose that we can offer two compression modes: permanent
> (which is implemented in the current state of the patch) and switchable
> on-the-fly. Permanent compression allows us to deliver a robust solution
> that is already present in some databases. Switchable compression allows
> us to support more complex scenarios in cases when the frontend and
> backend really need it and can afford the development effort to implement it.
I feel that one thing that may be getting missed here is that my suggestions were intended to make this simpler, not more complicated. Like, in the design I proposed, switchable compression is not a separate form of compression and doesn't require any special support. Both sides are just allowed to set the compression method; theoretically, they could set it more than once. Similarly, I don't intend the possibility of using different compression algorithms in the two directions as a request for an advanced feature so much as a way of simplifying the protocol.

Like, in the protocol that you proposed previously, you've got a four-phase handshake to set up compression. The startup packet carries initial information from the client, the server then sends CompressionAck, then the client sends SetCompressionMethod, and then the server sends SetCompressionMethod. This system is fairly complex, and it requires some form of interlocking. Once the client has sent a SetCompressionMethod message, it cannot send any other protocol message until it receives a SetCompressionMethod message back from the server. Otherwise, it doesn't know whether the server actually responded with SetCompressionMethod as well, or whether it sent, say, ErrorResponse or NoticeResponse or something. In the former case it needs to send compressed data going forward; in the latter, uncompressed; but it can't know which until it sees the server message. And keep in mind that control isn't necessarily with libpq at this point, because non-blocking mode could be in use. This is all solvable, but the way I proposed it, you don't have that problem. You never need to wait for a message from the other end before being able to send a message yourself.

Similarly, allowing different compression methods in the two directions may seem to make things more complicated, but I don't think it really is. Arguably it's simpler. The instant the server gets the startup packet, it can issue SetCompressionMethod. The instant the client gets SupportedCompressionTypes, it can issue SetCompressionMethod. So there's practically no hand-shaking at all. You get a single protocol message and you immediately respond by setting the compression method, and then you just send compressed messages after that. Perhaps the time at which you begin receiving compressed data will be a little different from the time at which you begin sending it, or perhaps compression will only ever be used in one direction. But so what? The code really doesn't need to care. You just need to keep track of the active compression mode in each direction, and that's it.

And again, if you allow the compression method to be switched at any time, you just have to know what to do when you get a SetCompressionMethod. If you only allow it to be changed once, to set it initially, then you have to ADD code to reject that message the next time it's sent. If that ends up avoiding a significant amount of complexity somewhere else then I don't have a big problem with it, but if it doesn't, it's simpler to allow it whenever than to restrict it to only once.
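To make that concrete, here is roughly the bookkeeping I have in mind. This is just a sketch, not code from the patch, and every name in it is invented:

#include <stdbool.h>

/* One field per direction; that is the entire "switchable" state. */
typedef enum
{
    COMPRESSION_NONE,
    COMPRESSION_ZLIB,
    COMPRESSION_ZSTD
} CompressionMethod;

typedef struct
{
    CompressionMethod send_method;  /* how we compress what we send */
    CompressionMethod recv_method;  /* how we decompress what we receive */
} CompressionState;

/* Did we advertise this method as one we can decompress?  Stubbed out
 * here; really this would consult the list we sent at startup. */
static bool
method_was_advertised(CompressionMethod m)
{
    return m == COMPRESSION_NONE || m == COMPRESSION_ZLIB;
}

/*
 * Called whenever a SetCompressionMethod message arrives.  There is
 * nothing to interlock with: the peer may switch at any message
 * boundary, so we just start decompressing incoming data differently.
 */
static bool
handle_set_compression_method(CompressionState *state, CompressionMethod m)
{
    if (!method_was_advertised(m))
        return false;           /* protocol violation; caller reports it */
    state->recv_method = m;
    return true;
}

Notice that nothing in there knows or cares whether this is the first SetCompressionMethod or the fifth; restricting it to once would be strictly more code, not less.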
> 2. List of the compression algorithms which the frontend is able to
> decompress in the order of preference.
> For example:
> “zlib:1,3,5;zstd:7,8;uncompressed” means that frontend is able to:
> - decompress zlib with 1,3 or 5 compression levels
> - decompress zstd with 7 or 8 compression levels
> - “uncompressed” at the end means that the frontend agrees to receive
> uncompressed messages. If there is no “uncompressed” compression
> algorithm specified it means that the compression is required.

I think that there's no such thing as being able to decompress some compression levels of an algorithm but not others. The level only controls behavior on the compression side. So, the client can only send zlib data if the server can decompress it, but the server need not advertise which levels it can decompress, because it's all or nothing.

> Supported compression and decompression methods are configured using GUC
> parameters:
>
> compress_algorithms = ‘...’ // default value is ‘uncompressed’
> decompress_algorithms = ‘...’ // default value is ‘uncompressed’

This raises an interesting question which I'm not quite sure about. It doesn't seem controversial to assert that the client must be able to advertise which algorithms it does and does not support, and likewise for the server. After all, just because we offer lz4, say, as an option doesn't mean every PostgreSQL build will be performed --with-lz4. But how should the compression algorithm that actually gets used be controlled?

One can imagine that the client is in charge of the compression algorithm and the compression level in both directions. If we insist on those being the same, the client says something like compression=lz4:1 and then it uses that algorithm and instructs the server to do the same; otherwise there might be separate connection parameters for client-compression and server-compression, or some kind of syntax that lets you specify both using a single parameter. On the other hand, one could take a whole different approach and imagine the server being in charge of both directions, like having a GUC that is set on the server. Clients advertise what they can support, and the server tells them to do whatever the GUC says they must. That sounds awfully heavy-handed, but it has the advantage of letting the server administrator set site policy.

One can also imagine combination approaches, like letting the server GUC define the default but allowing the client to override it using a connection parameter. Or even putting each side in charge of what it sends: the GUC controls what the server tries to do, provided the client can support it; and the connection parameter controls the client behavior, provided the server can support it. I am not really sure what's best here, but it's probably something we need to think about a bit before we get too deep into this.

I'm tentatively inclined to think that the server should have a GUC that defines the *allowable* compression algorithms, so that the administrator can disable algorithms that are compiled into the binary but which she does not want to permit (e.g. because a security problem was discovered in a relevant library). The default can simply be 'all', meaning everything the binary supports. And then the rest of the control should be on the client side, so that the server GUC can never influence the selection of which algorithm is actually chosen, but only rule things out. But that is just a tentative opinion; maybe it's not the right idea.
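To illustrate that last idea, here is the sort of check I'm imagining. Again, this is purely a hypothetical sketch: it assumes the GUC value is either 'all' or a comma-separated list of algorithm names, and it borrows configure-style HAVE_ macros just for flavor:

#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the GUC-as-allow-list idea: the server setting can veto an
 * algorithm, but it never picks one.  The actual choice stays with the
 * client.
 */

/* Is support for this algorithm compiled into the binary? */
static bool
algorithm_is_compiled_in(const char *name)
{
#ifdef HAVE_LIBZ
    if (strcmp(name, "zlib") == 0)
        return true;
#endif
#ifdef HAVE_LIBZSTD
    if (strcmp(name, "zstd") == 0)
        return true;
#endif
    return false;
}

/* May a client-requested algorithm actually be used? */
static bool
algorithm_is_permitted(const char *guc_value, const char *name)
{
    const char *p;
    size_t      len = strlen(name);

    if (!algorithm_is_compiled_in(name))
        return false;           /* not built in, so never allowed */
    if (strcmp(guc_value, "all") == 0)
        return true;            /* the default: anything we support */

    /* Otherwise, the GUC is an explicit comma-separated allow-list. */
    for (p = guc_value; (p = strstr(p, name)) != NULL; p += len)
    {
        bool    starts = (p == guc_value || p[-1] == ',');
        bool    ends = (p[len] == '\0' || p[len] == ',');

        if (starts && ends)
            return true;
    }
    return false;
}

With something like that in place, the negotiation stays trivial: the server advertises only what passes this check, the client picks from the advertised set, and the GUC never does anything except shrink the advertised set. Or something like that, anyway.

--
Robert Haas
EDB: http://www.enterprisedb.com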