Re: libpq compression

Daniil Zakhlystov Tue, 08 Dec 2020 06:42:39 -0800

Hi, Robert!

First of all, thanks for your detailed reply.

> On Dec 3, 2020, at 2:23 AM, Robert Haas <[email protected]> wrote:
> 
> Like, in the protocol that you proposed previously, you've got a
> four-phase handshake to set up compression. The startup packet carries
> initial information from the client, and the server then sends
> CompressionAck, and then the client sends SetCompressionMethod, and
> then the server sends SetCompressionMethod. This system is fairly
> complex, and it requires some form of interlocking.

I proposed a slightly different, three-phase handshake:

1. The client sends the _pq_.compression parameter in the startup packet.
2. The server replies with CompressionAck, immediately followed by a
SetCompressionMethod message. These two could be combined, but I kept them
separate for symmetry. In most cases they will arrive as one piece without
any additional delay.
3. The client replies with a SetCompressionMethod message.

A handshake like this makes it possible to forbid uncompressed
client-to-server and/or server-to-client communication.

For example, if the client did not explicitly list 'uncompressed' among its
supported decompression methods, and the server does not support any of the
other compression algorithms sent by the client, the server will send back
SetCompressionMethod with index '-1'. After receiving this message, the
client will terminate the connection.
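
To make the '-1' case concrete, here is a minimal sketch of the server-side
choice. The function name, the sentinel constant, and the "first match wins"
policy are my assumptions for illustration; the proposal only says that the
index is '-1' when nothing matches.

```python
from typing import List

# Hypothetical sentinel; the proposal only says the index is '-1'.
NO_COMMON_METHOD = -1


def choose_method_index(client_methods: List[str],
                        server_supported: List[str]) -> int:
    """Return the position, in the client's list, of the first method the
    server also supports, or -1 when the two sides share nothing."""
    for idx, name in enumerate(client_methods):
        if name in server_supported:
            return idx
    return NO_COMMON_METHOD


# The client did not list 'uncompressed' and the server knows neither zlib
# nor zstd, so the server would answer -1 and the client would disconnect.
rejected = choose_method_index(["zlib", "zstd"], ["uncompressed"])

# With 'uncompressed' listed, the same server can fall back to it.
fallback = choose_method_index(["zlib", "zstd", "uncompressed"],
                               ["uncompressed"])
```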

> On Dec 3, 2020, at 2:23 AM, Robert Haas <[email protected]> wrote:
> 
> And again, if you allow the compression method to be switched at any
> time, you just have to know how what to do when you get a
> SetCompressionMethod. If you only allow it to be changed once, to set
> it initially, then you have to ADD code to reject that message the
> next time it's sent. If that ends up avoiding a significant amount of
> complexity somewhere else then I don't have a big problem with it, but
> if it doesn't, it's simpler to allow it whenever than to restrict it
> to only once.

Yes, implementing on-the-fly switchable compression does involve some
complexity. Currently, compression operates at a different level,
independently of the libpq protocol. To make compression switchable on the
fly, we need to solve the following problems:

1. When a new portion of bytes arrives at the decompressor from the
socket.read() call, the first part of those bytes may be a compressed
fragment while the rest is uncompressed. Or worse, a single portion of new
bytes may contain the end of a ZLIB-compressed message and the beginning of
a ZSTD-compressed message. The problem is that we don't know exactly where
the ZLIB-compressed message ends until we have decompressed the entire chunk
of new bytes and read the SetCompressionMethod message. Moreover, streaming
compression may itself involve some internal buffering, which complicates
the problem further.
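
A small illustration of the boundary problem, using Python's zlib module
rather than the patch's code. It assumes the simplest possible case, where
the zlib stream is properly finished before the other bytes start; a stream
that is only flushed carries no such end marker, which makes the real
situation harder than this sketch.

```python
import zlib

compressed = zlib.compress(b"first message, sent with zlib")
trailing = b"bytes that follow in another encoding"

# What a single socket read might hand to the decompressor.
chunk = compressed + trailing

decoder = zlib.decompressobj()
plain = decoder.decompress(chunk)

# Only after running the decompressor do we learn where the zlib data
# ended: everything past the end of the stream lands in unused_data.
boundary = len(chunk) - len(decoder.unused_data)
```

Nothing in the raw chunk marks the boundary up front; it is discovered as a
side effect of decompression, and the leftover bytes then have to be handed
to whatever decoder comes next.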

2. When sending a new portion of bytes, it may not be sufficient to keep
track of only the current compression method. There may be multiple
SetCompressionMethod messages in PqSendBuffer (backend) or conn->outBuffer
(frontend) at the same time, so we also have to keep track of every
compression method switch within PqSendBuffer or conn->outBuffer. As with
decompression, the internal buffering of streaming compression makes this
case more complex too.
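
A rough sketch of the extra bookkeeping on the sending side. PendingOutput is
a toy stand-in that I made up for illustration, not the real PqSendBuffer or
conn->outBuffer, and it ignores the compressor's own internal buffering.

```python
from typing import List, Tuple


class PendingOutput:
    """Toy output buffer that remembers where each method switch happened."""

    def __init__(self, method: str = "uncompressed") -> None:
        self.data = bytearray()
        # (offset, method): bytes from offset onward use this method.
        self.switches: List[Tuple[int, str]] = [(0, method)]

    def queue(self, payload: bytes) -> None:
        self.data += payload

    def queue_switch(self, method: str) -> None:
        # Record the switch at the current end of the pending data.
        self.switches.append((len(self.data), method))

    def segments(self) -> List[Tuple[str, bytes]]:
        """Split the pending bytes into runs that share one method."""
        out: List[Tuple[str, bytes]] = []
        for i, (start, method) in enumerate(self.switches):
            if i + 1 < len(self.switches):
                end = self.switches[i + 1][0]
            else:
                end = len(self.data)
            if end > start:
                out.append((method, bytes(self.data[start:end])))
        return out


buf = PendingOutput()
buf.queue(b"aa")
buf.queue_switch("zlib")
buf.queue(b"bbb")
buf.queue_switch("zstd")
buf.queue(b"c")
runs = buf.segments()
```

With two switches still sitting in the buffer, a flush has to treat the
pending bytes as three separate runs instead of one.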

Although the two problems above might be solvable, I doubt that we should
require them to be solved not only in libpq but also in every third-party
Postgres protocol library, since the use cases for switchable compression
are not clear yet. I agree with Konstantin's point of view on this one:

> And more important question - if we really want to switch algorithms on 
> the fly: who and how will do it?
> Do we want user to explicitly control it (something like "\compression 
> on"  psql command)?
> Or there should be some API for application?
> How it can be supported for example by JDBC driver?
> I do not have answers for this questions...

However, as previously mentioned in the thread, it might be useful in the
future, and we should design a protocol that supports it so that we won't
have any problems with backward compatibility. So, basically, this was the
only reason to introduce the two separate compression modes, switchable and
permanent.

In the latest patch, Konstantin introduced the extension part, so in future
versions we can introduce switchable compression handling in this extension
part. For now, let permanent compression be the default mode.

> On Dec 3, 2020, at 2:23 AM, Robert Haas <[email protected]> wrote:
> 
> I think that there's no such thing as being able to decompress some
> compression levels with the same algorithm but not others. The level
> only controls behavior on the compression side. So, the client can
> only send zlib data if the server can decompress it, but the server
> need not advertise which levels it can decompress, because it's all or
> nothing.

Depending on the chosen compression algorithm, the compression level may
affect decompression speed and memory usage. That's why I think it would be
nice for the server to be able to forbid compression levels that require
high CPU or memory usage for decompression.

> On Dec 3, 2020, at 2:23 AM, Robert Haas <[email protected]> wrote:
> 
> On the other hand, one could take a whole different
> approach and imagine the server being in charge of both directions,
> like having a GUC that is set on the server. Clients advertise what
> they can support, and the server tells them to do whatever the GUC
> says they must. That sounds awfully heavy-handed, but it has the
> advantage of letting the server administrator set site policy.

I personally think that this approach is the most practical one. For example:

In the server’s postgresql.conf:

compress_algorithms = 'uncompressed'
    // the server forbids any server-to-client compression
decompress_algorithms = 'zstd:7,8;uncompressed'
    // the server can only decompress zstd at compression levels 7 and 8,
    // or exchange uncompressed messages

In the client connection string:

"… compression=zlib:1,3,5;zstd:6,7,8;uncompressed …"
    // the client is able to compress/decompress zlib and zstd, or to
    // exchange uncompressed messages
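
For reference, the list syntax used in these settings could be parsed along
the following lines. This is only a sketch: parse_algorithms is a name I
picked for illustration, and treating an entry without levels as "no level
restriction" is my assumption.

```python
from typing import Dict, List


def parse_algorithms(spec: str) -> Dict[str, List[int]]:
    """Parse 'zlib:1,3,5;zstd:6,7,8;uncompressed' into {name: [levels]}.

    An entry without ':levels' maps to an empty list.
    """
    result: Dict[str, List[int]] = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        name, _, levels = entry.partition(":")
        result[name] = [int(x) for x in levels.split(",")] if levels else []
    return result


client = parse_algorithms("zlib:1,3,5;zstd:6,7,8;uncompressed")
server = parse_algorithms("zstd:7,8;uncompressed")
```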

For the sake of simplicity, the client’s “compression” parameter in the 
connection string is basically an analog of the server’s compress_algorithms 
and decompress_algorithms.
So the negotiation process for the above example would look like this:

1. The client sends the startup packet with
"algorithms=zlib:1,3,5;zstd:6,7,8;uncompressed;".
Since no compression mode is specified, assume that the client wants
permanent compression. In future versions, the client can request
switchable compression after the ';' at the end of the message.

2. The server replies with two messages:
- A CompressionAck message containing "algorithms=zstd:7,8;uncompressed;",
where the algorithms section basically matches the decompress_algorithms
server GUC parameter. In future versions, the server can specify the chosen
compression mode after the ';' at the end of the message.

- A following SetCompressionMethod message containing
"alg_idx=1;level_idx=1", which essentially means that the server chose zstd
with compression level 7 for server-to-client compression. Every subsequent
message from the server is now compressed with zstd.

3. The client replies with a SetCompressionMethod message containing
"alg_idx=0", which means that the client chose uncompressed
client-to-server messaging. Actually, the client had no other option,
because "uncompressed" was the only option left after intersecting the
compression algorithms from the connection string with the algorithms
received from the server in the CompressionAck message.
Every subsequent message from the client is now sent uncompressed.
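
As a sketch of how the server's SetCompressionMethod from step 2 could be
resolved on the client side: the helper names are made up, and I am assuming
here that both indexes are zero-based positions in the list the client sent
in its startup packet, which is what makes "alg_idx=1;level_idx=1" come out
as zstd level 7.

```python
from typing import Dict, List, Optional, Tuple


def parse_list(spec: str) -> List[Tuple[str, List[int]]]:
    """Parse 'zlib:1,3,5;zstd:6,7,8;uncompressed;' keeping list order."""
    items: List[Tuple[str, List[int]]] = []
    for entry in spec.split(";"):
        if not entry:
            continue  # skip the empty piece after a trailing ';'
        name, _, levels = entry.partition(":")
        items.append((name,
                      [int(x) for x in levels.split(",")] if levels else []))
    return items


def parse_set_method(body: str) -> Dict[str, int]:
    """Parse 'alg_idx=1;level_idx=1' into {'alg_idx': 1, 'level_idx': 1}."""
    fields = (field.split("=") for field in body.split(";") if field)
    return {key: int(value) for key, value in fields}


def resolve(advertised: List[Tuple[str, List[int]]],
            msg: Dict[str, int]) -> Tuple[str, Optional[int]]:
    """Map zero-based indexes back to an algorithm name and level."""
    name, levels = advertised[msg["alg_idx"]]
    level = levels[msg["level_idx"]] if "level_idx" in msg else None
    return name, level


advertised = parse_list("zlib:1,3,5;zstd:6,7,8;uncompressed;")
choice = resolve(advertised, parse_set_method("alg_idx=1;level_idx=1"))
```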

—
Daniil Zakhlystov
