Re: libpq compression (part 3)

2024-05-21 Thread Jacob Champion
On Tue, May 21, 2024 at 12:08 PM Jacob Burroughs
 wrote:
> I think I thought I was writing about something else when I wrote that
> :sigh:.  I think what I really should have written was a version of
> the part below, which is that we use streaming decompression, only
> decompress 8kb at a time, and for pre-auth messages limit them to
> `PG_MAX_AUTH_TOKEN_LENGTH` (65535 bytes), which isn't really enough
> data to actually cause any real-world pain by needing to decompress vs
> the equivalent pain of sending invalid uncompressed auth packets.

Okay. So it sounds like your position is similar to Robert's from
earlier: prefer allowing unauthenticated compressed packets for
simplicity, as long as we think it's safe for the server. (Personally
I still think a client that compresses its password packets is doing
it wrong, and we could help them out by refusing that.)

> We own both the canonical client and server, so those are both covered
> here.  I would think it would be the responsibility of any other
> system that maintains its own implementation of the postgres protocol
> and chooses to support the compression protocol to perform its own
> mitigations against potential compression security issues.

Sure, but if our official documentation is "here's an extremely
security-sensitive feature, figure it out!" then we've done a
disservice to the community.

> Should we
> put the fixed message size limits (that have de facto been part of the
> protocol since 2021, even if they weren't documented as such) into the
> protocol documentation?

Possibly? I don't know if the other PG-compatible implementations use
the same limits. It might be better to say "limits must exist".

> ( I don't really see how one could implement other tooling that used
> pg compression without using streaming compression, as the protocol
> never hands over a standalone blob of compressed data: all compressed
> data is always part of a stream, but even with streaming decompression
> you still need some kind of limits or you will just chew up memory.)

Well, that's a good point; I wasn't thinking about the streaming APIs
themselves. If the easiest way to implement decompression requires the
use of an API that shouts "hey, give me guardrails!", then that helps
quite a bit. I really need to look into the attack surface of the
three algorithms.

--Jacob




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Tue, May 21, 2024 at 1:24 PM Jacob Champion
 wrote:
>
> We absolutely have to document the risks and allow clients to be
> written safely. But I think server-side controls on risky behavior
> have proven to be generally more valuable, because the server
> administrator is often in a better spot to see the overall risks to
> the system. ("No, you will not use deprecated ciphersuites. No, you
> will not access this URL over plaintext. No, I will not compress this
> response containing customer credit card numbers, no matter how nicely
> you ask.") There are many more clients than servers, so it's less
> risky for the server to enforce safety than to hope that every client
> is safe.
>
> Does your database and access pattern regularly mingle secrets with
> public data? Would auditing correct client use of compression be a
> logistical nightmare? Do your app developers keep indicating in
> conversations that they don't understand the risks at all? Cool, just
> set `encrypted_compression = nope_nope_nope` on the server and sleep
> soundly at night. (Ideally we would default to that.)

Thinking about this more (and about adding an encrypted_compression
GUC or whatever), my inclination would be for the server to enable
compression by default for unencrypted connections but disable it by
default for encrypted connections, with both being config parameters
that can be changed as desired.

> The concept of stream reset seems necessary but insufficient at the
> application level, which bleeds over into Jelte's compression_restart
> proposal. (At the protocol level, I think it may be sufficient?)
>
> If I write a query where one of the WHERE clauses is
> attacker-controlled and the other is a secret, I would really like to
> not compress that query on the client side. If I join a table of user
> IDs against a table of user-provided addresses and a table of
> application tokens for that user, compressing even a single row leaks
> information about those tokens -- at a _very_ granular level -- and I
> would really like the server not to do that.
>
> So if I'm building sand castles... I think maybe it'd be nice to mark
> tables (and/or individual columns?) as safe for compression under
> encryption, whether by row or in aggregate. And maybe libpq and psql
> should be able to turn outgoing compression on and off at will.
>
> And I understand those would balloon the scope of the feature. I'm
> worried I'm doing the security-person thing and sucking all the air
> out of the room. I know not everybody uses transport encryption; for
> those people, compress-it-all is probably a pretty winning strategy,
> and there's no need to reset the compression context ever. And the
> pg_dump-style, "give me everything" use case seems like it could maybe
> be okay, but I really don't know how to assess the risk there, at all.

I would imagine that a large share of postgres usage is in contexts
(e.g. internal networks) where either no encryption is used at all or,
even when encryption is used, the benefit of compression outweighs the
risk of someone being in a position to perform a BREACH-style
side-channel attack against DB traffic by enough that compress-it-all
would be quite useful in many cases.  Would some sort of
per-table/column marking be useful for some cases?  Probably, but that
doesn't seem to me like it needs to be in v1 of this feature, as long
as the protocol layer itself is designed such that parties can
arbitrarily alternate between transmitting compressed and uncompressed
data.  Then if we build such a feature down the road we just add logic
around *when* we compress, but the protocol layer doesn't change.
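
In code-shape terms, the split I'm picturing is roughly this (every
function name below is made up, purely to illustrate the layering):

#include <libpq-fe.h>
#include <stddef.h>

/*
 * Illustrative shape only -- every helper below is hypothetical: the
 * protocol layer just offers "send this message inside compressed data"
 * vs "send it plain", and any later policy (per-table/column marking,
 * per-query opt-out, ...) lives a layer above and merely decides which
 * call to make.
 */
static void
send_message(PGconn *conn, char msgtype, const char *payload, size_t len)
{
    if (compression_enabled(conn) &&
        policy_allows_compression(conn, msgtype))
        send_wrapped_in_compressed_data(conn, msgtype, payload, len);
    else
        send_plain(conn, msgtype, payload, len);
}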

> Well... working out the security minutiae _after_ changing the
> protocol is not historically a winning strategy, I think. Better to do
> it as a vertical stack.

Thinking about it more, I agree that we probably should work out the
protocol level mechanism for resetting compression
context/enabling/disabling/reconfiguring compression as part of this
work.  I don't think that we need to have all the ways that the
application layer might choose to use such things done here, but we
should have all the necessary primitives worked out.

-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Tue, May 21, 2024 at 1:39 PM Jacob Champion
 wrote:
>
> If the server doesn't reject compressed packets pre-authentication,
> then case 2 isn't mitigated. (I haven't proven how risky that case is
> yet, to be clear.) In other words: if the threat model is that a
> client can attack us, we shouldn't assume that it will attack us
> politely.

I think I thought I was writing about something else when I wrote that
:sigh:.  I think what I really should have written was a version of
the part below, which is that we use streaming decompression, only
decompress 8kb at a time, and for pre-auth messages limit them to
`PG_MAX_AUTH_TOKEN_LENGTH` (65535 bytes), which isn't really enough
data to actually cause any real-world pain by needing to decompress vs
the equivalent pain of sending invalid uncompressed auth packets.
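
Concretely, the pre-auth guard amounts to nothing more than this
(illustrative sketch, not the patch code; the constant is the existing
define):

#include <stdbool.h>
#include <stddef.h>

#define PG_MAX_AUTH_TOKEN_LENGTH 65535      /* existing define */

/*
 * Sketch only: while the connection is still unauthenticated, the
 * streaming decompressor stops as soon as a message's running
 * decompressed size exceeds the same cap that already applies to
 * uncompressed auth packets.
 */
static bool
preauth_decompressed_size_ok(size_t total_decompressed)
{
    return total_decompressed <= PG_MAX_AUTH_TOKEN_LENGTH;
}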

> > Because we are using streaming decompression, this is much less of an
> > issue than for things that decompress wholesale onto disk/into memory.
>
> (I agree in general, but since you're designing a protocol extension,
> IMO it's not enough that your implementation happens to mitigate
> risks. We more or less have to bake those mitigations into the
> specification of the extension, because things that aren't servers
> have to decompress now. Similar to RFC security considerations.)

We own both the canonical client and server, so those are both covered
here.  I would think it would be the responsibility of any other
system that maintains its own implementation of the postgres protocol
and chooses to support the compression protocol to perform its own
mitigations against potential compression security issues.  Should we
put the fixed message size limits (that have de facto been part of the
protocol since 2021, even if they weren't documented as such) into the
protocol documentation?  That would give implementers of the protocol
numbers they could rely on when implementing the appropriate
safeguards, because they would have explicit guarantees about the size
of messages.  I think it would make more sense to put the limits on
the underlying messages rather than adding an additional limit that
only applies to compressed messages.
(I don't really see how one could implement other tooling that used
pg compression without using streaming compression, as the protocol
never hands over a standalone blob of compressed data: all compressed
data is always part of a stream, but even with streaming decompression
you still need some kind of limits or you will just chew up memory.)

-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Champion
On Tue, May 21, 2024 at 9:14 AM Jacob Burroughs
 wrote:
> On Tue, May 21, 2024 at 10:43 AM Jelte Fennema-Nio  wrote:
> > To help get everyone on the same page I wanted to list all the
> > security concerns in one place:
> >
> > 1. Triggering excessive CPU usage before authentication, by asking for
> > very high compression levels
> > 2. Triggering excessive memory/CPU usage before authentication, by
> > sending a client sending a zipbomb
> > 3. Triggering excessive CPU after authentication, by asking for a very
> > high compression level
> > 4. Triggering excessive memory/CPU after authentication due to
> > zipbombs (i.e. small amount of data extracting to lots of data)
> > 5. CRIME style leakage of information about encrypted data
> >
> > 1 & 2 can easily be solved by not allowing any authentication packets
> > to be compressed. This also has benefits for 5.
>
> This is already addressed by only compressing certain message types.
> If we think it is important that the server reject compressed packets
> of other types I can add that, but it seemed reasonable to just make
> the client never send such packets compressed.

If the server doesn't reject compressed packets pre-authentication,
then case 2 isn't mitigated. (I haven't proven how risky that case is
yet, to be clear.) In other words: if the threat model is that a
client can attack us, we shouldn't assume that it will attack us
politely.

> > 4 would require some safety limits on the amount of data that a
> > (small) compressed message can be decompressed to, and stop
> > decompression of that message once that limit is hit. What that limit
> > should be seems hard to choose though. A few ideas:
> > a. The size of the message reported by the uncompressed header. This
> > would mean that at most the 4GB will be uncompressed, since maximum
> > message length is 4GB (limited by 32bit message length field)
> > b. Allow servers to specify maximum client decompressed message length
> > lower than this 4GB, e.g. messages of more than 100MB of uncompressed
> > size should not be allowed.
>
> Because we are using streaming decompression, this is much less of an
> issue than for things that decompress wholesale onto disk/into memory.

(I agree in general, but since you're designing a protocol extension,
IMO it's not enough that your implementation happens to mitigate
risks. We more or less have to bake those mitigations into the
specification of the extension, because things that aren't servers
have to decompress now. Similar to RFC security considerations.)

--Jacob




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Champion
On Tue, May 21, 2024 at 8:23 AM Jacob Burroughs
 wrote:
> As currently implemented, the compression only applies to
> CopyData/DataRow/Query messages, none of which should be involved in
> authentication, unless I've really missed something in my
> understanding.

Right, but Robert has argued that we should compress it all, and I'm
responding to that proposal.

Sorry for introducing threads within threads. But I think it's
valuable to pin down both 1) the desired behavior, and 2) how the
current proposal behaves, as two separate things. I'll try to do a
better job of communicating which I'm talking about.

> > Right, I think it's reasonable to let a sufficiently
> > determined/informed user lift the guardrails, but first we have to
> > choose to put guardrails in place... and then we have to somehow
> > sufficiently inform the users when it's okay to lift them.
>
> My thought would be that compression should be opt-in on the client
> side, with documentation around the potential security pitfalls. (I
> could be convinced it should be opt-in on the server side, but overall
> I think opt-in on the client side generally protects against footguns
> without excessively getting in the way

We absolutely have to document the risks and allow clients to be
written safely. But I think server-side controls on risky behavior
have proven to be generally more valuable, because the server
administrator is often in a better spot to see the overall risks to
the system. ("No, you will not use deprecated ciphersuites. No, you
will not access this URL over plaintext. No, I will not compress this
response containing customer credit card numbers, no matter how nicely
you ask.") There are many more clients than servers, so it's less
risky for the server to enforce safety than to hope that every client
is safe.

Does your database and access pattern regularly mingle secrets with
public data? Would auditing correct client use of compression be a
logistical nightmare? Do your app developers keep indicating in
conversations that they don't understand the risks at all? Cool, just
set `encrypted_compression = nope_nope_nope` on the server and sleep
soundly at night. (Ideally we would default to that.)

> and if an attacker controls the
> client, they can just get the information they want directly; they
> don't need compression side channels to get that information.)

Sure, but I don't think that's relevant to the threats being discussed.

> Within SQL-level things, I don't think we can reasonably differentiate
> between private and attacker-controlled information at the
> libpq/server level.

And by the IETF line of argument -- or at least the argument I quoted
above -- that implies that we really have no business introducing
compression when confidentiality is requested. A stronger approach
would require us to prove, or the user to indicate, safety before
compressing.

Take a look at the security notes for QPACK [1] -- keeping in mind
that they know _more_ about what's going on at the protocol level than
we do, due to the header design. And they still say things like "an
encoder might choose not to index values with low entropy" and "these
criteria ... will evolve over time as new attacks are discovered." A
huge amount is left as an exercise for the reader. This stuff is
really hard.

> We can reasonably differentiate between message
> types that *definitely* are private and ones that could have
> either/both data in them, but that's not nearly as useful.  I think
> not compressing auth-related packets plus giving a mechanism to reset
> the compression stream for clients (plus guidance on the tradeoffs
> involved in turning on compression) is about as good as we can get.

The concept of stream reset seems necessary but insufficient at the
application level, which bleeds over into Jelte's compression_restart
proposal. (At the protocol level, I think it may be sufficient?)

If I write a query where one of the WHERE clauses is
attacker-controlled and the other is a secret, I would really like to
not compress that query on the client side. If I join a table of user
IDs against a table of user-provided addresses and a table of
application tokens for that user, compressing even a single row leaks
information about those tokens -- at a _very_ granular level -- and I
would really like the server not to do that.

So if I'm building sand castles... I think maybe it'd be nice to mark
tables (and/or individual columns?) as safe for compression under
encryption, whether by row or in aggregate. And maybe libpq and psql
should be able to turn outgoing compression on and off at will.
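
For the libpq half of that sand castle, I'm imagining something as
small as this (PQsetStreamCompression() is entirely hypothetical;
nothing like it exists today):

#include <libpq-fe.h>

/*
 * Sand-castle sketch: PQsetStreamCompression() is a hypothetical call.
 * The point is only that the application could drop out of compressed
 * mode around a statement that mixes secrets with attacker-chosen input.
 */
static PGresult *
run_mixed_sensitivity_query(PGconn *conn, const char *query)
{
    PGresult   *res;

    PQsetStreamCompression(conn, 0);    /* hypothetical: stop compressing */
    res = PQexec(conn, query);
    PQsetStreamCompression(conn, 1);    /* hypothetical: resume           */
    return res;
}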

And I understand those would balloon the scope of the feature. I'm
worried I'm doing the security-person thing and sucking all the air
out of the room. I know not everybody uses transport encryption; for
those people, compress-it-all is probably a pretty winning strategy,
and there's no need to reset the compression context ever. And the
pg_dump-style, "give me 

Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Tue, May 21, 2024 at 10:43 AM Jelte Fennema-Nio  wrote:
>
> To help get everyone on the same page I wanted to list all the
> security concerns in one place:
>
> 1. Triggering excessive CPU usage before authentication, by asking for
> very high compression levels
> 2. Triggering excessive memory/CPU usage before authentication, by
> sending a client sending a zipbomb
> 3. Triggering excessive CPU after authentication, by asking for a very
> high compression level
> 4. Triggering excessive memory/CPU after authentication due to
> zipbombs (i.e. small amount of data extracting to lots of data)
> 5. CRIME style leakage of information about encrypted data
>
> 1 & 2 can easily be solved by not allowing any authentication packets
> to be compressed. This also has benefits for 5.

This is already addressed by only compressing certain message types.
If we think it is important that the server reject compressed packets
of other types I can add that, but it seemed reasonable to just make
the client never send such packets compressed.

> 3 & 4 are less of a concern than 1&2 imho. Once authenticated a client
> deserves some level of trust. But having knobs to limit impact
> definitely seems useful.
>
> 3 can be solved in two ways afaict:
> a. Allow the server to choose the maximum compression level for each
> compression method (using some GUC), and downgrade the level
> transparently when a higher level is requested
> b. Don't allow the client to choose the compression level that the server 
> uses.
>
> I'd prefer option a

3a would seem preferable given discussion upthread. It would probably
be worth doing some measurement to check how much more compute effort
the max level costs than the default for all 3 algorithms, because I
would really prefer to avoid needing to add even more configuration
knobs if the max compression level is sufficiently performant for the
streaming-data use case.
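
The measurement itself can be crude; something along these lines per
algorithm would answer it (zstd shown; assumes roughly query/row-shaped
text is a representative workload):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zstd.h>

/*
 * Rough sketch: compress ~1MB of query-shaped text at the default and at
 * the maximum zstd level and compare CPU time and output size.  The gzip
 * and lz4 variants would look the same.
 */
int
main(void)
{
    static char src[1 << 20];
    size_t      bound = ZSTD_compressBound(sizeof(src));
    char       *dst = malloc(bound);
    int         levels[] = {ZSTD_CLEVEL_DEFAULT, ZSTD_maxCLevel()};

    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = "SELECT * FROM t WHERE id = 42;\n"[i % 31];

    for (int i = 0; i < 2; i++)
    {
        clock_t start = clock();
        size_t  n = ZSTD_compress(dst, bound, src, sizeof(src), levels[i]);

        if (ZSTD_isError(n))
            return 1;
        printf("level %d: %zu -> %zu bytes in %.3f s\n",
               levels[i], sizeof(src), n,
               (double) (clock() - start) / CLOCKS_PER_SEC);
    }
    free(dst);
    return 0;
}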

> 4 would require some safety limits on the amount of data that a
> (small) compressed message can be decompressed to, and stop
> decompression of that message once that limit is hit. What that limit
> should be seems hard to choose though. A few ideas:
> a. The size of the message reported by the uncompressed header. This
> would mean that at most the 4GB will be uncompressed, since maximum
> message length is 4GB (limited by 32bit message length field)
> b. Allow servers to specify maximum client decompressed message length
> lower than this 4GB, e.g. messages of more than 100MB of uncompressed
> size should not be allowed.

Because we are using streaming decompression, this is much less of an
issue than for things that decompress wholesale onto disk/into memory.
We only read PQ_RECV_BUFFER_SIZE (8kB) bytes off the stream at once,
and when reading a packet we already have a `maxmsglen` of
PQ_LARGE_MESSAGE_LIMIT (1GB), and "We abort the connection (by
returning EOF) if client tries to send more than that."  Therefore,
we effectively already have a limit of 1GB that applies to regular
messages too, and I think we should rely on this mechanism for
compressed data too.  (If we really think we need to make that number
configurable we probably could, but again, the fewer new knobs we need
to add the better.)
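
In sketch form (zstd shown, and simplified rather than the actual patch
code), the receive path amounts to:

#include <stdbool.h>
#include <stddef.h>
#include <zstd.h>

#define PQ_RECV_BUFFER_SIZE     8192                    /* existing 8kB   */
#define PQ_LARGE_MESSAGE_LIMIT  (1024 * 1024 * 1024)    /* existing 1GB   */

/*
 * Simplified sketch: feed one compressed chunk through streaming
 * decompression in 8kB buffers, and bail out once the message's running
 * decompressed size would exceed the same limit that already applies to
 * uncompressed messages.  "ds" is the connection's long-lived stream.
 */
static bool
decompress_chunk(ZSTD_DStream *ds, const void *src, size_t srclen,
                 size_t *total_out)
{
    ZSTD_inBuffer in = {src, srclen, 0};
    char          buf[PQ_RECV_BUFFER_SIZE];

    while (in.pos < in.size)
    {
        ZSTD_outBuffer out = {buf, sizeof(buf), 0};
        size_t         ret = ZSTD_decompressStream(ds, &out, &in);

        if (ZSTD_isError(ret))
            return false;       /* corrupt stream: abort the connection */

        *total_out += out.pos;
        if (*total_out > PQ_LARGE_MESSAGE_LIMIT)
            return false;       /* same treatment as an oversized
                                 * uncompressed message */

        /* hand buf[0..out.pos) to the normal message-parsing path here */
    }
    return true;
}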


> I think 5 is the most complicated to deal with, especially as it
> depends on the actual usage to know what is safe. I believe we should
> let users have the freedom to make their own security tradeoffs, but
> we should protect them against some of the most glaring issues
> (especially ones that benefit little from compression anyway). As
> already shown by Andrey, sending LDAP passwords in a compressed way
> seems extremely dangerous. So I think we should disallow compressing
> any authentication related packets. To reduce similar risks further we
> can choose to compress only the message types that we expect to
> benefit most from compression. IMHO those are the following (marked
> with (B)ackend or (F)rontend to show who sends them):
> - Query (F)
> - Parse (F)
> - Describe (F)
> - Bind (F)
> - RowDescription (B)
> - DataRow (B)
> - CopyData (B/F)

That seems like a reasonable list (current implementation is just
CopyData/DataRow/Query, but I really just copied that fairly blindly
from the previous incarnation of this effort.) See also my comment
below 1&2 for if we think we need to block decompressing them too.

> Then I think we should let users choose how they want to compress and
> where they want their compression stream to restart. Something like
> this:
> a. compression_restart=query: Restart the stream after every query.
> Recommended if queries across the same connection are triggered by
> different end-users. I think this would be a sane default
> b. compression_restart=message: Restart the stream for every message.
> Recommended if the amount of correlation between rows of the same
> query is a security concern.
> c. compression_restart=manual: Don't restart the stream automatically,
> but only when the client user calls a specific function. Recommended
> only if the user can make trade-offs, or if no encryption is used
> anyway.

Re: libpq compression (part 3)

2024-05-21 Thread Jelte Fennema-Nio
On Mon, 20 May 2024 at 21:42, Jacob Champion
 wrote:
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.


To help get everyone on the same page I wanted to list all the
security concerns in one place:

1. Triggering excessive CPU usage before authentication, by asking for
very high compression levels
2. Triggering excessive memory/CPU usage before authentication, by
sending a client sending a zipbomb
3. Triggering excessive CPU after authentication, by asking for a very
high compression level
4. Triggering excessive memory/CPU after authentication due to
zipbombs (i.e. small amount of data extracting to lots of data)
5. CRIME style leakage of information about encrypted data

1 & 2 can easily be solved by not allowing any authentication packets
to be compressed. This also has benefits for 5.

3 & 4 are less of a concern than 1&2 imho. Once authenticated a client
deserves some level of trust. But having knobs to limit impact
definitely seems useful.

3 can be solved in two ways afaict:
a. Allow the server to choose the maximum compression level for each
compression method (using some GUC), and downgrade the level
transparently when a higher level is requested
b. Don't allow the client to choose the compression level that the server uses.

I'd prefer option a

4 would require some safety limits on the amount of data that a
(small) compressed message can be decompressed to, and stop
decompression of that message once that limit is hit. What that limit
should be seems hard to choose though. A few ideas:
a. The size of the message reported by the uncompressed header. This
would mean that at most the 4GB will be uncompressed, since maximum
message length is 4GB (limited by 32bit message length field)
b. Allow servers to specify maximum client decompressed message length
lower than this 4GB, e.g. messages of more than 100MB of uncompressed
size should not be allowed.

I think 5 is the most complicated to deal with, especially as it
depends on the actual usage to know what is safe. I believe we should
let users have the freedom to make their own security tradeoffs, but
we should protect them against some of the most glaring issues
(especially ones that benefit little from compression anyway). As
already shown by Andrey, sending LDAP passwords in a compressed way
seems extremely dangerous. So I think we should disallow compressing
any authentication related packets. To reduce similar risks further we
can choose to compress only the message types that we expect to
benefit most from compression. IMHO those are the following (marked
with (B)ackend or (F)rontend to show who sends them):
- Query (F)
- Parse (F)
- Describe (F)
- Bind (F)
- RowDescription (B)
- DataRow (B)
- CopyData (B/F)

Then I think we should let users choose how they want to compress and
where they want their compression stream to restart. Something like
this:
a. compression_restart=query: Restart the stream after every query.
Recommended if queries across the same connection are triggered by
different end-users. I think this would be a sane default
b. compression_restart=message: Restart the stream for every message.
Recommended if the amount of correlation between rows of the same
query is a security concern.
c. compression_restart=manual: Don't restart the stream automatically,
but only when the client user calls a specific function. Recommended
only if the user can make trade-offs, or if no encryption is used
anyway.
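
To be concrete about what a restart means mechanically, with zstd for
example it boils down to this (sketch; gzip and lz4 have equivalent
ways to begin a fresh stream):

#include <zstd.h>

/*
 * A "restart" just throws away the history window on both sides, so
 * nothing sent before this point can influence later compressed sizes.
 */
static void
restart_compression_stream(ZSTD_CCtx *cctx, ZSTD_DCtx *dctx)
{
    (void) ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);   /* sender   */
    (void) ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only);   /* receiver */
}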




Re: libpq compression (part 3)

2024-05-21 Thread Jacob Burroughs
On Mon, May 20, 2024 at 2:42 PM Jacob Champion
 wrote:
>
> I mean... you said it, not me. I'm trying not to rain on the parade
> too much, because compression is clearly very valuable. But it makes
> me really uncomfortable that we're reintroducing the compression
> oracle (especially over the authentication exchange, which is
> generally more secret than the rest of the traffic).

As currently implemented, the compression only applies to
CopyData/DataRow/Query messages, none of which should be involved in
authentication, unless I've really missed something in my
understanding.

> Right, I think it's reasonable to let a sufficiently
> determined/informed user lift the guardrails, but first we have to
> choose to put guardrails in place... and then we have to somehow
> sufficiently inform the users when it's okay to lift them.

My thought would be that compression should be opt-in on the client
side, with documentation around the potential security pitfalls. (I
could be convinced it should be opt-in on the server side, but overall
I think opt-in on the client side generally protects against footguns
without excessively getting in the way, and if an attacker controls
the client, they can just get the information they want directly; they
don't need compression side channels to get that information.)

> But for SQL, where's the dividing line between attacker-chosen and
> attacker-sought? To me, it seems like only the user knows; the server
> has no clue. I think that puts us "lower" in Alyssa's model than HTTP
> is.
>
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.

Within SQL-level things, I don't think we can reasonably differentiate
between private and attacker-controlled information at the
libpq/server level.  We can reasonably differentiate between message
types that *definitely* are private and ones that could have
either/both data in them, but that's not nearly as useful.  I think
not compressing auth-related packets plus giving a mechanism to reset
the compression stream for clients (plus guidance on the tradeoffs
involved in turning on compression) is about as good as we can get.
That said, I *think* the feature is reasonable to be
reviewed/committed without the reset functionality as long as the
compressed data already has the mechanism built in (as it does) to
signal when a decompressor should restart its streaming.  The actual
signaling protocol mechanism/necessary libpq API can happen in
followon work.


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-21 Thread Robert Haas
On Mon, May 20, 2024 at 4:12 PM Magnus Hagander  wrote:
> That used to be the case in HTTP/1. But header compression was one of the 
> headline features of HTTP/2, which isn't exactly new anymore. But there's a 
> special algorithm, HPACK, for it. And then http/3 uses QPACK. Cloudflare has 
> a pretty decent blog post explaining why and how: 
> https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/, or 
> rfc7541 for all the details.
>
> tl;dr; is yes, let's be careful not to expose headers to a CRIME-style 
> attack. And I doubt our connections have as much to gain by compressing 
> "header style" fields as http, so we are probably better off just not 
> compressing those parts.

What do you think constitutes a header in the context of the
PostgreSQL wire protocol?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-20 Thread Magnus Hagander
On Mon, May 20, 2024 at 9:09 PM Andrey M. Borodin 
wrote:

>
>
>
> > On 20 May 2024, at 23:37, Robert Haas  wrote:
> >
> > But, does this mean that we should just refuse to offer compression as
> > a feature?
>
> No, absolutely, we need the feature.
>
> > I guess I don't understand why TLS removed
> > support for encryption entirely instead of disclaiming its use in some
> > appropriate way.
>
> I think the scope of TLS is too broad. HTTPS, in turn, has compression.
> But AFAIK it never compresses headers.
> IMO we should try to avoid compressing authentication information.
>

That used to be the case in HTTP/1. But header compression was one of the
headline features of HTTP/2, which isn't exactly new anymore. But there's a
special algorithm, HPACK, for it. And then http/3 uses QPACK.
Cloudflare has a pretty decent blog post explaining why and how:
https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/, or
rfc7541 for all the details.

tl;dr; is yes, let's be careful not to expose headers to a CRIME-style
attack. And I doubt our connections have as much to gain by compressing
"header style" fields as http, so we are probably better off just not
compressing those parts.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: libpq compression (part 3)

2024-05-20 Thread Jacob Champion
On Mon, May 20, 2024 at 11:37 AM Robert Haas  wrote:
> But if that's a practical
> attack, preventing compression prior to the authentication exchange
> probably isn't good enough

I mean... you said it, not me. I'm trying not to rain on the parade
too much, because compression is clearly very valuable. But it makes
me really uncomfortable that we're reintroducing the compression
oracle (especially over the authentication exchange, which is
generally more secret than the rest of the traffic).

> But, does this mean that we should just refuse to offer compression as
> a feature? This kind of attack isn't a threat in every environment,
> and in some environments, compression could be pretty useful. For
> instance, you might need to pull down a lot of data from the database
> over a slow connection. Perhaps you're the only user of the database,
> and you wrote all of the queries yourself in a locked vault, accepting
> no untrusted inputs. In that case, these kinds of attacks aren't
> possible, or at least I don't see how, but you might want both
> compression and encryption.

Right, I think it's reasonable to let a sufficiently
determined/informed user lift the guardrails, but first we have to
choose to put guardrails in place... and then we have to somehow
sufficiently inform the users when it's okay to lift them.

> I guess I don't understand why TLS removed
> support for encryption entirely instead of disclaiming its use in some
> appropriate way.

One of the IETF conversations was at [1] (there were dissenters on the
list, as you might expect). My favorite summary is this one from
Alyssa Rowan:

> Compression is usually best performed as "high" as possible; transport layer 
> is blind to what's being compressed, which (as we now know) was definitely 
> too low and was in retrospect a mistake.
>
> Any application layer protocol needs to know - if compression is supported - 
> to separate compression contexts for attacker-chosen plaintext and 
> attacker-sought unknown secrets. (As others have stated, HTTPbis covers this.)

But for SQL, where's the dividing line between attacker-chosen and
attacker-sought? To me, it seems like only the user knows; the server
has no clue. I think that puts us "lower" in Alyssa's model than HTTP
is.

As Andrey points out, there was prior work done that started to take
this into account. I haven't reviewed it to see how good it is -- and
I think there are probably many use cases in which queries and tables
contain both private and attacker-controlled information -- but if we
agree that they have to be separated, then the strategy can at least
be improved upon.

--Jacob

[1] https://mailarchive.ietf.org/arch/msg/tls/xhMLf8j4pq8W_ZGXUUU1G_m6r1c/




Re: libpq compression (part 3)

2024-05-20 Thread Jacob Champion
On Mon, May 20, 2024 at 11:05 AM Andrey M. Borodin  wrote:
> > So, the data would be compressed first, with framing around that, and
> > then transport encryption would happen afterwards. I don't see how
> > that would leak your password, but I have a feeling that might be a
> > sign that I'm about to learn some unpleasant truths.
>
> Compression defeats encryption. That's why it's not in TLS anymore.
> The thing is, compression codecs rely on self-correlation in the data. And 
> if you mix secret data with the user's data, the user might guess how 
> correlated they are.

I'm slow on the draw, but I hacked up a sample client to generate
traffic against the compression-enabled server, to try to illustrate.

If my client sends an LDAP password of "hello", followed by the query
`SELECT 'world'`, as part of the same gzip stream, I get two encrypted
packets on the wire: lengths 42 and 49 bytes. If the client instead
sends the query `SELECT 'hello'`, I get lengths 42 and 46. We lost
three bytes, and there's only been one packet on the stream before the
query; if the observer controlled the query, it's pretty obvious that
the self-similarity has to have come from the PasswordMessage. Rinse
and repeat.
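
For anyone who wants to reproduce the effect without standing up a
server, the core of it is just this (plain zlib, so the byte counts
won't match the packet capture above, but the delta shows up the same
way):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

/*
 * Compress a "password" and a "query" in the same deflate stream, with a
 * flush between them (mimicking two messages), and report the total
 * compressed size.  The stream gets shorter when the query repeats the
 * password, which is exactly the oracle.
 */
static size_t
compressed_size(const char *password, const char *query)
{
    z_stream      zs;
    unsigned char out[256];
    size_t        total = 0;

    memset(&zs, 0, sizeof(zs));
    deflateInit(&zs, Z_DEFAULT_COMPRESSION);

    zs.next_in = (Bytef *) password;
    zs.avail_in = strlen(password);
    zs.next_out = out;
    zs.avail_out = sizeof(out);
    deflate(&zs, Z_SYNC_FLUSH);             /* "PasswordMessage" goes out */
    total += sizeof(out) - zs.avail_out;

    zs.next_in = (Bytef *) query;
    zs.avail_in = strlen(query);
    zs.next_out = out;
    zs.avail_out = sizeof(out);
    deflate(&zs, Z_FINISH);                 /* "Query" goes out */
    total += sizeof(out) - zs.avail_out;

    deflateEnd(&zs);
    return total;
}

int
main(void)
{
    printf("password + SELECT 'world': %zu bytes\n",
           compressed_size("hello", "SELECT 'world'"));
    printf("password + SELECT 'hello': %zu bytes\n",
           compressed_size("hello", "SELECT 'hello'"));
    return 0;
}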

That doesn't cover the case where the password itself is low-entropy,
either. "hellohellohellohello" at least has length, but once you
compress it that collapses. So an attacker can passively monitor for
shorter password packets and know which user to target first.

--Jacob




Re: libpq compression (part 3)

2024-05-20 Thread Andrey M. Borodin




> On 20 May 2024, at 23:37, Robert Haas  wrote:
> 
>  But if that's a practical
> attack, preventing compression prior to the authentication exchange
> probably isn't good enough: the user could also try to guess what
> queries are being sent on behalf of other users through the same
> pooled connection, or they could try to use the bits of the query that
> they can control to guess what the other bits of the query that they
> can't see look like.

All these attacks can be practically exploited in a controlled environment.
That's why a previous incarnation of this patchset [0] contained a way to 
reset the compression context. And Odyssey AFAIR did it (Dan, a coauthor of 
that patch, implemented the compression in Odyssey).
But attacking authentication is much more straightforward and viable.

> On 20 May 2024, at 23:37, Robert Haas  wrote:
> 
> But, does this mean that we should just refuse to offer compression as
> a feature?

No, absolutely, we need the feature.

> I guess I don't understand why TLS removed
> support for encryption entirely instead of disclaiming its use in some
> appropriate way.

I think the scope of TLS is too broad. HTTPS, in turn, has compression. But 
AFAIK it never compresses headers.
IMO we should try to avoid compressing authentication information.


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/38/3499/



Re: libpq compression (part 3)

2024-05-20 Thread Robert Haas
On Mon, May 20, 2024 at 2:05 PM Andrey M. Borodin  wrote:
> Compression defeats encryption. That's why it's not in TLS anymore.
> The thing is, compression codecs rely on self-correlation in the data. And 
> if you mix secret data with the user's data, the user might guess how 
> correlated they are.

Yeah, I'm aware that there are some problems like this. For example,
suppose the bad guy can both supply some of the data sent over the
connection (e.g. by typing search queries into a web page) and also
observe the traffic between the web application and the database. Then
they could supply data and try to guess how correlated that is with
other data sent over the same connection. But if that's a practical
attack, preventing compression prior to the authentication exchange
probably isn't good enough: the user could also try to guess what
queries are being sent on behalf of other users through the same
pooled connection, or they could try to use the bits of the query that
they can control to guess what the other bits of the query that they
can't see look like.

But, does this mean that we should just refuse to offer compression as
a feature? This kind of attack isn't a threat in every environment,
and in some environments, compression could be pretty useful. For
instance, you might need to pull down a lot of data from the database
over a slow connection. Perhaps you're the only user of the database,
and you wrote all of the queries yourself in a locked vault, accepting
no untrusted inputs. In that case, these kinds of attacks aren't
possible, or at least I don't see how, but you might want both
compression and encryption. I guess I don't understand why TLS removed
support for encryption entirely instead of disclaiming its use in some
appropriate way.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-20 Thread Andrey M. Borodin



> On 20 May 2024, at 22:48, Robert Haas  wrote:
> 
> On Mon, May 20, 2024 at 1:23 PM Jacob Champion
>  wrote:
>> On Mon, May 20, 2024 at 10:01 AM Robert Haas  wrote:
>>> I really hope that you can't poke big enough holes to kill the feature
>>> entirely, though. Because that sounds sad.
>> 
>> Even if there are holes, I don't think the situation's going to be bad
>> enough to tank everything; otherwise no one would be able to use
>> decompression on the Internet. :D And I expect the authors of the
>> newer compression methods to have thought about these things [1].
>> 
>> I hesitate to ask as part of the same email, but what were the plans
>> for compression in combination with transport encryption? (Especially
>> if you plan to compress the authentication exchange, since mixing your
>> LDAP password into the compression context seems like it might be a
>> bad idea if you don't want to leak it.)
> 
> So, the data would be compressed first, with framing around that, and
> then transport encryption would happen afterwards. I don't see how
> that would leak your password, but I have a feeling that might be a
> sign that I'm about to learn some unpleasant truths.

Compression defeats encryption. That's why it's not in TLS anymore.
The thing is, compression codecs rely on self-correlation in the data. And if 
you mix secret data with the user's data, the user might guess how correlated 
they are.


Best regards, Andrey Borodin.



Re: libpq compression (part 3)

2024-05-20 Thread Robert Haas
On Mon, May 20, 2024 at 1:23 PM Jacob Champion
 wrote:
> On Mon, May 20, 2024 at 10:01 AM Robert Haas  wrote:
> > I really hope that you can't poke big enough holes to kill the feature
> > entirely, though. Because that sounds sad.
>
> Even if there are holes, I don't think the situation's going to be bad
> enough to tank everything; otherwise no one would be able to use
> decompression on the Internet. :D And I expect the authors of the
> newer compression methods to have thought about these things [1].
>
> I hesitate to ask as part of the same email, but what were the plans
> for compression in combination with transport encryption? (Especially
> if you plan to compress the authentication exchange, since mixing your
> LDAP password into the compression context seems like it might be a
> bad idea if you don't want to leak it.)

So, the data would be compressed first, with framing around that, and
then transport encryption would happen afterwards. I don't see how
that would leak your password, but I have a feeling that might be a
sign that I'm about to learn some unpleasant truths.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-20 Thread Jacob Champion
On Mon, May 20, 2024 at 10:01 AM Robert Haas  wrote:
> I really hope that you can't poke big enough holes to kill the feature
> entirely, though. Because that sounds sad.

Even if there are holes, I don't think the situation's going to be bad
enough to tank everything; otherwise no one would be able to use
decompression on the Internet. :D And I expect the authors of the
newer compression methods to have thought about these things [1].

I hesitate to ask as part of the same email, but what were the plans
for compression in combination with transport encryption? (Especially
if you plan to compress the authentication exchange, since mixing your
LDAP password into the compression context seems like it might be a
bad idea if you don't want to leak it.)

--Jacob

[1] https://datatracker.ietf.org/doc/html/rfc8878#name-security-considerations




Re: libpq compression (part 3)

2024-05-20 Thread Robert Haas
On Mon, May 20, 2024 at 12:49 PM Jacob Champion
 wrote:
> ...and my response was that, no, the proposal doesn't seem to be
> requiring that authentication take place before compression is done.
> (As evidenced by your email. :D) If the claim is that there are no
> security problems with letting unauthenticated clients force
> decompression, then I can try to poke holes in that;

I would prefer this approach, so I suggest trying to poke holes here
first. If you find big enough holes then...

> or if the claim
> is that we don't need to worry about that at all because we'll wait
> until after authentication, then I can poke holes in that too. My
> request is just that we choose one.

...we can fall back to this and you can try to poke holes here.

I really hope that you can't poke big enough holes to kill the feature
entirely, though. Because that sounds sad.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-20 Thread Jacob Champion
On Mon, May 20, 2024 at 8:29 AM Robert Haas  wrote:
> It does occur to me that if some compression algorithm has a buffer
> overrun bug, restricting its use until after authentication might
> reduce the score of the resulting CVE, because now you have to be able
> to authenticate to make an exploit work. Perhaps that's an argument
> for imposing a restriction here, but it doesn't seem to be the
> argument that you're making.

It wasn't my argument; Jacob B said above:

> in general postgres servers are usually connected to by
> clients that the server admin has some channel to talk to (after all
> they somehow had to get access to log in to the server in the first
> place) if they are doing something wasteful, given that a client can
> do a lot worse things than enable aggressive compression by writing
> bad queries.

...and my response was that, no, the proposal doesn't seem to be
requiring that authentication take place before compression is done.
(As evidenced by your email. :D) If the claim is that there are no
security problems with letting unauthenticated clients force
decompression, then I can try to poke holes in that; or if the claim
is that we don't need to worry about that at all because we'll wait
until after authentication, then I can poke holes in that too. My
request is just that we choose one.

> But if the client says in the startup message that it would like to
> send and receive compressed data and the server is happy with that
> request, I don't see why we need to postpone implementing that request
> until after the authentication exchange is completed. I think that
> will make the code more complicated and I don't see a security
> benefit.

I haven't implemented compression bombs before to know lots of
details, but I think the general idea is to take up resources that are
vastly disproportionate to the effort expended by the client. The
systemic risk is then more or less multiplied by the number of
intermediaries that need to do the decompression. Maybe all three of
our algorithms are hardened against malicious compression techniques;
that'd be great. But if we've never had a situation where a completely
untrusted peer can hand a blob to the server and say "here, decompress
this for me", maybe we'd better check?

--Jacob




Re: libpq compression (part 3)

2024-05-20 Thread Robert Haas
On Mon, May 20, 2024 at 10:15 AM Jacob Champion
 wrote:
> This is just restating the "you can already do bad things" argument. I
> understand that if your query gets executed, it's going to consume
> resources on the thing that's executing it (for the record, though,
> there are people working on constraining that). But introducing
> disproportionate resource consumption into all traffic-inspecting
> software, like pools and bouncers, seems like a different thing to me.
> Many use cases are going to be fine with it, of course, but I don't
> think it should be hand-waved.

I can't follow this argument.

I think it's important that the startup message is always sent
uncompressed, because it's a strange exception to our usual
message-formatting rules, and because it's so security-critical. I
don't think we should do anything to allow more variation there,
because any benefit will be small and the chances of introducing
security vulnerabilities seems non-trivial.

But if the client says in the startup message that it would like to
send and receive compressed data and the server is happy with that
request, I don't see why we need to postpone implementing that request
until after the authentication exchange is completed. I think that
will make the code more complicated and I don't see a security
benefit. If the use of a particular compression algorithm is going to
impose too much load, the server, or the pooler, is free to refuse it,
and should. Deferring the use of the compression method until after
authentication doesn't really solve any problem here, at least not
that I can see.

It does occur to me that if some compression algorithm has a buffer
overrun bug, restricting its use until after authentication might
reduce the score of the resulting CVE, because now you have to be able
to authenticate to make an exploit work. Perhaps that's an argument
for imposing a restriction here, but it doesn't seem to be the
argument that you're making.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-20 Thread Robert Haas
On Sat, May 18, 2024 at 1:18 AM Jacob Burroughs
 wrote:
> I like this more than what I proposed, and will update the patches to
> reflect this proposal. (I've gotten them locally back into a state of
> applying cleanly and dealing with the changes needed to support direct
> SSL connections, so refactoring the protocol layer shouldn't be too
> hard now.)

Sounds good!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-20 Thread Jacob Champion
On Fri, May 17, 2024 at 4:03 PM Jelte Fennema-Nio  wrote:
>
> On Fri, 17 May 2024 at 23:10, Jacob Champion
>  wrote:
> > We're talking about a transport-level option, though -- I thought the
> > proposal enabled compression before authentication completed? Or has
> > that changed?
>
> I think it would make sense to only compress messages after
> authentication has completed. The gain of compressing authentication
> related packets seems pretty limited.

Okay. But if we're relying on that for its security properties, it
needs to be enforced by the server.

> > (I'm suspicious of arguments that begin "well you can already do bad
> > things"
>
> Once logged in it's really easy to max out a core of the backend
> you're connected as. There's many trivial queries you can use to do
> that. An example would be:
> SELECT sum(i) from generate_series(1, 10) i;

This is just restating the "you can already do bad things" argument. I
understand that if your query gets executed, it's going to consume
resources on the thing that's executing it (for the record, though,
there are people working on constraining that). But introducing
disproportionate resource consumption into all traffic-inspecting
software, like pools and bouncers, seems like a different thing to me.
Many use cases are going to be fine with it, of course, but I don't
think it should be hand-waved.

--Jacob




Re: libpq compression (part 3)

2024-05-17 Thread Jacob Burroughs
On Fri, May 17, 2024 at 11:40 AM Robert Haas  wrote:
>
> To be clear, I am not arguing that it should be the receiver's choice.
> I'm arguing it should be the client's choice, which means the client
> decides what it sends and also tells the server what to send to it.
> I'm open to counter-arguments, but as I've thought about this more,
> I've come to the conclusion that letting the client control the
> behavior is the most likely to be useful and the most consistent with
> existing facilities. I think we're on the same page based on the rest
> of your email: I'm just clarifying.

This is what I am imagining too

> I have some quibbles with the syntax but I agree with the concept.
> What I'd probably do is separate the server side thing into two GUCs,
> each with a list of algorithms, comma-separated, like we do for other
> lists in postgresql.conf. Maybe make the default 'all' meaning
> "everything this build of the server supports". On the client side,
> I'd allow both things to be specified using a single option, because
> wanting to do the same thing in both directions will be common, and
> you actually have to type in connection strings sometimes, so
> verbosity matters more.
>
> As far as the format of the value for that keyword, what do you think
> about either compression=DO_THIS_BOTH_WAYS or
> compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do
> this" being a specification of the same form already accepted for
> server-side compression e.g. gzip or gzip:level=9? If you don't like
> that, why do you think the proposal you made above is better, and why
> is that one now punctuated with slashes instead of semicolons?

I like this more than what I proposed, and will update the patches to
reflect this proposal. (I've gotten them locally back into a state of
applying cleanly and dealing with the changes needed to support direct
SSL connections, so refactoring the protocol layer shouldn't be too
hard now.)

On Fri, May 17, 2024 at 11:10 AM Jacob Champion
 wrote:
> We're talking about a transport-level option, though -- I thought the
> proposal enabled compression before authentication completed? Or has
> that changed?
On Fri, May 17, 2024 at 1:03 PM Jelte Fennema-Nio  wrote:
> I think it would make sense to only compress messages after
> authentication has completed. The gain of compressing authentication
> related packets seems pretty limited.

At the protocol level, compressed data is a message type that can be
used to wrap arbitrary data as soon as the startup packet is
processed.  However, as an implementation detail that clients should
not rely on but that we can rely on in thinking about the
implications, the only message types that are compressed (except in
the 0005 CI patch for test running only) are PqMsg_CopyData,
PqMsg_DataRow, and PqMsg_Query, all of which aren't sent before
authentication.
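
i.e., shaped roughly like this (a sketch, not the literal patch code):

#include <stdbool.h>

/*
 * Only these message types are ever wrapped in compressed data, so
 * nothing sent before authentication is affected.
 */
static bool
message_is_compressible(char msgtype)
{
    switch (msgtype)
    {
        case 'd':               /* PqMsg_CopyData */
        case 'D':               /* PqMsg_DataRow  */
        case 'Q':               /* PqMsg_Query    */
            return true;
        default:
            return false;
    }
}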

-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 23:10, Jacob Champion
 wrote:
> We're talking about a transport-level option, though -- I thought the
> proposal enabled compression before authentication completed? Or has
> that changed?

I think it would make sense to only compress messages after
authentication has completed. The gain of compressing authentication
related packets seems pretty limited.

> (I'm suspicious of arguments that begin "well you can already do bad
> things"

Once logged in, it's really easy to max out a core of the backend
you're connected as. There are many trivial queries you can use to do
that. An example would be:
SELECT sum(i) from generate_series(1, 10) i;

So I don't think it makes sense to worry about an attacker using a
high compression level as a means to DoS the server. Sending a few of
the above queries seems much easier.




Re: libpq compression (part 3)

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 23:40, Robert Haas  wrote:
> To be clear, I am not arguing that it should be the receiver's choice.
> I'm arguing it should be the client's choice, which means the client
> decides what it sends and also tells the server what to send to it.
> I'm open to counter-arguments, but as I've thought about this more,
> I've come to the conclusion that letting the client control the
> behavior is the most likely to be useful and the most consistent with
> existing facilities. I think we're on the same page based on the rest
> of your email: I'm just clarifying.

+1

> I have some quibbles with the syntax but I agree with the concept.
> What I'd probably do is separate the server side thing into two GUCs,
> each with a list of algorithms, comma-separated, like we do for other
> lists in postgresql.conf. Maybe make the default 'all' meaning
> "everything this build of the server supports". On the client side,
> I'd allow both things to be specified using a single option, because
> wanting to do the same thing in both directions will be common, and
> you actually have to type in connection strings sometimes, so
> verbosity matters more.
>
> As far as the format of the value for that keyword, what do you think
> about either compression=DO_THIS_BOTH_WAYS or
> compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do
> this" being a specification of the same form already accepted for
> server-side compression e.g. gzip or gzip:level=9? If you don't like
> that, why do you think the proposal you made above is better, and why
> is that one now punctuated with slashes instead of semicolons?

+1




Re: libpq compression (part 3)

2024-05-17 Thread Robert Haas
On Fri, May 17, 2024 at 4:53 PM Jacob Burroughs
 wrote:
> I think I was more thinking that trying to let both parties control
> the parameter seemed like a recipe for confusion and sadness, and so
> the choice that felt most natural to me was to let the sender control
> it, but I'm definitely open to changing that the other way around.

To be clear, I am not arguing that it should be the receiver's choice.
I'm arguing it should be the client's choice, which means the client
decides what it sends and also tells the server what to send to it.
I'm open to counter-arguments, but as I've thought about this more,
I've come to the conclusion that letting the client control the
behavior is the most likely to be useful and the most consistent with
existing facilities. I think we're on the same page based on the rest
of your email: I'm just clarifying.

> On the server side, we use slash separated sets of options
> connection_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS/client_to_server=OVERRIDE_FOR_THIS_DIRECTION/server_to_client=OVERRIDE_FOR_THIS_DIRECTION
> with the values being semicolon separated compression algorithms.
> On the client side, you can specify
> compression=,
> but on the client side you can actually specify compression options,
> which the server will use if provided, and otherwise it will fall back
> to defaults.

I have some quibbles with the syntax but I agree with the concept.
What I'd probably do is separate the server side thing into two GUCs,
each with a list of algorithms, comma-separated, like we do for other
lists in postgresql.conf. Maybe make the default 'all' meaning
"everything this build of the server supports". On the client side,
I'd allow both things to be specified using a single option, because
wanting to do the same thing in both directions will be common, and
you actually have to type in connection strings sometimes, so
verbosity matters more.

As far as the format of the value for that keyword, what do you think
about either compression=DO_THIS_BOTH_WAYS or
compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do
this" being a specification of the same form already accepted for
server-side compression e.g. gzip or gzip:level=9? If you don't like
that, why do you think the proposal you made above is better, and why
is that one now punctuated with slashes instead of semicolons?
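
To make that shape concrete, here is a minimal client-side sketch.
Everything in it is hypothetical -- the "compression" keyword and both
value forms are just the proposal above, not a committed interface;
only PQconnectdb/PQstatus/PQerrorMessage/PQfinish are existing libpq
calls:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * Hypothetical connection string, assuming the proposed syntax:
     * one specification applied to both directions.
     */
    PGconn *conn = PQconnectdb("dbname=postgres compression=gzip:level=9");

    /*
     * ...or, with the slash form, a different specification per
     * direction (sending first, receiving second):
     *
     *     compression=gzip:level=9/lzma
     */
    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}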

> If we think we need to, we could let the server specify defaults for
> server-side compression.  My overall thought though is that having an
> excessive number of knobs increases the surface area for testing and
> bugs while also increasing potential user confusion and that allowing
> configuration on *both* sides doesn't seem sufficiently useful to be
> worth adding that complexity.

I agree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-17 Thread Jacob Champion
On Fri, May 17, 2024 at 1:53 PM Jacob Burroughs
 wrote:
> New proposal, predicated on the assumption that if you enable
> compression you are ok with the client picking whatever level they
> want.  At least with the currently enabled algorithms I don't think
> any of them are so insane that they would knock over a server or
> anything, and in general postgres servers are usually connected to by
> clients that the server admin has some channel to talk to (after all
> they somehow had to get access to log in to the server in the first
> place) if they are doing something wasteful, given that a client can
> do a lot worse things than enable aggressive compression by writing
> bad queries.

We're talking about a transport-level option, though -- I thought the
proposal enabled compression before authentication completed? Or has
that changed?

(I'm suspicious of arguments that begin "well you can already do bad
things", anyway... It seems like there's a meaningful difference
between consuming resources running a parsed query and consuming
resources trying to figure out what the parsed query is. I don't know
if the solution is locking in a compression level, or something else;
maybe they're both reasonably mitigated in the same way. I haven't
really looked into zip bombs much.)
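
For what it's worth, the usual guardrail looks roughly like the sketch
below: run the input through a streaming decompressor and enforce a
hard cap on the output, independent of anything the compressed stream
claims about itself. Illustration only, not code from the patch -- zlib
is chosen arbitrarily, MAX_MESSAGE_BYTES is a made-up limit, and dst is
assumed to have at least that much room:

#include <string.h>
#include <zlib.h>

#define MAX_MESSAGE_BYTES (8 * 1024)    /* assumed per-message budget */

static int
inflate_bounded(const unsigned char *src, size_t src_len,
                unsigned char *dst, size_t *dst_len)
{
    z_stream    zs;
    unsigned char chunk[1024];
    int         rc = Z_OK;
    size_t      produced = 0;

    memset(&zs, 0, sizeof(zs));
    if (inflateInit(&zs) != Z_OK)
        return -1;

    zs.next_in = (unsigned char *) src;
    zs.avail_in = (uInt) src_len;

    while (rc == Z_OK)
    {
        size_t      got;

        zs.next_out = chunk;
        zs.avail_out = sizeof(chunk);

        rc = inflate(&zs, Z_NO_FLUSH);
        if (rc != Z_OK && rc != Z_STREAM_END)
            break;                      /* corrupt or truncated input */

        got = sizeof(chunk) - zs.avail_out;
        if (produced + got > MAX_MESSAGE_BYTES)
        {
            rc = Z_BUF_ERROR;           /* over budget: reject, don't grow */
            break;
        }
        memcpy(dst + produced, chunk, got);
        produced += got;
    }

    inflateEnd(&zs);
    *dst_len = produced;
    return (rc == Z_STREAM_END) ? 0 : -1;
}

The receiver never trusts a claimed uncompressed size and never grows
its buffer; it simply stops once the budget is spent.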

--Jacob




Re: libpq compression (part 3)

2024-05-17 Thread Jacob Burroughs
On Thu, May 16, 2024 at 3:28 AM Robert Haas  wrote:
>
> Well, I mean, I don't really know what the right answer is here, but
> right now I can say pg_dump --compress=gzip to compress the dump with
> gzip, or pg_dump --compress=gzip:9 to compress with gzip level 9. Now,
> say that instead of compressing the output, I want to compress the
> data sent to me over the connection. So I figure I should be able to
> say pg_dump 'compress=gzip' or pg_dump 'compress=gzip:9'. I think you
> want to let me do the first of those but not the second. But, to turn
> your question on its head, what would be the reasoning behind such a
> restriction?

I think I was more thinking that trying to let both parties control
the parameter seemed like a recipe for confusion and sadness, and so
the choice that felt most natural to me was to let the sender control
it, but I'm definitely open to changing that the other way around.

> Note also the precedent of pg_basebackup. I can say pg_basebackup
> --compress=server-gzip:9 to ask the server to compress the backup with
> gzip at level 9. In that case, what I request from the server changes
> the actual output that I get, which is not the case here. Even so, I
> don't really understand what the justification would be for refusing
> to let the client ask for a specific compression level.
>
> And on the flip side, I also don't understand why the server would
> want to mandate a certain compression level. If compression is very
> expensive for a certain algorithm when the level is above some
> threshold X, we could have a GUC to limit the maximum level that the
> client can request. But, given that the gzip compression level
> defaults to 6 in every other context, why would the administrator of a
> particular server want to say, well, the default for my server is 3 or
> 9 or whatever?
>
> (This is of course all presuming you want to use gzip at all, which
> you probably don't, because gzip is crazy slow. Use lz4 or zstd! But
> it makes the point.)

New proposal, predicated on the assumption that if you enable
compression you are ok with the client picking whatever level they
want.  At least with the currently enabled algorithms I don't think
any of them are so insane that they would knock over a server or
anything, and in general postgres servers are usually connected to by
clients that the server admin has some channel to talk to (after all
they somehow had to get access to log in to the server in the first
place) if they are doing something wasteful, given that a client can
do a lot worse things than enable aggressive compression by writing
bad queries.

On the server side, we use slash separated sets of options
connection_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS/client_to_server=OVERRIDE_FOR_THIS_DIRECTION/server_to_client=OVERRIDE_FOR_THIS_DIRECTION
with the values being semicolon separated compression algorithms.
On the client side, you can specify
compression=,
but on the client side you can actually specify compression options,
which the server will use if provided, and otherwise it will fall back
to defaults.

If we think we need to, we could let the server specify defaults for
server-side compression.  My overall thought though is that having an
excessive number of knobs increases the surface area for testing and
bugs while also increasing potential user confusion and that allowing
configuration on *both* sides doesn't seem sufficiently useful to be
worth adding that complexity.

-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-16 Thread Robert Haas
On Wed, May 15, 2024 at 12:50 PM Jacob Burroughs
 wrote:
> I think I would agree with that.  That said, I don't think the client
> should be in the business of specifying what configuration of the
> compression algorithm the server should use.  The server administrator
> (or really most of the time, the compression library developer's
> defaults) gets to pick the compression/compute tradeoff for
> compression that runs on the server (which I would imagine would be
> the vast majority of it), and the client gets to pick those same
> parameters for any compression that runs on the client machine
> (probably indeed in practice only for large COPYs).  The *algorithm*
> needs to actually be communicated/negotiated since different
> client/server pairs may be built with support for different
> compression libraries, but I think it is reasonable to say that the
> side that actually has to do the computationally expensive part owns
> the configuration of that part too.  Maybe I'm missing a good reason
> that we want to allow clients to choose compression levels for the
> server though?

Well, I mean, I don't really know what the right answer is here, but
right now I can say pg_dump --compress=gzip to compress the dump with
gzip, or pg_dump --compress=gzip:9 to compress with gzip level 9. Now,
say that instead of compressing the output, I want to compress the
data sent to me over the connection. So I figure I should be able to
say pg_dump 'compress=gzip' or pg_dump 'compress=gzip:9'. I think you
want to let me do the first of those but not the second. But, to turn
your question on its head, what would be the reasoning behind such a
restriction?

Note also the precedent of pg_basebackup. I can say pg_basebackup
--compress=server-gzip:9 to ask the server to compress the backup with
gzip at level 9. In that case, what I request from the server changes
the actual output that I get, which is not the case here. Even so, I
don't really understand what the justification would be for refusing
to let the client ask for a specific compression level.

And on the flip side, I also don't understand why the server would
want to mandate a certain compression level. If compression is very
expensive for a certain algorithm when the level is above some
threshold X, we could have a GUC to limit the maximum level that the
client can request. But, given that the gzip compression level
defaults to 6 in every other context, why would the administrator of a
particular server want to say, well, the default for my server is 3 or
9 or whatever?

(This is of course all presuming you want to use gzip at all, which
you probably don't, because gzip is crazy slow. Use lz4 or zstd! But
it makes the point.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-15 Thread Jacob Burroughs
On Wed, May 15, 2024 at 11:31 AM Robert Haas  wrote:
> From my point of view, it's the client who knows what it wants to do
> with the connection. If the client plans to read a lot of data, it
> might want the server to compress that data, especially if it knows
> that it's on a slow link. If the client plans to send a lot of data --
> basically COPY, I'm not thinking this is going to matter much
> otherwise -- then it might want to compress that data before sending
> it, again, especially if it knows that it's on a slow link.
>
> But what does the server know, really? If some client connects and
> sends a SELECT query, the server can't guess whether that query is
> going to return 1 row or 100 million rows, so it has no idea of
> whether compression is likely to make sense or not. It is entitled to
> decide, as a matter of policy, that it's not willing to perform
> compression, either because of CPU consumption or security concerns or
> whatever, but it has no knowledge of what the purpose of this
> particular connection is.

I think I would agree with that.  That said, I don't think the client
should be in the business of specifying what configuration of the
compression algorithm the server should use.  The server administrator
(or really most of the time, the compression library developer's
defaults) gets to pick the compression/compute tradeoff for
compression that runs on the server (which I would imagine would be
the vast majority of it), and the client gets to pick those same
parameters for any compression that runs on the client machine
(probably indeed in practice only for large COPYs).  The *algorithm*
needs to actually be communicated/negotiated since different
client/server pairs may be built with support for different
compression libraries, but I think it is reasonable to say that the
side that actually has to do the computationally expensive part owns
the configuration of that part too.  Maybe I'm missing a good reason
that we want to allow clients to choose compression levels for the
server though?


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-15 Thread Robert Haas
On Wed, May 15, 2024 at 12:24 PM Jacob Burroughs
 wrote:
> > But now I'm wondering whether these options should really be symmetric
> > on the client and server sides? Isn't it for the server just to
> > specify a list of acceptable algorithms, and the client to set the
> > compression options? If both sides are trying to set the compression
> > level, for example, who wins?
>
> Compression options really only ever apply to the side doing the
> compressing, and at least as I had imagined things each party
> (client/server) only used its own level/other compression params.
> That leaves me thinking, maybe we really want two independent GUCs,
> one for "what algorithms are enabled/negotiable" and one for "how
> should I configure my compressors" and then we reduce the dimensions
> we are trying to shove into one GUC and each one ends up with a very
> clear purpose:
> connection_compression=(yes|no|alg1,alg2:server_to_client=alg1,alg2:client_to_server=alg3)
> connection_compression_opts=gzip:level=2

From my point of view, it's the client who knows what it wants to do
with the connection. If the client plans to read a lot of data, it
might want the server to compress that data, especially if it knows
that it's on a slow link. If the client plans to send a lot of data --
basically COPY, I'm not thinking this is going to matter much
otherwise -- then it might want to compress that data before sending
it, again, especially if it knows that it's on a slow link.

But what does the server know, really? If some client connects and
sends a SELECT query, the server can't guess whether that query is
going to return 1 row or 100 million rows, so it has no idea of
whether compression is likely to make sense or not. It is entitled to
decide, as a matter of policy, that it's not willing to perform
compression, either because of CPU consumption or security concerns or
whatever, but it has no knowledge of what the purpose of this
particular connection is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-15 Thread Jacob Burroughs
On Wed, May 15, 2024 at 8:38 AM Robert Haas  wrote:
>
> I agree with that goal, but I'm somewhat confused by how your proposal
> achieves it. You had
> libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off,
> so how do we parse that? Is that two completely separate
> specifications, one for lzma and one for gzip, and each of those has
> one option which is set to off? And then they are separated from each
> other by a semicolon? That actually does make sense, and I think it
> may do a better job allowing for compression options than my proposal,
> but it also seems a bit weird, because client_to_server and
> server_to_client are not really compression options at all. They're
> framing when this compression specification applies, rather than what
> it does when it applies. In a way it's a bit like the fact that you
> can prefix a pg_basebackup's --compress option with client- or server-
> to specify where the compression should happen. But we can't quite
> reuse that idea here, because in that case there's no question of
> doing it in both places, whereas here, you might want one thing for
> upstream and another thing for downstream.

Your interpretation is correct, but I don't disagree that it ends up
feeling confusing.

> I'm not a fan of three settings; I could go with two settings, one for
> each direction, and if you want both you have to set both. Or, another
> idea, what if we just separated the two directions with a slash,
> SEND/RECEIVE, and if there's no slash, then it applies to both
> directions. So you could say
> connection_compression='gzip:level=9/lzma' or whatever.
>
> But now I'm wondering whether these options should really be symmetric
> on the client and server sides? Isn't it for the server just to
> specify a list of acceptable algorithms, and the client to set the
> compression options? If both sides are trying to set the compression
> level, for example, who wins?

Compression options really only ever apply to the side doing the
compressing, and at least as I had imagined things each party
(client/server) only used its own level/other compression params.
That leaves me thinking, maybe we really want two independent GUCs,
one for "what algorithms are enabled/negotiable" and one for "how
should I configure my compressors" and then we reduce the dimensions
we are trying to shove into one GUC and each one ends up with a very
clear purpose:
connection_compression=(yes|no|alg1,alg2:server_to_client=alg1,alg2:client_to_server=alg3)
connection_compression_opts=gzip:level=2


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-15 Thread Robert Haas
On Tue, May 14, 2024 at 5:21 PM Jacob Burroughs
 wrote:
> The reason for both the semicolons and for not doing this is related
> to using the same specification structure as here:
> https://www.postgresql.org/docs/current/app-pgbasebackup.html
> (specifically the --compress argument).

I agree with that goal, but I'm somewhat confused by how your proposal
achieves it. You had
libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off,
so how do we parse that? Is that two completely separate
specifications, one for lzma and one for gzip, and each of those has
one option which is set to off? And then they are separated from each
other by a semicolon? That actually does make sense, and I think it
may do a better job allowing for compression options than my proposal,
but it also seems a bit weird, because client_to_server and
server_to_client are not really compression options at all. They're
framing when this compression specification applies, rather than what
it does when it applies. In a way it's a bit like the fact that you
can prefix a pg_basebackup's --compress option with client- or server-
to specify where the compression should happen. But we can't quite
reuse that idea here, because in that case there's no question of
doing it in both places, whereas here, you might want one thing for
upstream and another thing for downstream.

> Alternatively, we could have `connection_compression`,
> `connection_compression_server_to_client`, and
> `connection_compression_client_to_server` as three separate GUCs (and
> on the client side `compression`, `compression_server_to_client`, and
> `compression_client_to_server` as three separate connection
> parameters), where we would treat `connection_compression` as a
> default that could be overridden by an explicit
> client_to_server/server_to_client.  That creates the slightly funky
> case where if you specify all three then the base one ends up unused
> because the two more specific ones are being used instead, but that
> isn't necessarily terrible.  On the server side we *could* go with
> just the server_to_client and client_to_server ones, but I think we
> want it to be easy to use this feature in the simple case with a
> single libpq parameter.

I'm not a fan of three settings; I could go with two settings, one for
each direction, and if you want both you have to set both. Or, another
idea, what if we just separated the two directions with a slash,
SEND/RECEIVE, and if there's no slash, then it applies to both
directions. So you could say
connection_compression='gzip:level=9/lzma' or whatever.

But now I'm wondering whether these options should really be symmetric
on the client and server sides? Isn't it for the server just to
specify a list of acceptable algorithms, and the client to set the
compression options? If both sides are trying to set the compression
level, for example, who wins?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Fwd: libpq compression (part 3)

2024-05-14 Thread Jacob Burroughs
On Tue, May 14, 2024 at 3:24 PM Robert Haas  wrote:
>
> IMHO, that's a HUGE improvement. But:
>
> * I would probably change the name "libpq_compression", because
> even though we have src/backend/libpq, we typically use libpq to refer
> to the client library, not the server's implementation of the wire
> protocol. I think we could call it connection_compression or
> wire_protocol_compression or something like that, but I'm not a huge
> fan of libpq_compression.
>
I think connection_compression would seem like a good name to me.

> * I would use commas, not semicolons, to separate items in a list,
> i.e. lzma,gzip not lzma;gzip. I think that convention is nearly
> universal in PostgreSQL, but feel free to point out counterexamples if
> you were modelling this on something.
>
> * libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off
> reads strangely to me. How about making it so that the syntax is like
> this:
>
> libpq_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS:client_to_server=OVERRIDE_FOR_THIS_DIRECTION:server_to_client=OVERRIDE_FOR_THIS_DIRECTION
>
> With all components being optional. So this example could be written
> in any of these ways:
>
> libpq_compression=lzma;server_to_client=gzip
> libpq_compression=gzip;client_to_server=lzma
> libpq_compression=server_to_client=gzip;client_to_server=lzma
> libpq_compression=client_to_server=lzma;server_to_client=gzip
>
> And if I wrote libpq_compression=server_to_client=gzip that would mean
> send data to the client using gzip and in the other direction use
> whatever the default is.

The reason for both the semicolons and for not doing this is related
to using the same specification structure as here:
https://www.postgresql.org/docs/current/app-pgbasebackup.html
(specifically the --compress argument). Reusing that specification
requires that we use commas to separate the flags for a compression
method, and therefore left me with semicolons as the leftover
separator character. I think we could go with something like your
proposal, and in a lot of ways I like it, but there's still the
possibility of e.g.
`libpq_compression=client_to_server=zstd:level=10,long=true,gzip;client_to_server=gzip`
and I'm not quite sure how to make the separator characters work
coherently if we need to treat `zstd:level=10,long=true` as a unit.
Alternatively, we could have `connection_compression`,
`connection_compression_server_to_client`, and
`connection_compression_client_to_server` as three separate GUCs (and
on the client side `compression`, `compression_server_to_client`, and
`compression_client_to_server` as three separate connection
parameters), where we would treat `connection_compression` as a
default that could be overridden by an explicit
client_to_server/server_to_client.  That creates the slightly funky
case where if you specify all three then the base one ends up unused
because the two more specific ones are being used instead, but that
isn't necessarily terrible.  On the server side we *could* go with
just the server_to_client and client_to_server ones, but I think we
want it to be easy to use this feature in the simple case with a
single libpq parameter.

--
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-14 Thread Robert Haas
On Tue, May 14, 2024 at 3:22 PM Jacob Burroughs
 wrote:
> What if we went with:
> Server side:
> * `libpq_compression=on` (I just want everything the server supports
> available; probably the most common case)
> * `libpq_compression=off` (I don't want any compression ever with this server)
> * `libpq_compression=lzma;gzip` (I only want these algorithms for
> whatever reason)
> * `libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off`
> (I only want to send data with lzma and receive data with gzip)
> Client side:
> * `compression=on` (I just want compression; pick sane defaults
> automatically for me; probably the most common case)
> * `compression=off` (I don't want any compression)
> * `compression=lzma;gzip` (I only want these algorithms for whatever reason)
> * `compression=lzma:client_to_server=off;gzip:server_to_client=off` (I
> only want to receive data with lzma and send data with gzip)
>
> `client_to_server`/`server_to_client` is a bit verbose, but it's very
> explicit in what it means, so you don't need to reason about who is
> sending/receiving/etc in a given context, and a given config string
> applied to the server or the client side has the same effect on the
> connection.

IMHO, that's a HUGE improvement. But:

* I would probably change the name "libpq_compression", because
even though we have src/backend/libpq, we typically use libpq to refer
to the client library, not the server's implementation of the wire
protocol. I think we could call it connection_compression or
wire_protocol_compression or something like that, but I'm not a huge
fan of libpq_compression.

* I would use commas, not semicolons, to separate items in a list,
i.e. lzma,gzip not lzma;gzip. I think that convention is nearly
universal in PostgreSQL, but feel free to point out counterexamples if
you were modelling this on something.

* libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off
reads strangely to me. How about making it so that the syntax is like
this:

libpq_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS:client_to_server=OVERRIDE_FOR_THIS_DIRECTION:server_to_client=OVERRIDE_FOR_THIS_DIRECTION

With all components being optional. So this example could be written
in any of these ways:

libpq_compression=lzma;server_to_client=gzip
libpq_compression=gzip;client_to_server=lzma
libpq_compression=server_to_client=gzip;client_to_server=lzma
libpq_compression=client_to_server=lzma;server_to_client=gzip

And if I wrote libpq_compression=server_to_client=gzip that would mean
send data to the client using gzip and in the other direction use
whatever the default is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-14 Thread Jacob Burroughs
On Tue, May 14, 2024 at 1:35 PM Robert Haas  wrote:
>
> Well, in my last response before the thread died, I complained that
> libpq_compression=gzip:compress=off was confusing, and I stand by
> that, because "compress" is used both in the name of the parameter and
> as an option within the value of that parameter. I think there's more
> than one acceptable way to resolve that problem, but I think leaving
> it like that is unacceptable.

What if we went with:
Server side:
* `libpq_compression=on` (I just want everything the server supports
available; probably the most common case)
* `libpq_compression=off` (I don't want any compression ever with this server)
* `libpq_compression=lzma;gzip` (I only want these algorithms for
whatever reason)
* `libpq_compression=lzma:client_to_server=off;gzip:server_to_client=off`
(I only want to send data with lzma and receive data with gzip)
Client side:
* `compression=on` (I just want compression; pick sane defaults
automatically for me; probably the most common case)
* `compression=off` (I don't want any compression)
* `compression=lzma;gzip` (I only want these algorithms for whatever reason)
* `compression=lzma:client_to_server=off;gzip:server_to_client=off` (I
only want to receive data with lzma and send data with gzip)

`client_to_server`/`server_to_client` is a bit verbose, but it's very
explicit in what it means, so you don't need to reason about who is
sending/receiving/etc in a given context, and a given config string
applied to the server or the client side has the same effect on the
connection.

> Even more broadly, I think there have been a couple of versions of
> this patch now where I read the documentation and couldn't understand
> how the feature was supposed to work, and I'm not really willing to
> spend time trying to review a complex patch for conformity with a
> design that I can't understand in the first place. I don't want to
> pretend like I'm the smartest person on this mailing list, and in fact
> I know that I'm definitely not, but I think I'm smart enough and
> experienced enough with PostgreSQL that if I look at the description
> of a parameter and say "I don't understand how the heck this is
> supposed to work", probably a lot of users are going to have the same
> reaction. That lack of understanding on my part may come either from
> the explanation of the parameter not being as good as it needs to be,
> or from the design itself not being as good as it needs to be, or from
> some combination of the two, but whichever is the case, IMHO you or
> somebody else has got to figure out how to fix it.

If the above proposal seems better to you I'll both rework the patch
and then also try to rewrite the relevant bits of documentation to
separate out "what knobs are there" and "how do I specify the flags to
turn the knobs", because I think those two being integrated is making
the parameter documentation less readable/followable.


-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-14 Thread Robert Haas
On Tue, May 14, 2024 at 12:30 PM Jacob Burroughs
 wrote:
> I've withdrawn this patch from the commitfest.  I had been waiting for
> some resolution on "Add new protocol message to change GUCs for usage
> with future protocol-only GUCs" before I rebased/refactored this one,
> because this would be introducing the first protocol extension so far,
> and that discussion appeared to be working out some meaningful issues
> on how GUCs and protocol parameters should interact.  If you think it
> is worthwhile to proceed here though, I am very happy to do so. (I
> would love to see this feature actually make it into postgres; it
> would almost certainly be a big efficiency and cost savings win for
> how my company deploys postgres internally :) )

I don't think you should wait for that to be resolved; IMHO, this
patch needs to inform that discussion more than the other way around.

> > 2. I still think it needs to be more clear how the interface is
> > supposed to work. I do not want to spend time reviewing a patch to see
> > whether it works without understanding how it is intended to work --
> > and I also think that reviewing the patch in detail before we've got
> > the user interface right makes a whole lot of sense.
>
> Regarding the interface, what I had originally gone for was the idea
> that the naming of the options was from the perspective of the side
> you were setting them on.  Therefore, when setting `libpq_compression`
> as a server-level GUC, `compress` would control if the server would
> compress (send compressed data) with the given algorithm, and
> `decompress` would control if the server would decompress (receive
> compressed data) with the given algorithm.  And likewise on the client
> side, when setting `compression` as a connection config option,
> `compress` would control if the *client* would compress (send
> compressed data) with the given algorithm, and `decompress` would
> control if the *client* would decompress (receive compressed data)
> with the given algorithm.  So for a client to pick what to send, it
> would choose from the intersection of its own `compress=true` and the
> server's `decompress=true` algorithms sent in the `ParameterStatus`
> message with `libpq_compression`.  And likewise on the server side, it
> would choose from the intersection of the server's `compress=true`
> algorithms and the client's `decompress=true` algorithms sent in the
> `_pq_.libpq_compression` startup option.  If you have a better
> suggestion I am very open to it though.

Well, in my last response before the thread died, I complained that
libpq_compression=gzip:compress=off was confusing, and I stand by
that, because "compress" is used both in the name of the parameter and
as an option within the value of that parameter. I think there's more
than one acceptable way to resolve that problem, but I think leaving
it like that is unacceptable.

Even more broadly, I think there have been a couple of versions of
this patch now where I read the documentation and couldn't understand
how the feature was supposed to work, and I'm not really willing to
spend time trying to review a complex patch for conformity with a
design that I can't understand in the first place. I don't want to
pretend like I'm the smartest person on this mailing list, and in fact
I know that I'm definitely not, but I think I'm smart enough and
experienced enough with PostgreSQL that if I look at the description
of a parameter and say "I don't understand how the heck this is
supposed to work", probably a lot of users are going to have the same
reaction. That lack of understanding on my part may come either from
the explanation of the parameter not being as good as it needs to be,
or from the design itself not being as good as it needs to be, or from
some combination of the two, but whichever is the case, IMHO you or
somebody else has got to figure out how to fix it.

I do also admit that there is a possibility that everything is totally
fine and I've just been kinda dumb on the days when I've looked at the
patch. If a chorus of other hackers shows up and gives me a few whacks
with the cluestick and after that I look at the proposed options and
go "oh, yeah, these totally make sense, I was just being stupid," fair
enough! But right now that's not where I'm at. I don't want you to
explain to me how it works; I want you to change it in some way so
that when I or some end user looks at it, they go "I don't need an
explanation of how that works because it's extremely clear to me
already," or at least "hmm, this is a bit complicated but after a
quick glance at the documentation it makes sense".

I would really like to see this patch go forward, but IMHO these UI
questions are blockers.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-05-14 Thread Jacob Burroughs
On Tue, May 14, 2024 at 11:08 AM Robert Haas  wrote:
>
> According to https://commitfest.postgresql.org/48/4746/ this patch set
> needs review, but:
>
> 1. Considering that there have been no updates for 5 months, maybe
> it's actually dead?

I've withdrawn this patch from the commitfest.  I had been waiting for
some resolution on "Add new protocol message to change GUCs for usage
with future protocol-only GUCs" before I rebased/refactored this one,
because this would be introducing the first protocol extension so far,
and that discussion appeared to be working out some meaningful issues
on how GUCs and protocol parameters should interact.  If you think it
is worthwhile to proceed here though, I am very happy to do so. (I
would love to see this feature actually make it into postgres; it
would almost certainly be a big efficiency and cost savings win for
how my company deploys postgres internally :) )

> 2. I still think it needs to be more clear how the interface is
> supposed to work. I do not want to spend time reviewing a patch to see
> whether it works without understanding how it is intended to work --
> and I also think that reviewing the patch in detail before we've got
> the user interface right makes a whole lot of sense.

Regarding the interface, what I had originally gone for was the idea
that the naming of the options was from the perspective of the side
you were setting them on.  Therefore, when setting `libpq_compression`
as a server-level GUC, `compress` would control if the server would
compress (send compressed data) with the given algorithm, and
`decompress` would control if the server would decompress (receive
compressed data) with the given algorithm.  And likewise on the client
side, when setting `compression` as a connection config option,
`compress` would control if the *client* would compress (send
compressed data) with the given algorithm, and `decompress` would
control if the *client* would decompress (receive compressed data)
with the given algorithm.  So for a client to pick what to send, it
would choose from the intersection of its own `compress=true` and the
server's `decompress=true` algorithms sent in the `ParameterStatus`
message with `libpq_compression`.  And likewise on the server side, it
would choose from the intersection of the server's `compress=true`
algorithms and the client's `decompress=true` algorithms sent in the
`_pq_.libpq_compression` startup option.  If you have a better
suggestion I am very open to it though.



-- 
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com




Re: libpq compression (part 3)

2024-05-14 Thread Robert Haas
On Fri, Jan 12, 2024 at 4:11 PM Robert Haas  wrote:
> I think that would definitely be better than "compress" and
> "decompress," but I was worried that it might be unclear to the user
> whether the parameter that they specified was from the point of view
> of the client or the server. Perhaps that's a dumb thing to worry
> about, though.

According to https://commitfest.postgresql.org/48/4746/ this patch set
needs review, but:

1. Considering that there have been no updates for 5 months, maybe
it's actually dead?

and

2. I still think it needs to be more clear how the interface is
supposed to work. I do not want to spend time reviewing a patch to see
whether it works without understanding how it is intended to work --
and I also think that reviewing the patch in detail before we've got
the user interface right makes a whole lot of sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-01-12 Thread Robert Haas
On Fri, Jan 12, 2024 at 4:02 PM Jacob Burroughs
 wrote:
> > I wonder if we could use "upstream" and "downstream" to be clearer? Or
> > some other terminology?
>
> What about `send` and `receive`?

I think that would definitely be better than "compress" and
"decompress," but I was worried that it might be unclear to the user
whether the parameter that they specified was from the point of view
of the client or the server. Perhaps that's a dumb thing to worry
about, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2024-01-12 Thread Jacob Burroughs
> I wonder if we could use "upstream" and "downstream" to be clearer? Or
> some other terminology?

What about `send` and `receive`?




Re: libpq compression (part 3)

2024-01-12 Thread Robert Haas
On Sun, Dec 31, 2023 at 2:32 AM Jacob Burroughs
 wrote:
> I ended up reworking this to use a version of this option in place of
> the `none` hackery, but naming the parameters `compress` and
> `decompress`, so to disable compression but allow decompression you
> would specify `libpq_compression=gzip:compress=off`.

I'm still a bit befuddled by this interface.
libpq_compression=gzip:compress=off looks a lot like it's saying "do
it, except don't". I guess that you're using "compress" and
"decompress" to distinguish the two directions - i.e. server to client
and client to server - but of course in both directions the sender
compresses and the receiver decompresses, so I don't find that very
clear.

I wonder if we could use "upstream" and "downstream" to be clearer? Or
some other terminology?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2023-12-29 Thread Jelte Fennema-Nio
On Fri, 29 Dec 2023 at 11:02, Andrey M. Borodin  wrote:
> This patchset allows sending a CompressionMethod message, which allows
> setting another codec/level picked from the set of codecs negotiated
> during startup.

Did you mean to attach a patchset? I don't see the CompressionMethod
message in the v2 patchset. Only a CompressedData one.




Re: libpq compression (part 3)

2023-12-29 Thread Andrey M. Borodin



> On 21 Dec 2023, at 05:30, Jelte Fennema-Nio  wrote:
> 
> One thing I'm wondering: should it be possible for the client to change the 
> compression it wants mid-connection?

This patchset allows sending a CompressionMethod message, which allows
setting another codec/level picked from the set of codecs negotiated
during startup.


Best regards, Andrey Borodin.



Re: libpq compression (part 3)

2023-12-20 Thread Jelte Fennema-Nio
Thanks for working on this!

One thing I'm wondering: should it be possible for the client to change the
compression it wants mid-connection? I can think of some scenarios where
that would be useful to connection poolers: if a pooler does plain
forwarding of the compressed messages, then it would need to be able to
disable/enable compression if it wants to multiplex client connections with
different compression settings over the same server connection.


Re: libpq compression (part 3)

2023-12-20 Thread Jacob Burroughs
> I'm having difficulty understanding the details of this handshaking
> algorithm from this description. It seems good that the handshake
> proceeds in each direction somewhat separately from the other, but I
> don't quite understand how the whole thing fits together. If the
> client tells the server that 'none,gzip' is supported, and I elect to
> start using gzip, how does the client know that I picked gzip rather
> than none? Are the compressed packets self-identifying?

I agree I could have spelled this out more clearly.  I forgot to
mention that I added a byte to the CompressedMessage message type that
specifies the chosen algorithm.  So if the server receives
'none,gzip', it can either keep sending uncompressed regular messages,
or it can compress them in CompressedMessage packets which now look
like "z{len}{format}{data}" (format just being a member of the
pg_compress_algorithm enum, so `1` in the case of gzip).  Overall the
intention is that both the client and the server can just start
sending CompressedMessages once they receive the list of ones the other
party supports, without any negotiation or agreement needed and without
an extra message type to first specify the compression algorithm. (One
byte per message seemed to me like a reasonable overhead for the
simplicity, but it wouldn't be hard to bring back SetCompressionMethod
if we prefer.)
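
A rough sketch of what that means for an implementing side (the helper
below is hypothetical, not code from the patch; the byte layout and the
"first algorithm in my list the peer also supports" rule are just the
description above restated):

/*
 * On the wire, each CompressedMessage identifies its own codec:
 *
 *     'z'   int32 length   uint8 algorithm   compressed payload...
 *
 * where "algorithm" is a pg_compress_algorithm value (1 for gzip).
 *
 * Each peer independently picks the first algorithm in its *own*
 * preference list that also appears in the list the other side
 * advertised; no acknowledgment or extra negotiation message is needed.
 */
static int
choose_algorithm(const int *mine, int nmine, const int *theirs, int ntheirs)
{
    for (int i = 0; i < nmine; i++)
        for (int j = 0; j < ntheirs; j++)
            if (mine[i] == theirs[j])
                return mine[i];
    return -1;              /* nothing in common: keep sending uncompressed */
}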

> It's also slightly odd to me that the same parameter seems to specify
> both what we want to send, and what we're able to receive. I'm not
> really sure we should have separate parameters for those things, but I
> don't quite understand how this works without it. The "none" thing
> seems like a bit of a hack. It lets you say "I'd like to receive
> compressed data but send uncompressed data" ... but what about the
> reverse? How do you say "don't bother compressing what you receive
> from the server, but please lz4 everything you send to the server"? Or
> how do you say "use zstd from server to client, but lz4 from client to
> server"? It seems like you can't really say that kind of thing.

When I came up with the protocol I was imagining that basically both
server admins and clients might want a decent bit more control over
the compression they do rather than the decompression they do, since
compression is generally much more computationally expensive than
decompression.  Now that you point it out though, I don't think that
actually makes that much sense.

> What if we had, on the server side, a GUC saying what compression to
> accept and a GUC saying what compression to be willing to do? And then
> let the client request whatever it wants for each direction.

Here's two proposals:
Option 1:
GUCs:
libpq_compression (default "off")
libpq_decompression (default "auto", which is defined to be equal to
libpq_compression)
Connection parameters:
compression (default "off")
decompression (default "auto", which is defined to be equal to compression)

I think we would only send the decompression fields over the wire to
the other side, to be used to filter for the first chosen compression
field.  We would send the `_pq_.libpq_decompression` protocol
extension even if only compression was enabled and not decompression
so that the server knows to enable compression processing for the
connection (I think this would be the only place we would still use
`none`, and not as part of a list in this case.)  I think we also
would want to add libpq functions to allow a client to check the
last-used compression algorithm in each direction for any
monitoring/inspection purposes (actually that's probably a good idea
regardless, so a client application that cares doesn't need to/try to
implement the intersection and assumption around choosing the first
algorithm in common).  Also I'm open to better names than "auto", I
just would like it to avoid unnecessary verbosity for the common case
of "I just want to enable bidirectional compression with whatever
algorithms are available with default parameters".

Option 2:
This one is even cleaner in the common case but a bit worse in the
uncommon case: just use one parameter and have
compression/decompression enabling be part of the compression detail
(e.g. "libpq_compression='gzip:no_decompress;lz4:level=2,no_compress;zstd'"
or something like that, in which case the "none,gzip" case would
become "'libpq_compression=gzip:no_compress'").  See
https://www.postgresql.org/docs/current/app-pgbasebackup.html ,
specifically the `--compress` flag, for how specifying compression
algorithms and details works.

I'm actually not sure which of the two I prefer; opinions are welcome :)

Thanks,
Jacob




Re: libpq compression (part 3)

2023-12-20 Thread Robert Haas
On Tue, Dec 19, 2023 at 11:41 AM Jacob Burroughs
 wrote:
> The compression "handshaking" process now looks as follows:
> 1. Client sends startup packet with `_pq_.libpq_compression = alg1;alg2`
> 2. At this point, the server can immediately begin compressing packets
> to the client with any of the specified algorithms it supports if it
> so chooses
> 3. Server includes `libpq_compression` in the automatically sent
> `ParameterStatus` messages before handshaking
> 4. At this point, the client can immediately begin compressing packets
> to the server with any of the supported algorithms
> Both the server and client will prefer to compress using the first
> algorithm in their list that the other side supports, and we
> explicitly support `none` in the algorithm list.  This allows e.g. a
> client to use `none;gzip` and a server to use `zstd;gzip;lz4`, and
> then the client will not compress its data but the server will send
> its data using gzip.

I'm having difficulty understanding the details of this handshaking
algorithm from this description. It seems good that the handshake
proceeds in each direction somewhat separately from the other, but I
don't quite understand how the whole thing fits together. If the
client tells the server that 'none,gzip' is supported, and I elect to
start using gzip, how does the client know that I picked gzip rather
than none? Are the compressed packets self-identifying?

It's also slightly odd to me that the same parameter seems to specify
both what we want to send, and what we're able to receive. I'm not
really sure we should have separate parameters for those things, but I
don't quite understand how this works without it. The "none" thing
seems like a bit of a hack. It lets you say "I'd like to receive
compressed data but send uncompressed data" ... but what about the
reverse? How do you say "don't bother compressing what you receive
from the server, but please lz4 everything you send to the server"? Or
how do you say "use zstd from server to client, but lz4 from client to
server"? It seems like you can't really say that kind of thing.

What if we had, on the server side, a GUC saying what compression to
accept and a GUC saying what compression to be willing to do? And then
let the client request whatever it wants for each direction.

> Also please let me know if I have made any notable mailing list/patch
> etiquette/format/structure errors.  This is my first time submitting a
> patch to a mailing-list driven open source project and while I have
> tried to carefully review the various wiki guides I'm sure I didn't
> take everything in perfectly.

Seems fine to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 3)

2023-12-19 Thread Jacob Burroughs
> I believe this patch series is ready for detailed review/testing, with one 
> caveat: as can be seen here https://cirrus-ci.com/build/6732518292979712 , 
> the build is passing on all platforms and all tests except for the primary 
> SSL test on Windows.

One correction: I apparently missed a Kerberos timeout failure on
FreeBSD with compression enabled (being color blind, the checkmark and
still-running colors are awfully similar, and I misread what I saw).
I haven't yet successfully reproduced that one, so I may or may not
need some pointers to sort it out, but I think whatever it is the fix
will be small enough that the patch overall is still reviewable.




Re: libpq compression (part 2)

2023-08-11 Thread Andrey M. Borodin



> On 10 Aug 2023, at 19:47, Jonah H. Harris  wrote:
> 
> Pinging to see if anyone has continued to work on this behind-the-scenes or 
> whether this is the latest patch set there is.

It's still on my TODO list, but I haven't done many review cycles yet. And the 
patch series already needs a heavy rebase.

Thanks for your interest in the topic.


Best regards, Andrey Borodin.



Re: libpq compression (part 2)

2023-08-10 Thread Jonah H. Harris
Pinging to see if anyone has continued to work on this behind-the-scenes or
whether this is the latest patch set there is.

-- 
Jonah H. Harris


Re: libpq compression (part 2)

2022-11-18 Thread Peter Eisentraut

On 18.11.22 02:07, Andrey Borodin wrote:

2. This literal
{no_compression_name}
should be replaced by explicit form
{no_compression_name, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL}


That doesn't seem better.




Re: libpq compression (part 2)

2022-11-17 Thread Andrey Borodin
On Thu, Nov 17, 2022 at 7:09 AM Peter Eisentraut
 wrote:
>
> Note that the above code was just changed in dce92e59b1.
Thanks!

> I don't know
> how that affects this patch set.
With dce92e59b1 it would be much easier to find a bug in the compression patch.

Some more notes about the patch. (sorry for posting review notes in so
many different messages)

1. zs_is_valid_impl_id(unsigned int id)
{
return id >= 0 && id < lengthof(zs_algorithms);
}

id is unsigned, no need to check it's non-negative.

2. This literal
{no_compression_name}
should be replaced by explicit form
{no_compression_name, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL}

3. Comments like "Return true if should, false if should not." are useless.

Thanks!

Best regards, Andrey Borodin.




Re: libpq compression (part 2)

2022-11-17 Thread Peter Eisentraut

On 17.11.22 01:27, Andrey Borodin wrote:

Also I've found one more TODO item for the patch. Currently in
fe-connect.c patch is doing buffer reset:
conn->inStart = conn->inCursor = conn->inEnd = 0;
This effectively consumes bytes up to the current cursor. However, packet
inspection is continued. The patch works because in most cases the
following code will issue a re-read of the message. Coincidentally.
/* Get the type of request. */
if (pqGetInt((int *) &areq, 4, conn))
{
/* We'll come back when there are more data */
return PGRES_POLLING_READING;
}


Note that the above code was just changed in dce92e59b1.  I don't know 
how that affects this patch set.






Re: libpq compression (part 2)

2022-11-16 Thread Andrey Borodin
On Tue, Nov 15, 2022 at 7:17 PM Justin Pryzby  wrote:
>

Also I've found one more TODO item for the patch. Currently in
fe-connect.c patch is doing buffer reset:
conn->inStart = conn->inCursor = conn->inEnd = 0;
This effectively consumes bytes up to the current cursor. However, packet
inspection is continued. The patch works because in most cases the
following code will issue a re-read of the message. Coincidentally.
/* Get the type of request. */
if (pqGetInt((int *) &areq, 4, conn))
{
/* We'll come back when there are more data */
return PGRES_POLLING_READING;
}

But I think we need a proper
goto keep_going;
to start from the beginning of the message.


Thank you!

Best regards, Andrey Borodin.




Re: libpq compression (part 2)

2022-11-15 Thread Justin Pryzby
On Mon, Nov 14, 2022 at 07:44:24PM -0800, Andrey Borodin wrote:
> patchset needs a heavy rebase. If no one shows up to fix it I'll do

Despite what its git timestamp says, this is based on the most recent
patch from January, which I've had floating around since then.  It
needed to be rebased over at least:

  - guc_tables patch;
  - build and test with meson;
  - doc/

Some of my changes are separate so you can see what I've done.
check_libpq_compression() is in the wrong place, but I couldn't
immediately see where else to put it, since src/common can't include the
backend's guc headers.

Some of the makefile changes seem unnecessary (now?), and my meson
changes don't seem quite right, either.

There's no reason for Zstd to be a separate patch anymore.

It should be updated to parse compression level and options using the
infrastructure introduced for basebackup.

And address the architectural issue from 2 years ago:
https://www.postgresql.org/message-id/20220118043919.GA23027%40telsasoft.com

The global variable PqStream should be moved into some libpq structure
(Port?) and handled within secure_read().  And pqsecure_read shouldn't
be passed as a function pointer/callback.  And verify it passes tests
with all supported compression algorithms and connects to old servers.
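
For illustration, the shape being suggested might look something like
this (hypothetical, not working code: the zpq_stream field on Port is
invented here, zpq_read()'s signature is taken from the patch, and
secure_raw_read() stands in for the existing uncompressed dispatch):

ssize_t
secure_read(Port *port, void *ptr, size_t len)
{
    if (port->zpq_stream != NULL)
    {
        /*
         * Decompress into the caller's buffer; the stream pulls raw
         * bytes from the underlying transport itself, so no read
         * callback has to be passed around.
         */
        return zpq_read(port->zpq_stream, ptr, len, false);
    }

    /* existing SSL/GSSAPI/plain path for uncompressed connections */
    return secure_raw_read(port, ptr, len);
}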

-- 
Justin
>From 6a74b2af0131499a3afa5132bc6dac08eb1884f6 Mon Sep 17 00:00:00 2001
From: usernamedt 
Date: Fri, 14 Jan 2022 01:38:55 +0500
Subject: [PATCH 1/4] Implement libpq compression

---
 doc/src/sgml/config.sgml  |   21 +
 doc/src/sgml/libpq.sgml   |   37 +
 doc/src/sgml/protocol.sgml|  158 +++
 src/Makefile.global.in|1 +
 src/backend/Makefile  |8 +
 src/backend/catalog/system_views.sql  |9 +
 src/backend/libpq/pqcomm.c|  277 -
 src/backend/postmaster/postmaster.c   |   10 +
 .../libpqwalreceiver/libpqwalreceiver.c   |   13 +-
 src/backend/utils/activity/backend_status.c   |   29 +
 src/backend/utils/adt/pgstatfuncs.c   |   50 +-
 src/backend/utils/misc/guc_funcs.c|   21 +
 src/backend/utils/misc/guc_tables.c   |   10 +
 src/bin/pgbench/pgbench.c |   17 +-
 src/bin/psql/command.c|   17 +
 src/common/Makefile   |4 +-
 src/common/z_stream.c |  663 +++
 src/common/zpq_stream.c   | 1022 +
 src/include/catalog/pg_proc.dat   |   18 +-
 src/include/common/z_stream.h |  109 ++
 src/include/common/zpq_stream.h   |  120 ++
 src/include/libpq/libpq-be.h  |3 +
 src/include/libpq/libpq.h |1 +
 src/include/libpq/pqcomm.h|3 +
 src/include/utils/backend_status.h|7 +
 src/interfaces/libpq/Makefile |   14 +
 src/interfaces/libpq/exports.txt  |2 +
 src/interfaces/libpq/fe-connect.c |  129 ++-
 src/interfaces/libpq/fe-exec.c|   10 +-
 src/interfaces/libpq/fe-misc.c|   92 +-
 src/interfaces/libpq/fe-protocol3.c   |   71 +-
 src/interfaces/libpq/libpq-fe.h   |4 +
 src/interfaces/libpq/libpq-int.h  |   14 +
 src/test/regress/expected/rules.out   |   14 +-
 src/tools/msvc/Mkvcbuild.pm   |2 +-
 35 files changed, 2884 insertions(+), 96 deletions(-)
 create mode 100644 src/common/z_stream.c
 create mode 100644 src/common/zpq_stream.c
 create mode 100644 src/include/common/z_stream.h
 create mode 100644 src/include/common/zpq_stream.h

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bd50ea8e480..de4d9532cc5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1045,6 +1045,27 @@ include_dir 'conf.d'
   
  
 
+ 
+  libpq_compression (string)
+  
+   libpq_compression configuration parameter
+  
+  
+  
+   
+This parameter controls the available client-server traffic compression methods.
+It allows rejecting compression requests even if it is supported by the server (for example, due to security, or CPU consumption).
+The default is on, which means that all supported compression methods are allowed.
+For more precise control, a list of the allowed compression methods can be specified.
+For example, to allow only lz4 and zlib, set the setting to lz4,zlib.
+Also, maximal allowed compression level can be specified for each method, e.g. lz4:1,zlib:2 setting will set the
+maximal compression level for lz4 to 1 and zlib to 2.
+If a client requests the compression with a higher compression level, it will be set to the maximal allowed one.
+Default (and recommended) maximal compression level for each algorith

Re: libpq compression (part 2)

2022-11-14 Thread Andrey Borodin
On Sat, Nov 12, 2022 at 8:04 PM Andrey Borodin  wrote:
>
> While testing patch some more I observe unpleasant segfaults:
>
> #27 0x0b08fda2 in lz4_decompress (d_stream=0x18cf82a0,
> src=0x7feae4fa505d, src_size=92,
> (gdb) select-frame 27
> (gdb) info locals
> ds = 0x18cf82a0
> decPtr = 0x18cf8aec ""
> decBytes = -87
>

I've debugged the stuff and the problem turned out to be wrong
message limits for Lz4 compression:
+#define MESSAGE_MAX_BYTES 819200
instead of
+#define MESSAGE_MAX_BYTES 8192

Other codecs can utilize continuation of the decompression stream
using ZS_DATA_PENDING, while Lz4 cannot do this. I was going to
produce a quickfix for all my lz4 findings, but it occurred to me that the
patchset needs a heavy rebase. If no one shows up to fix it I'll do
that in one of the coming weekends. Either way here's a reproducer for
the coredump:
psql 'compression=lz4' -f boom.sql

Thanks!

Best regards, Andrey Borodin.


boom.sql
Description: Binary data


Re: libpq compression (part 2)

2022-11-12 Thread Andrey Borodin
On Sat, Nov 12, 2022 at 1:47 PM Andrey Borodin  wrote:
>
> I've tried the patch, it works as advertised.

While testing the patch some more, I observed unpleasant segfaults:

#26 0x7fecafa1e058 in __memcpy_ssse3_back () from target:/lib64/libc.so.6
#27 0x0b08fda2 in lz4_decompress (d_stream=0x18cf82a0,
src=0x7feae4fa505d, src_size=92,
src_processed=0x79f4fdf8, dst=0x18b01f80, dst_size=8192,
dst_processed=0x79f4fe60)
#28 0x0b090624 in zs_read (zs=0x18cdfbf0, src=0x7feae4fa505d,
src_size=92, src_processed=0x79f4fdf8,
dst=0x18b01f80, dst_size=8192, dst_processed=0x79f4fe60)
#29 0x0b08eb8f in zpq_read_compressed_message
(zpq=0x7feae4fa5010, dst=0x18b01f80 "Q", dst_len=8192,
dst_processed=0x79f4fe60)
#30 0x0b08f1a9 in zpq_read (zpq=0x7feae4fa5010,
dst=0x18b01f80, dst_size=8192, noblock=false)

(gdb) select-frame 27
(gdb) info locals
ds = 0x18cf82a0
decPtr = 0x18cf8aec ""
decBytes = -87

This is a buffer overrun caused by decompression. I think the receive
buffer must be twice as big as the send buffer to accommodate such
messages.
Also, this portion of lz4_decompress(),
Assert(decBytes > 0);
must actually be a real check that raises elog(ERROR), because clients can
intentionally compose CompressedData messages to blow up the server.
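
A rough sketch of the kind of check meant here (not the patch's actual code; only lz4_decompress() and decBytes come from the backtrace above):

	/*
	 * Sketch of the suggested change: treat a non-positive LZ4 result as an
	 * error instead of Assert()ing, since a malicious client can send
	 * arbitrary CompressedData.
	 */
	if (decBytes <= 0)
		elog(ERROR, "could not decompress LZ4 data");

Since z_stream.c lives in src/common and is also built for the frontend, the real fix may need to report the failure back through the zs_read() caller rather than calling elog() directly.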

Best regards, Andrey Borodin.




Re: libpq compression (part 2)

2022-11-12 Thread Andrey Borodin
On Tue, Aug 2, 2022 at 1:53 PM Jacob Champion  wrote:
>
> and changing the status to "Needs Review"

I've tried the patch; it works as advertised. The code is generally
OK, though some functions could use comments (because at least their
neighbours have some).
Some linkers complain about zs_is_valid_impl_id() being declared inline while
being used in other modules.

Some architectural notes:
1. Currently, when the user requests compression from libpq but the
server does not support any of the codecs the client has, the connection
is rejected by the client. I think this should be configurable akin to
SSL: on, try, off.
2. At the zpq_stream level of abstraction we parse a stream of bytes
to look for individual message headers. I think this is OK, just a
little bit awkward.
3. CompressionAck message can be completely replaced by ParameterStatus message.
4. Instead of sending a separate SetCompressionMethod, the
CompressedData message could carry a header with the index of the method used
and a flag saying whether to restart the compression context (one possible
layout is sketched after this list).
5. Portions of pg_stat_network_traffic could be extracted into a separate
patch step to ease the review. And, actually, the scope of this view
is slightly beyond compression anyway.
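
For point 4, one possible CompressedData layout (purely illustrative, not something the patch implements) could be:

    CompressedData (F & B)
        Byte1    message type
        Int32    length of message contents in bytes, including self
        Byte1    index of the compression method, into the negotiated list
        Byte1    flags (bit 0: restart the compression context)
        ByteN    compressed payload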

What do you think?

Also, compression is a very cool and awaited feature, hope to see it
committed one day, thank you for working on this!


Best regards, Andrey Borodin.




Re: libpq compression (part 2)

2022-08-02 Thread Jacob Champion
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

https://commitfest.postgresql.org/38/3499/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob




Re: libpq compression (part 2)

2022-03-03 Thread Daniil Zakhlystov
Ok, thanks


> On 3 Mar 2022, at 02:33, Justin Pryzby  wrote:
> 
> If there's no objection, I'd like to move this to the next CF for 
> consideration
> in PG16.
> 
> On Mon, Jan 17, 2022 at 10:39:19PM -0600, Justin Pryzby wrote:
>> On Tue, Jan 18, 2022 at 02:06:32AM +0500, Daniil Zakhlystov wrote:
 => Since March, errmsg doesn't need extra parenthesis around it (e3a87b4).
>> 
>>> I’ve resolved the stuck tests and added zlib support for CI Windows builds 
>>> to patch 0003-*.  Thanks
>>> for the suggestion, all tests seem to be OK now, except the macOS one.  It 
>>> won't schedule in Cirrus
>>> CI for some reason, but I guess it happens because of my GitHub account 
>>> limitation.
>> 
>> I don't know about your github account, but it works for cfbot, which is now
>> green.
>> 
>> Thanks for implementing zlib for windows.  Did you try this with default
>> compressions set to lz4 and zstd ?
>> 
>> The thread from 2019 is very long, and starts off with the guidance that
>> compression had been implemented at the wrong layer.  It looks like this 
>> hasn't
>> changed since then.  secure_read/write are passed as function pointers to the
>> ZPQ interface, which then calls back to them to read and flush its 
>> compression
>> buffers.  As I understand, the suggestion was to leave the socket reads and
>> writes alone.  And then conditionally de/compress buffers after reading /
>> before writing from the socket if compression was negotiated.
>> 
>> It's currently done like this
>> pq_recvbuf() => secure_read() - when compression is disabled 
>> pq_recvbuf() => ZPQ => secure_read() - when compression is enabled 
>> 
>> Dmitri sent a partial, POC patch which changes the de/compression to happen 
>> in
>> secure_read/write, which is changed to call ZPQ:  
>> https://www.postgresql.org/message-id/ca+q6zcuprssnars+fyobsd-f2stk1roa-4sahfofajowlzi...@mail.gmail.com
>> pq_recvbuf() => secure_read() => ZPQ
>> 
>> The same thing is true of the frontend: function pointers to
>> pqsecure_read/write are being passed to zpq_create, and then the ZPQ 
>> interface
>> called instead of the original functions.  Those are the functions which read
>> using SSL, so they should also handle compression.
>> 
>> That's where SSL is handled, and it seems like the right place to handle
>> compression.  Have you evaluated that way to do things ?
>> 
>> Konstantin said he put ZPQ at that layer seems to 1) avoid code duplication
>> between client/server; and, 2) to allow compression to happen before SSL, to
>> allow both (if the admin decides it's okay).  But I don't see why compression
>> can't happen before sending to SSL, or after reading from it?





Re: libpq compression (part 2)

2022-03-02 Thread Justin Pryzby
If there's no objection, I'd like to move this to the next CF for consideration
in PG16.

On Mon, Jan 17, 2022 at 10:39:19PM -0600, Justin Pryzby wrote:
> On Tue, Jan 18, 2022 at 02:06:32AM +0500, Daniil Zakhlystov wrote:
> > > => Since March, errmsg doesn't need extra parenthesis around it (e3a87b4).
> 
> > I’ve resolved the stuck tests and added zlib support for CI Windows builds 
> > to patch 0003-*.  Thanks
> > for the suggestion, all tests seem to be OK now, except the macOS one.  It 
> > won't schedule in Cirrus
> > CI for some reason, but I guess it happens because of my GitHub account 
> > limitation.
> 
> I don't know about your github account, but it works for cfbot, which is now
> green.
> 
> Thanks for implementing zlib for windows.  Did you try this with default
> compressions set to lz4 and zstd ?
> 
> The thread from 2019 is very long, and starts off with the guidance that
> compression had been implemented at the wrong layer.  It looks like this 
> hasn't
> changed since then.  secure_read/write are passed as function pointers to the
> ZPQ interface, which then calls back to them to read and flush its compression
> buffers.  As I understand, the suggestion was to leave the socket reads and
> writes alone.  And then conditionally de/compress buffers after reading /
> before writing from the socket if compression was negotiated.
> 
> It's currently done like this
> pq_recvbuf() => secure_read() - when compression is disabled 
> pq_recvbuf() => ZPQ => secure_read() - when compression is enabled 
> 
> Dmitri sent a partial, POC patch which changes the de/compression to happen in
> secure_read/write, which is changed to call ZPQ:  
> https://www.postgresql.org/message-id/ca+q6zcuprssnars+fyobsd-f2stk1roa-4sahfofajowlzi...@mail.gmail.com
> pq_recvbuf() => secure_read() => ZPQ
> 
> The same thing is true of the frontend: function pointers to
> pqsecure_read/write are being passed to zpq_create, and then the ZPQ interface
> called instead of the original functions.  Those are the functions which read
> using SSL, so they should also handle compression.
> 
> That's where SSL is handled, and it seems like the right place to handle
> compression.  Have you evaluated that way to do things ?
> 
> Konstantin said he put ZPQ at that layer seems to 1) avoid code duplication
> between client/server; and, 2) to allow compression to happen before SSL, to
> allow both (if the admin decides it's okay).  But I don't see why compression
> can't happen before sending to SSL, or after reading from it?




Re: libpq compression (part 2)

2022-01-17 Thread Justin Pryzby
On Tue, Jan 18, 2022 at 02:06:32AM +0500, Daniil Zakhlystov wrote:
> > => Since March, errmsg doesn't need extra parenthesis around it (e3a87b4).

> I’ve resolved the stuck tests and added zlib support for CI Windows builds to 
> patch 0003-*.  Thanks
> for the suggestion, all tests seem to be OK now, except the macOS one.  It 
> won't schedule in Cirrus
> CI for some reason, but I guess it happens because of my GitHub account 
> limitation.

I don't know about your github account, but it works for cfbot, which is now
green.

Thanks for implementing zlib for windows.  Did you try this with default
compressions set to lz4 and zstd ?

The thread from 2019 is very long, and starts off with the guidance that
compression had been implemented at the wrong layer.  It looks like this hasn't
changed since then.  secure_read/write are passed as function pointers to the
ZPQ interface, which then calls back to them to read and flush its compression
buffers.  As I understand, the suggestion was to leave the socket reads and
writes alone.  And then conditionally de/compress buffers after reading /
before writing from the socket if compression was negotiated.

It's currently done like this
pq_recvbuf() => secure_read() - when compression is disabled 
pq_recvbuf() => ZPQ => secure_read() - when compression is enabled 

Dmitri sent a partial, POC patch which changes the de/compression to happen in
secure_read/write, which is changed to call ZPQ:  
https://www.postgresql.org/message-id/ca+q6zcuprssnars+fyobsd-f2stk1roa-4sahfofajowlzi...@mail.gmail.com
pq_recvbuf() => secure_read() => ZPQ

The same thing is true of the frontend: function pointers to
pqsecure_read/write are being passed to zpq_create, and then the ZPQ interface
called instead of the original functions.  Those are the functions which read
using SSL, so they should also handle compression.

That's where SSL is handled, and it seems like the right place to handle
compression.  Have you evaluated that way to do things ?

Konstantin said he put ZPQ at that layer seems to 1) avoid code duplication
between client/server; and, 2) to allow compression to happen before SSL, to
allow both (if the admin decides it's okay).  But I don't see why compression
can't happen before sending to SSL, or after reading from it?

-- 
Justin




Re: libpq compression (part 2)

2022-01-13 Thread Justin Pryzby
On Fri, Jan 14, 2022 at 02:12:17AM +0500, Daniil Zakhlystov wrote:
> Hi, Justin!
> 
> First of all, thanks for the detailed review. I’ve applied your patches to 
> the current version.

Note that my message had other comments that weren't addressed in this patch.

Your 0003 patch has a couple "noise" hunks that get rid of ^M characters added
in previous patches.  The ^M shouldn't be added in the first place.  Did you
apply my fixes using git-am or something else ?

On Fri, Jan 14, 2022 at 02:12:17AM +0500, Daniil Zakhlystov wrote:
> > On 12 Jan 2022, at 19:15, Justin Pryzby  wrote:
> > 
> > zlib still causes check-world to get stuck.  I first mentioned this last 
> > March:
> > 20210319062800.gi11...@telsasoft.com
> > 
...
> > I removed the thread from the CFBOT until that's resolved.
> 
> I’ve fixed the failing tests, now they should pass.

macos: passed
linux: timed out after 1hr
freebsd: failed in pg_rewind: ... <= replay_lsn AND state = 'streaming' FROM ...
windows: "Failed test 'data_checksums=on is reported on an offline cluster 
stdout /(?^:^on$)/'" / WARNING:  01000: algorithm zlib is not supported

Note that it's possible and easy to kick off a CI run using any github account:
see ./src/tools/ci/README

For me, it's faster than running check-world -j4 locally, and runs tests on 4
OSes.

I re-ran your branch under my own account and linux didn't get stuck (and the
compiler warnings tests passed).  But on a third attempt, macos failed the
pg_rewind test, and bsd failed the subscription test:
| SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('s', 
'r');

-- 
Justin




Re: libpq compression (part 2)

2022-01-12 Thread Justin Pryzby
zlib still causes check-world to get stuck.  I first mentioned this last March:
20210319062800.gi11...@telsasoft.com

Actually all the compression methods seems to get stuck with
time make check -C src/bin/pg_rewind
time make check -C src/test/isolation

For CI purposes, there should be an 0003 patch which enables compression by
default, for all message types and maybe all lengths.

I removed the thread from the CFBOT until that's resolved.

-- 
Justin




Re: libpq compression (part 2)

2022-01-12 Thread Andrey Borodin



> On 8 Jan 2022, at 01:56, Stephen Frost  wrote:
>> 
>> It's discussed in last year's thread.  The thinking is that there tends to be
>> *fewer* exploitable opportunities between application->DB than between
>> browser->app.
> 
> Yes, this was discussed previously and addressed.

What else do we need to decide architecturally to make protocol compression 
happen in 15? As far as I can see - only HBA\GUC part.

Best regards, Andrey Borodin.





Re: libpq compression (part 2)

2022-01-07 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Jan  7, 2022 at 03:56:39PM -0500, Stephen Frost wrote:
> > As for the details of how we allow control over it, I suppose there's a
> > number of options.  Having it in the HBA doesn't seem terrible, though I
> > suspect most will just want to enable it across the board and having to
> > have "compression=allowed" or whatever added to every hba line seems
> > likely to be annoying.  Maybe a global GUC and then allow the hba to
> > override?
> 
> Ewe, I would like to avoid going in the GUC & pg_hba.conf direction,
> unless we have other cases where we already do this.

I mean, not exactly the same, but ... ssl?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: libpq compression (part 2)

2022-01-07 Thread Bruce Momjian
On Fri, Jan  7, 2022 at 03:56:39PM -0500, Stephen Frost wrote:
> As for the details of how we allow control over it, I suppose there's a
> number of options.  Having it in the HBA doesn't seem terrible, though I
> suspect most will just want to enable it across the board and having to
> have "compression=allowed" or whatever added to every hba line seems
> likely to be annoying.  Maybe a global GUC and then allow the hba to
> override?

Ewe, I would like to avoid going in the GUC & pg_hba.conf direction,
unless we have other cases where we already do this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: libpq compression (part 2)

2022-01-07 Thread Stephen Frost
Greetings,

* Justin Pryzby (pry...@telsasoft.com) wrote:
> On Fri, Jan 07, 2022 at 01:46:00PM -0500, Bruce Momjian wrote:
> > On Sat, Jan  1, 2022 at 11:25:05AM -0600, Justin Pryzby wrote:
> > > Thanks for working on this.  The patch looks to be in good shape - I hope 
> > > more
> > > people will help to review and test it.  I took the liberty of creating a 
> > > new
> > > CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html
> > > 
> > > +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
> > > +{
> > > +   return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;
> > > 
> > > I think this is backwards , and should say != -1 ?
> > > 
> > > As written, the server GUC libpq_compression defaults to "on", and the 
> > > client
> > > doesn't request compression.  I think the server GUC should default to 
> > > off.
> > > I failed to convince Kontantin about this last year.  The reason is that 
> > > 1)
> > > it's a new feature; 2) with security implications.  An admin should need 
> > > to
> > > "opt in" to this.  I still wonder if this should be controlled by a new 
> > > "TYPE"
> > > in pg_hba (rather than a GUC); that would make it exclusive of SSL.
> > 
> > I assume this compression happens before it is encrypted for TLS
> > transport.  Second, compression was removed from TLS because there were
> > too many ways for HTTP to weaken encryption.  I assume the Postgres wire
> > protocol doesn't have similar exploit possibilities.
> 
> It's discussed in last year's thread.  The thinking is that there tends to be
> *fewer* exploitable opportunities between application->DB than between
> browser->app.

Yes, this was discussed previously and addressed.

> But it's still a known concern, and should default to off - as I said.

I'm not entirely convinced of this but also am happy enough as long as
the capability exists, no matter if it's off or on by default.

> That's also why I wondered if compression should be controlled by pg_hba,
> rather than a GUC.  To require/allow an DBA to opt-in to it for specific 
> hosts.
> Or to make it exclusive of ssl.  We could choose to not suppose that case at
> all, or (depending on the implement) refuse that combination of layers.

I'm definitely against us deciding that we know better than admins if
this is an acceptable trade-off in their environment, or not.  Letting
users/admins control it is fine, but I don't feel we should forbid it.

As for the details of how we allow control over it, I suppose there's a
number of options.  Having it in the HBA doesn't seem terrible, though I
suspect most will just want to enable it across the board and having to
have "compression=allowed" or whatever added to every hba line seems
likely to be annoying.  Maybe a global GUC and then allow the hba to
override?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: libpq compression (part 2)

2022-01-07 Thread Justin Pryzby
On Fri, Jan 07, 2022 at 01:46:00PM -0500, Bruce Momjian wrote:
> On Sat, Jan  1, 2022 at 11:25:05AM -0600, Justin Pryzby wrote:
> > Thanks for working on this.  The patch looks to be in good shape - I hope 
> > more
> > people will help to review and test it.  I took the liberty of creating a 
> > new
> > CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html
> > 
> > +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
> > +{
> > +   return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;
> > 
> > I think this is backwards , and should say != -1 ?
> > 
> > As written, the server GUC libpq_compression defaults to "on", and the 
> > client
> > doesn't request compression.  I think the server GUC should default to off.
> > I failed to convince Kontantin about this last year.  The reason is that 1)
> > it's a new feature; 2) with security implications.  An admin should need to
> > "opt in" to this.  I still wonder if this should be controlled by a new 
> > "TYPE"
> > in pg_hba (rather than a GUC); that would make it exclusive of SSL.
> 
> I assume this compression happens before it is encrypted for TLS
> transport.  Second, compression was removed from TLS because there were
> too many ways for HTTP to weaken encryption.  I assume the Postgres wire
> protocol doesn't have similar exploit possibilities.

It's discussed in last year's thread.  The thinking is that there tends to be
*fewer* exploitable opportunities between application->DB than between
browser->app.

But it's still a known concern, and should default to off - as I said.

That's also why I wondered if compression should be controlled by pg_hba,
rather than a GUC.  To require/allow a DBA to opt in to it for specific hosts.
Or to make it exclusive of SSL.  We could choose to not support that case at
all, or (depending on the implementation) refuse that combination of layers.

-- 
Justin




Re: libpq compression (part 2)

2022-01-07 Thread Bruce Momjian
On Sat, Jan  1, 2022 at 11:25:05AM -0600, Justin Pryzby wrote:
> Thanks for working on this.  The patch looks to be in good shape - I hope more
> people will help to review and test it.  I took the liberty of creating a new
> CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html
> 
> +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
> +{
> +   return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;
> 
> I think this is backwards , and should say != -1 ?
> 
> As written, the server GUC libpq_compression defaults to "on", and the client
> doesn't request compression.  I think the server GUC should default to off.
> I failed to convince Kontantin about this last year.  The reason is that 1)
> it's a new feature; 2) with security implications.  An admin should need to
> "opt in" to this.  I still wonder if this should be controlled by a new "TYPE"
> in pg_hba (rather than a GUC); that would make it exclusive of SSL.

I assume this compression happens before it is encrypted for TLS
transport.  Second, compression was removed from TLS because there were
too many ways for HTTP to weaken encryption.  I assume the Postgres wire
protocol doesn't have similar exploit possibilities.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: libpq compression (part 2)

2022-01-03 Thread Andrey Borodin


> Maybe you should reset the streams between each compression message (even if
> it's using the same compression algorithm). This might allow better
> compression.

AFAIK on the contrary - longer data sequence usually compresses better. The 
codec can use knowledge about prior data to better compress current bytes.

Best regards, Andrey Borodin.




Re: libpq compression

2021-12-02 Thread Michael Paquier
On Fri, Oct 01, 2021 at 11:20:09PM +0200, Daniel Gustafsson wrote:
> To keep this thread from stalling, attached is a rebased patchset with the
> above mentioned fix to try and get this working on Windows.

This patch has been waiting on author for two months now, so I have
marked it as RwF in the CF app.
--
Michael


signature.asc
Description: PGP signature


Re: libpq compression

2021-10-07 Thread Daniil Zakhlystov
Hi, thanks for your fix! I am currently working on implementing the lz4 support
for libpq compression. Looking forward to post an update soon.

—
Daniil Zakhlystov

> On 2 Oct 2021, at 02:20, Daniel Gustafsson  wrote:
>
>> On 2 Sep 2021, at 00:29, Daniel Gustafsson  wrote:
>>
>>> On 29 Jul 2021, at 16:57, Daniil Zakhlystov  wrote:
>>>
>>> Forgot to attach the updated patch :)
>>
>> This fails to build on Windows due to the use of strcasecmp:
>>
>> +		if (strcasecmp(supported_algorithms[zpq->compressors[i].impl], "zstd") ==
>>
>> Was that meant to be pg_strcasecmp?
>
> To keep this thread from stalling, attached is a rebased patchset with the
> above mentioned fix to try and get this working on Windows.
>
> --
> Daniel Gustafsson		https://vmware.com/
>
> <0001-Add-zlib-and-zstd-streaming-compression.patch>
> <0002-Implement-libpq-compression.patch>



Re: libpq compression

2021-09-01 Thread Daniel Gustafsson
> On 29 Jul 2021, at 16:57, Daniil Zakhlystov  wrote:
> 
> Forgot to attach the updated patch :) 

This fails to build on Windows due to the use of strcasecmp:

+   if (strcasecmp(supported_algorithms[zpq->compressors[i].impl], 
"zstd") == 

Was that meant to be pg_strcasecmp?

--
Daniel Gustafsson   https://vmware.com/





Re: libpq compression

2021-07-29 Thread Daniil Zakhlystov
Hi!

I made some noticeable changes to the patch and fixed the previously mentioned 
issues.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby  wrote:
> Previously, I suggested that the server should have a "policy" GUC defining
> which compression methods are allowed.  Possibly including compression 
> "level".
> For example, the default might be to allow zstd, but only up to level 9.

Now libpq_compression GUC server setting controls the available client-server 
traffic compression methods.
It allows specifying the exact list of the allowed compression algorithms.

For example, to allow only zstd and zlib, set the setting to "zstd,zlib". Also,
the maximum allowed compression level can be specified for each
method, e.g. the "zstd:1,zlib:2" setting will set the maximum compression level for
zstd to 1 and zlib to 2.
If a client requests compression with a higher compression level, it will
be clamped to the maximum allowed one.
The default (and recommended) maximum compression level for each algorithm is 1.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby  wrote:
> There's commit messages claiming that the compression can be "asymmetric"
> between client-server vs server-client, but the commit seems to be unrelated,
> and the protocol documentation doesn't describe how this works.

On Dec 10, 2020, at 1:39 AM, Robert Haas  wrote:
> Another idea is that you could have a new message type that says "hey,
> the payload of this is 1 or more compressed messages." It uses the
> most-recently set compression method. This would make switching
> compression methods easier since the SetCompressionMethod message
> itself could always be sent uncompressed and/or not take effect until
> the next compressed message


I rewrote the compression initialization logic. Now it is asymmetric and the 
compression method is changeable on the fly.

Now compression initialization consists only of two phases:
1. Client sends startup packet with _pq_.compression containing the list of 
compression algorithms specified by the client with an optional
specification of compression level (e.g. “zstd:2,zlib:1”)
2. The server intersects the requested compression algorithms with the allowed 
ones (controlled via the libpq_compression server config setting).
If the intersection is not empty, the server responds with CompressionAck 
containing the final list of the compression algorithms that can be used for 
the compression of libpq messages between the client and server.
If the intersection is empty (server does not accept any of the requested 
algorithms), then it replies with CompressionAck containing the empty list.

After sending the CompressionAck message, the server can send the 
SetCompressionMethod message to set the current compression algorithm for 
server-to-client traffic compression.
After receiving the CompressionAck message, the client can send the 
SetCompressionMethod message to set the current compression algorithm for 
client-to-server traffic compression.

SetCompressionMethod message contains the index of the compression algorithm in 
the final list of the compression algorithms which is generated during 
compression initialization (described above).

Compressed data is wrapped into CompressedData messages.
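
To make the handshake concrete, an illustrative exchange (only the message names come from the description above; the algorithm lists and indexes are made up):

    C -> S   startup packet        _pq_.compression = "zstd:2,zlib:1"
    S -> C   CompressionAck        ["zstd", "zlib"]    (intersection with libpq_compression)
    S -> C   SetCompressionMethod  index = 0           (server-to-client traffic: zstd)
    C -> S   SetCompressionMethod  index = 1           (client-to-server traffic: zlib)
    C -> S   CompressedData        <zlib-compressed protocol messages>
    S -> C   CompressedData        <zstd-compressed protocol messages>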

Rules for compressing the protocol messages are defined in zpq_stream.c. For 
each protocol message, the preferred compression algorithm can be chosen.

On Wed, Apr 21, 2021 at 15:35 AM Ian Zagorskikh  
wrote:
> Let me drop my humble 2 cents in this thread. At this moment I checked only 
> 0001-Add-zlib-and-zstd-streaming-compression patch. With no order:
> 
> * No checks for failures. For example, value from malloc() is not checked for 
> NULL and used as-is right after the call. Also as a result possibly NULL 
> values passed into ZSTD functions which are explicitly not NULL-tolerant and 
> so dereference pointers without checks.
> 
> * Used memory management does not follow the schema used in the common 
> module. Direct calls to std C malloc/free are hard coded. AFAIU this is not 
> backend friendly. Looking at the code around I believe they should be wrapped 
> to ALLOC/FREE local macroses that depend on FRONTEND define. As it's done for 
> example. in src/common/hmac.c.
> 
> * If we're going to fix our code to be palloc/pfree friendly there's also a 
> way to pass our custom allocator callbacks inside ZSTD functions. By default 
> ZSTD uses malloc/free but it can be overwritten by the caller in e.g. 
> ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API . IMHO 
> that would be good. If a 3rd party component allows us to inject  a custom 
> memory allocator and we do have it why not use this feature?
> 
> * Most zs_foo() functions do not check ptr arguments, though some like 
> zs_free() do. If we're speaking about a "common API" that can be used by a 
> wide range of different modules around the project, such a relaxed way to 
> treat input arguments IMHO can be an evil. Or at least add Assert(ptr) 
> 

Re: libpq compression

2021-07-14 Thread vignesh C
On Wed, Jul 14, 2021 at 6:31 PM Daniil Zakhlystov
 wrote:
>
> **sorry for the noise, but I need to re-send the message because one of the 
> recipients is blocked on the pgsql-hackers for some reason**
>
> Hi!
>
> Done, the patch should apply to the current master now.
>
> Actually, I have an almost-finished version of the patch with the previously 
> requested asymmetric compression negotiation. I plan to attach it soon.

Thanks for providing the patch quickly, I have changed the status to
"Need Review".

Regards,
Vignesh




Re: libpq compression

2021-04-22 Thread Ian Zagorskikh
One little fix to 0001-Add-zlib-and-zstd-streaming-compression patch for
configure.ac


@@ -1455,6 +1456,7 @@ fi

 if test "$with_lz4" = yes; then
   AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required
for LZ4])])
+fi


Otherwise autoconf generates a broken configure script.

-- 
Best Regards,
Ian Zagorskikh
CloudLinux: https://www.cloudlinux.com/


Re: libpq compression

2021-04-21 Thread Ian Zagorskikh
All, thanks!

On Wed, Apr 21, 2021 at 10:47 AM Michael Paquier 
wrote:


> Patch reviews had better be posted on the community lists.  This way,
> if the patch is left dead by the authors (things happen in life), then
> somebody else could move on with the patch without having to worry
> about this kind of issues twice.
> --
> Michael
>

Let me drop my humble 2 cents in this thread. At this moment I checked only
0001-Add-zlib-and-zstd-streaming-compression patch. With no order:

* No checks for failures. For example, the value from malloc() is not checked
for NULL and is used as-is right after the call. As a result, possibly-NULL
values are passed into ZSTD functions which are explicitly not
NULL-tolerant and so dereference pointers without checks.

* Used memory management does not follow the schema used in the common
module. Direct calls to std C malloc/free are hard coded. AFAIU this is not
backend friendly. Looking at the code around I believe they should be
wrapped to ALLOC/FREE local macroses that depend on FRONTEND define. As
it's done for example. in src/common/hmac.c.

* If we're going to fix our code to be palloc/pfree friendly, there's also a
way to pass our custom allocator callbacks into ZSTD functions. By
default ZSTD uses malloc/free, but this can be overridden by the caller in,
e.g., the ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of the
API. IMHO that would be good. If a 3rd-party component allows us to inject a
custom memory allocator and we do have one, why not use this feature?
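
For what it's worth, a backend-side sketch of injecting palloc/pfree through ZSTD_customMem (ZSTD_createDStream_advanced() typically requires ZSTD_STATIC_LINKING_ONLY; everything apart from the zstd API names is an assumption, not code from the patch):

#include "postgres.h"

#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

/* Route all zstd allocations through the current memory context. */
static void *
zstd_custom_alloc(void *opaque, size_t size)
{
	return palloc(size);
}

static void
zstd_custom_free(void *opaque, void *address)
{
	pfree(address);
}

static ZSTD_DStream *
zs_create_dstream(void)
{
	ZSTD_customMem cmem = {zstd_custom_alloc, zstd_custom_free, NULL};

	return ZSTD_createDStream_advanced(cmem);
}

A frontend build would still need malloc-based callbacks (or the ALLOC/FREE-style wrappers mentioned above), so this complements rather than removes the FRONTEND concern.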

* Most zs_foo() functions do not check pointer arguments, though some like
zs_free() do. If we're speaking about a "common API" that can be used by a
wide range of different modules around the project, such a relaxed way to
treat input arguments IMHO can be evil. Or at least add Assert(ptr)
assertions so violations can be caught in debug mode.

* For zs_create() function which must be called first to create context for
further operations, a compression method is passed as integer. This method
is used inside zs_create() as index inside an array of compress
implementation descriptors. There are several problems:
1) Method ID value is not checked for validity. By passing an invalid
method ID we can easily get out of array bounds.
2) The content of the array depends on configuration options, e.g. HAVE_LIBZSTD,
HAVE_LIBZ, etc., so an index into this array is not persistent. For some
combinations of config options a method ID of 0 may refer to ZSTD, and for
others to zlib.
3) There's no defines/enums/etc in public z_stream.h header that would
define which value should be passed to create a specific compressor. Users
have to guess it or check the code.

* I have a feeling, after reading the comments for the ZSTD compress/decompress
functions, that their return value is treated in the wrong way, handling only
the success path. Though this needs more checks/POCs.

In general, IMHO the idea of a generic compress/decompress API is very good
and useful. But specific implementation needs some code cleanup/refactoring.

-- 
Best Regards,
Ian Zagorskikh
CloudLinux: https://www.cloudlinux.com/


Re: libpq compression

2021-04-21 Thread Michael Paquier
On Wed, Apr 21, 2021 at 09:08:09AM +, Ian Zagorskikh wrote:
> I took a look at proposed patches and found several typos/mistakes. Where
> should I send my comments? In this thread or directly to the authors?

Patch reviews had better be posted on the community lists.  This way,
if the patch is left dead by the authors (things happen in life), then
somebody else could move on with the patch without having to worry
about this kind of issues twice.
--
Michael


signature.asc
Description: PGP signature


Re: libpq compression

2021-04-21 Thread Amit Kapila
On Wed, Apr 21, 2021 at 2:38 PM Ian Zagorskikh
 wrote:
>
> Hi all!
>
> I took a look at proposed patches and found several typos/mistakes. Where 
> should I send my comments? In this thread or directly to the authors?
>

I feel it is good to send comments here. This is what we normally do
for all the patches developed in this mailing list.

-- 
With Regards,
Amit Kapila.




Re: libpq compression

2021-04-21 Thread Ian Zagorskikh
Hi all!

I took a look at proposed patches and found several typos/mistakes. Where
should I send my comments? In this thread or directly to the authors?

Thanks!


On Wed, Apr 21, 2021 at 9:03 AM Daniil Zakhlystov 
wrote:

> Hi, thanks for your review!
>
> > On Mar 19, 2021, at 11:28 AM, Justin Pryzby 
> wrote:
> >
> > This needs some significant effort:
> >
> > - squish commits;
> >   * maybe make an 0001 commit supporting zlib, and an 0002 commit adding
> zstd;
> >   * add an 0003 patch to enable zlib compression by default (for CI -
> not for merge);
> > - rebase to current master;
>
> I’ve rebased the chunked compression branch and attached it to this
> message as two diff patches:
> 0001-Add-zlib-and-zstd-streaming-compression.patch - this patch introduces
> general functionality for zlib and zstd streaming compression
> 0002-Implement-libpq-compression.patch - this patch introduces libpq
> chunked compression
>
> > - compile without warnings;
> > - assign OIDs at the end of the ranges given by find-unused-oids to avoid
> >   conflicts with patches being merged to master.
>
> Done
>
> > Currently, make check-world gets stuck in several places when I use zlib.
> >
> > There's commit messages claiming that the compression can be "asymmetric"
> > between client-server vs server-client, but the commit seems to be
> unrelated,
> > and the protocol documentation doesn't describe how this works.
> >
> > Previously, I suggested that the server should have a "policy" GUC
> defining
> > which compression methods are allowed.  Possibly including compression
> "level".
> > For example, the default might be to allow zstd, but only up to level 9.
>
> Support for different compression methods in each direction is implemented
> in zpq_stream.c,
> the only thing left to do is to define the GUC settings to control this
> behavior and make adjustments to the compression handshake process.
>
> Earlier in the thread, I’ve discussed the introduction of
> compress_algorithms and decompress_algorithms GUC settings with Robert Haas.
> The main issue with the decompress_algorithms setting is that the
> receiving side can’t effectively enforce the actual compression level
> chosen by the sending side.
>
> So I propose to add the two options to the client and server GUC:
> 1. compress_algorithms setting with the ability to specify the exact
> compression level for each algorithm
> 2. decompress_algorithms to control which algorithms are supported for
> decompression of the incoming messages
>
> For example:
>
> Server
> compress_algorithms = ’zstd:2; zlib:5’ // use the zstd with compression
> level 2 or zlib with compression level 5 for outgoing messages
> decompress_algorithms = ‘zstd; zlib’ // allow the zstd and zlib algorithms
> for incoming messages
>
> Client
> compress_algorithms = ’zstd; zlib:3’ // use the zstd with default
> compression level (1) or zlib with compression level 3 for outgoing
> messages
> decompress_algorithms = ‘zstd’ // allow the zstd algorithm for incoming
> messages
>
> Robert, If I missed something from our previous discussion, please let me
> know. If this approach is OK, I'll implement it.
>
> > This:
> > +   /* Initialise values and NULL flags arrays */
> > +   MemSet(values, 0, sizeof(values));
> > +   MemSet(nulls, 0, sizeof(nulls));
> >
> > can be just:
> > +   boolnulls[PG_STAT_NETWORK_TRAFFIC_COLS] = {false};
> > since values is fully populated below.
> >
>
> The current implementation matches the other functions in pgstatfuncs.c so
> I think that it is better not to change it.
>
> > typo: aslogirhm
>
> Fixed
>
> > I wrote about Solution.pm earlier in this thread, and I see that it's in
> > Konstantin's repo since Jan 12, but it's not in yours (?) so I think
> windows
> > build will fail.
> >
> > Likewise, commit 1a946e14e in his branch adds compression into to psql
> > \connninfo based on my suggestion, but it's missing in your branch.
>
> I've rebased the patch. Now it includes Konstantin's fixes.
>
> —
> Daniil Zakhlystov
>
>

-- 
Best Regards,

Ian Zagorskikh
Senior C Developer/Alternatives Team

CloudLinux.com | KernelCare.com | Imunify360 | AlmaLinux


Re: libpq compression

2021-03-19 Thread Justin Pryzby
On Thu, Mar 18, 2021 at 08:02:32PM -0500, Justin Pryzby wrote:
> On Thu, Mar 18, 2021 at 07:30:09PM +, Daniil Zakhlystov wrote:
> > The new status of this patch is: Ready for Committer
> 
> The CF bot is failing , because the last patch sent to the list is from 
> January:
> | Latest attachment (libpq-compression-31.patch) at 2021-01-12 14:05:22 from 
> Konstantin Knizhnik ...
> 
> The most recent messages have all had links to github repos without patches
> attached.
> 
> Also, the branches I've looked at on the github repos all have messy history,
> and need to be squished into a coherent commit or set of commits.
> 
> Would you send a patch to the list with the commits as you propose to merge
> them ?

This needs some significant effort:

 - squish commits;
   * maybe make an 0001 commit supporting zlib, and an 0002 commit adding zstd;
   * add an 0003 patch to enable zlib compression by default (for CI - not for 
merge);
 - rebase to current master;
 - compile without warnings;
 - assign OIDs at the end of the ranges given by find-unused-oids to avoid
   conflicts with patches being merged to master.

Currently, make check-world gets stuck in several places when I use zlib.

There's commit messages claiming that the compression can be "asymmetric"
between client-server vs server-client, but the commit seems to be unrelated,
and the protocol documentation doesn't describe how this works.

Previously, I suggested that the server should have a "policy" GUC defining
which compression methods are allowed.  Possibly including compression "level".
For example, the default might be to allow zstd, but only up to level 9.

This:
+   /* Initialise values and NULL flags arrays */
+   MemSet(values, 0, sizeof(values));
+   MemSet(nulls, 0, sizeof(nulls));

can be just:
+   boolnulls[PG_STAT_NETWORK_TRAFFIC_COLS] = {false};
since values is fully populated below.

typo: aslogirhm

I wrote about Solution.pm earlier in this thread, and I see that it's in
Konstantin's repo since Jan 12, but it's not in yours (?) so I think windows
build will fail.

Likewise, commit 1a946e14e in his branch adds compression info to psql
\conninfo based on my suggestion, but it's missing in your branch.

I've added Daniil as a 2nd author and set back to "waiting on author".




Re: libpq compression

2021-03-18 Thread Justin Pryzby
On Thu, Mar 18, 2021 at 07:30:09PM +, Daniil Zakhlystov wrote:
> The new status of this patch is: Ready for Committer

The CF bot is failing , because the last patch sent to the list is from January:
| Latest attachment (libpq-compression-31.patch) at 2021-01-12 14:05:22 from 
Konstantin Knizhnik ...

The most recent messages have all had links to github repos without patches
attached.

Also, the branches I've looked at on the github repos all have messy history,
and need to be squished into a coherent commit or set of commits.

Would you send a patch to the list with the commits as you propose to merge
them ?

Also, I'm not sure, but I think you may find that the zstd configure.ac should
use pkgconfig.  This allowed the CIs to compile these patches.  Without
pkg-config, the macOS CI couldn't find (at least) LZ4.
https://commitfest.postgresql.org/32/2813/
https://commitfest.postgresql.org/32/3015/

Also, in those patches, we have separate "not for merge" patches which enable
the compression library by default.  This allows the CIs to exercise the
feature.

-- 
Justin




Re: libpq compression

2021-03-18 Thread Daniil Zakhlystov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi,

I've compared the different libpq compression approaches in the streaming 
physical replication scenario.

Test setup
Three hosts: the first is used for the pg_restore run, the second is the master,
and the third is the standby replica.
In each test run, I've run the pg_restore of the IMDB database 
(https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/2QYZBT)
 
and measured the received traffic on the standby replica.

Also, I've enlarged the ZPQ_BUFFER_SIZE buffer in all versions because the
too-small buffer size (8192 bytes) led to more
socket read/write system calls and poor compression in the chunked-reset
scenario.

Scenarios:

chunked
use streaming compression, wrap compressed data into CompressedData messages 
and preserve the compression context between multiple CompressedData messages.
https://github.com/usernamedt/libpq_compression/tree/chunked-compression

chunked-reset
use streaming compression, wrap compressed data into CompressedData messages 
and reset the compression context on each CompressedData message.
https://github.com/usernamedt/libpq_compression/tree/chunked-reset

permanent
use streaming compression, send raw compressed stream without any wrapping
https://github.com/usernamedt/libpq_compression/tree/permanent-w-enlarged-buffer

Tested compression levels
ZSTD, level 1
ZSTD, level 5
ZSTD, level 9

Scenario        Replica rx, mean, MB
uncompressed    6683.6


ZSTD, level 1
Scenario        Replica rx, mean, MB
chunked-reset   2726
chunked         2694
permanent       2694.3

ZSTD, level 5
Scenario        Replica rx, mean, MB
chunked-reset   2234.3
chunked         2123
permanent       2115.3

ZSTD, level 9
Scenario        Replica rx, mean, MB
chunked-reset   2153.6
chunked         1943
permanent       1941.6

Full report with additional data and resource usage graphs is available here
https://docs.google.com/document/d/1a5bj0jhtFMWRKQqwu9ag1PgDF5fLo7Ayrw3Uh53VEbs

Based on these results, I suggest sticking with the chunked compression approach,
which introduces more flexibility and adds almost no overhead compared to
permanent compression.
Also, later we may introduce a setting to control whether to reset the
compression context in each message, without
breaking backward compatibility.

--
Daniil Zakhlystov

The new status of this patch is: Ready for Committer


Re: libpq compression

2021-03-01 Thread Andrey Borodin


> On 26 Feb 2021, at 02:18, Daniil Zakhlystov  wrote:
> 
> Yes, there is still no possibility for compressed traffic pass-through for 
> poolers,
> but do we actually need it?
> I don’t see any solution except starting a new compression context for
> each message in order to make it work.
> Do we actually need to hurt the compression ratio for this specific use case?
> 
> Actually, there is an ugly hack - we may force-reset the compression context 
> by sending the 
> SetCompressionMethod (which will reset the compression algorithm & context) 
> after each CompressedData message.
> 
> This should allow interpreting of each CompressedData message on its own but 
> will add overhead and hurt the compression ratio.

As a maintainer of a connection pooler, I don't think compressing individual 
messages on their own compression context adds any value for that particular 
pooler.
Network has two different costs:
1. Cross-datacenter traffic is costly and needs to be compressed. Better 
compression ratios are preferable to CPU savings.
2. Intra-datacenter traffic is cheap and compression is not that crucial. In my 
opinion on most loaded installations, the pooler must be on the same host as 
DB. Compression can be just off.

I don't think we should craft one more CPU\Network tradeoff point by our own 
hands. Enough tradeoff points are implemented by lz4\zstd\whatever.

CPU usage to compress data is negligibly small compared to the CPU overhead of
producing the data traffic.

The only real benefit of compressing individual messages is better CRIME
mitigation, i.e. not compressing Auth packets and some parameters of the query.
This feature does not depend on resetting the compression context, and resetting
the compression context often does not help to mitigate CRIME.

Thanks!

Best regards, Andrey Borodin.



Re: libpq compression

2021-02-25 Thread Daniil Zakhlystov
Hi, thanks for your review,

> On Feb 22, 2021, at 10:38 AM, Craig Ringer  
> wrote:
> 
> 
> On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov,  
> wrote::
> 
> 3. Chunked compression allows to compress only well compressible messages and 
> save the CPU cycles by not compressing the others
> 4. Chunked compression introduces some traffic overhead compared to the 
> permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB database dump, 
> according to results in my previous message)
> 5. From the protocol point of view, chunked compression seems a little bit 
> more flexible:
>  - we can inject some uncompressed messages at any time without the need to 
> decompress/compress the compressed data
>  - we can potentially switch the compression algorithm at any time (but I 
> think that this might be over-engineering)
> 
> Chunked compression also potentially makes it easier to handle non blocking 
> sockets, because you aren't worrying about yet another layer of buffering 
> within the compression stream. This is a real pain with SSL, for example. 
> 
> It simplifies protocol analysis.
> 
> It permits compression to be decided on the fly, heuristically, based on 
> message size and potential compressibility.
> 
> It could relatively easily be extended to compress a group of pending small 
> messages, e.g. by PQflush. That'd help mitigate the downsides with small 
> messages.
> 
> So while stream compression offers better compression ratios, I'm inclined to 
> suspect we'll want message level compression.


Actually, by chunked compression I meant another variant of streaming 
compression,
where all of the CompressedData messages share the same compression context.
Frankly, I don’t think that starting a new compression context on each 
compressed
message makes sense because it will significantly hurt the compression ratio.

Also, in the current state, chunked compression requires buffering.
I’ll look into it, but it seems like that avoiding buffering will result in the 
increase
of the socket read/write system calls.

> On Feb 23, 2021, at 12:48 AM, Robert Haas  wrote:
> 
> 1. As I mentioned above, we need to fall back to sending the
> uncompressed message if compression fails to reduce the size, or if it
> doesn't reduce the size by enough to compensate for the header we have
> to add to the packet (I assume this is 5 bytes, perhaps 6 if you allow
> a byte to mention the compression type).
> 
> 2. Refining this further, if we notice that we are failing to compress
> messages regularly, maybe we should adaptively give up. The simplest
> idea would be something like: keep track of what percentage of the
> time compression succeeds in reducing the message size. If in the last
> 100 attempts we got a benefit fewer than 75 times, then conclude the
> data isn't very compressible and switch to only attempting to compress
> every twentieth packet or so. If the data changes and becomes more
> compressible again the statistics will eventually tilt back in favor
> of compressing every packet again; if not, we'll only be paying 5% of
> the overhead.
> 
> 3. There should be some minimum size before we attempt compression.
> pglz gives up right away if the input is less than 32 bytes; I don't
> know if that's the right limit, but presumably it'd be very difficult
> to save 5 or 6 bytes out of a message smaller than that, and maybe
> it's not worth trying even for slightly larger messages.
> 
> 4. It might be important to compress multiple packets at a time. I can
> even imagine having two different compressed protocol messages, one
> saying 'here is a compressed messages' and the other saying 'here are
> a bunch of compressed messages rolled up into one packet’.

I’ll look into 1) and 2). As for 3) and 4), this is already implemented.


> On Feb 23, 2021, at 1:40 AM, Andres Freund  wrote:
> 
> On 2021-02-22 14:48:25 -0500, Robert Haas wrote:
>> So, if I read these results correctly, on the "pg_restore of IMDB
>> database" test, we get 88% of the RX bytes reduction and 99.8% of the
>> TX bytes reduction for 90% of the CPU cost. On the "pgbench" test,
>> which probably has much smaller packets, chunked compression gives us
>> no bandwidth reduction and in fact consumes slightly more network
>> bandwidth -- which seems like it has to be an implementation defect,
>> since we should always be able to fall back to sending the
>> uncompressed packet if the compressed one is larger, or will be after
>> adding the wrapper overhead. But with the current code, at least, we
>> pay about a 30% CPU tax, and there's no improvement. The permanent
>> compression imposes a whopping 90% CPU tax, but we save about 33% on
>> TX bytes and about 14% on RX bytes.
> 
> It'd be good to fix the bandwidth increase issue, of course. But other
> than that I'm not really bothered by transactional workloads like
> pgbench not saving much / increasing overhead (within reason) compared
> to bulkier operations. With packets as small as the default 

Re: libpq compression

2021-02-23 Thread Konstantin Knizhnik



On 22.02.2021 08:38, Craig Ringer wrote:



On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov wrote:



3. Chunked compression allows to compress only well compressible
messages and save the CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared
to the permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB
database dump, according to results in my previous message)
5. From the protocol point of view, chunked compression seems a
little bit more flexible:
 - we can inject some uncompressed messages at any time without
the need to decompress/compress the compressed data
 - we can potentially switch the compression algorithm at any time
(but I think that this might be over-engineering)


Chunked compression also potentially makes it easier to handle non 
blocking sockets, because you aren't worrying about yet another layer 
of buffering within the compression stream. This is a real pain with 
SSL, for example.


It simplifies protocol analysis.

It permits compression to be decided on the fly, heuristically, based 
on message size and potential compressibility.


It could relatively easily be extended to compress a group of pending 
small messages, e.g. by PQflush. That'd help mitigate the downsides 
with small messages.


So while stream compression offers better compression ratios, I'm 
inclined to suspect we'll want message level compression.


From my point of view there are several use cases where protocol
compression can be useful:

1. Replication
2. Backup/dump
3. Bulk load (COPY)
4. Queries returning large objects (json, blobs,...)

All these cases are controlled by the user or DBA, so they can decide
whether to use compression or not.
Switching compression on the fly or using different algorithms in different
directions is not needed.
Yes, in all these scenarios data is mostly transferred in one direction,
so compression of small messages going in the opposite direction is not
strictly needed.
But benchmarks show that it has almost no influence on performance and
CPU usage.
So I suggest not complicating the protocol and implementation, and instead
implementing functionality which is present in most other DBMSes.


There is no sense in trying compression on workloads like pgbench and drawing
conclusions from that; from my point of view it is obvious misuse.
Compressing each message individually, or chunked compression,
significantly decreases the compression ratio because the typical message size
is not large enough,
and resetting the compression state after processing each message (clearing
the compression dictionary) adds too much overhead.




Re: libpq compression

2021-02-23 Thread Konstantin Knizhnik




On 22.02.2021 23:44, Tom Lane wrote:

Robert Haas  writes:

So at the end of the day I'm not really quite sure what is best here.
I agree with all of Craig's points about the advantages of
packet-level compression, so I'd really prefer to make that approach
work if we can. However, it also seems to me that there's room to be
fairly concerned about what these test results are showing.

BTW ... in view of the nearby thread about permanently disabling
OpenSSL's compression option, I wonder whether we really ought to be
in a hurry to put back compression underneath that.  While this area
is out of my wheelhouse, my understanding of the problem with OpenSSL
is that the packet length changes due to compression can provide
enough information to an attacker to break the encryption.  It seems
to me that do-it-ourselves compression would most likely create an
equivalent vulnerability.

Perhaps that problem can be dodged by refusing to use compression
if any form of encryption is in use ... but that would make for
another big reduction in the scope of usefulness of this feature.
Connections on which you'd care about compression are likely to be
across open networks where encryption is an even more pressing need.

regards, tom lane

It is true that compression can compromise encryption,
but how can it compromise plain-text messages?
It certainly doesn't make communication more secure, but it definitely
doesn't make it less secure!

Please also notice that compression is first of all needed for bulk data 
loading and replication.

And in many cases this traffic is internal, so no attacks are expected here.
It is actually one of the arguments for why compression at the libpq level is
needed, and why we cannot rely
on SSL compression. If we are speaking about an internal network, establishing
an SSL connection makes no sense
and just adds extra overhead.





Re: libpq compression

2021-02-22 Thread Daniel Gustafsson
> On 22 Feb 2021, at 22:21, Andres Freund  wrote:
> 
> Hi,
> 
> On 2021-02-22 15:44:30 -0500, Tom Lane wrote:
>> Robert Haas  writes:
>>> So at the end of the day I'm not really quite sure what is best here.
>>> I agree with all of Craig's points about the advantages of
>>> packet-level compression, so I'd really prefer to make that approach
>>> work if we can. However, it also seems to me that there's room to be
>>> fairly concerned about what these test results are showing.
>> 
>> BTW ... in view of the nearby thread about permanently disabling
>> OpenSSL's compression option, I wonder whether we really ought to be
>> in a hurry to put back compression underneath that.  While this area
>> is out of my wheelhouse, my understanding of the problem with OpenSSL
>> is that the packet length changes due to compression can provide
>> enough information to an attacker to break the encryption.  It seems
>> to me that do-it-ourselves compression would most likely create an
>> equivalent vulnerability.
> 
> My understanding is that one can infer compressed content only when an
> attacker can control a part of the *same* encrypted & compressed message
> that the desired-to-be-inferred data is in.  That is possible for
> browsers, because in many cases e.g. a cookie containing a session
> identifier will be sent to a server even when the request is issued from
> some other website (e.g. by injecting lots of "images" on a page, to get
> around the same origin policy). That other website doesn't normally have
> access to the cookies. But with compression it can make guesses at the
> cookie value (think adding "Cookie: a" somewhere in the request, then
> trying "Cookie: b", ...) and if the length is shorter than in other
> cases the guess was right.
> 
> There's not really something equivalent in the postgres case. The
> attacker needs to be able to influence the content in a lot of repeated
> encrypted packets between client/server that are otherwise unchanged (or
> predictable). It's possible that you can come up with something
> contorted if the attacker controls the data the client requests, but
> I don't think it's a comparable risk.
> 
> Not a cryptographer by any means though.

I'm not a cryptographer either in any shape or form, but the above matches my
understanding of the CRIME and BREACH style of attacks in relation to postgres.

--
Daniel Gustafsson   https://vmware.com/





Re: libpq compression

2021-02-22 Thread Andres Freund
Hi,

On 2021-02-22 15:44:30 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > So at the end of the day I'm not really quite sure what is best here.
> > I agree with all of Craig's points about the advantages of
> > packet-level compression, so I'd really prefer to make that approach
> > work if we can. However, it also seems to me that there's room to be
> > fairly concerned about what these test results are showing.
> 
> BTW ... in view of the nearby thread about permanently disabling
> OpenSSL's compression option, I wonder whether we really ought to be
> in a hurry to put back compression underneath that.  While this area
> is out of my wheelhouse, my understanding of the problem with OpenSSL
> is that the packet length changes due to compression can provide
> enough information to an attacker to break the encryption.  It seems
> to me that do-it-ourselves compression would most likely create an
> equivalent vulnerability.

My understanding is that one can infer compressed content only when an
attacker can control a part of the *same* encrypted & compressed message
that the desired-to-be-inferred data is in.  That is possible for
browsers, because in many cases e.g. a cookie containing a session
identifier will be sent to a server even when the request is issued from
some other website (e.g. by injecting lots of "images" on a page, to get
around the same origin policy). That other website doesn't normally have
access to the cookies. But with compression it can make guesses at the
cookie value (think adding "Cookie: a" somewhere in the request, then
trying "Cookie: b", ...) and if the length is shorter than in other
cases the guess was right.
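
A toy sketch of that guessing game, using plain zlib with no TLS at all
(purely illustrative; the secret and guesses here are made up):

import zlib

secret = b"Cookie: session=7f3c9a"          # unknown to the attacker

def observed_length(guess: bytes) -> int:
    # The attacker only ever sees the length of the compressed request.
    request = secret + b"\r\nCookie: session=" + guess
    return len(zlib.compress(request))

for guess in (b"6", b"7", b"8"):
    print(guess, observed_length(guess))
# The matching guess (b"7") tends to compress a byte or two shorter, so the
# attacker can recover the secret one character at a time across many requests.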

There's not really something equivalent in the postgres case. The
attacker needs to be able to influence the content in a lot of repeated
encrypted packets between client/server that are otherwise unchanged (or
predictable). It's possible that you can come up with something
contorted if the attacker controls the data the client requests, but
I don't think it's a comparable risk.

Not a cryptographer by any means though.


I think the majority of the cases where one would want to use libpq
compression are pretty much unaffected by such attacks, even on a
theoretical level. And for the remainder I suspect that we'd already
be vulnerable, due to wal_compression.

Greetings,

Andres Freund




Re: libpq compression

2021-02-22 Thread Tom Lane
Robert Haas  writes:
> So at the end of the day I'm not really quite sure what is best here.
> I agree with all of Craig's points about the advantages of
> packet-level compression, so I'd really prefer to make that approach
> work if we can. However, it also seems to me that there's room to be
> fairly concerned about what these test results are showing.

BTW ... in view of the nearby thread about permanently disabling
OpenSSL's compression option, I wonder whether we really ought to be
in a hurry to put back compression underneath that.  While this area
is out of my wheelhouse, my understanding of the problem with OpenSSL
is that the packet length changes due to compression can provide
enough information to an attacker to break the encryption.  It seems
to me that do-it-ourselves compression would most likely create an
equivalent vulnerability.

Perhaps that problem can be dodged by refusing to use compression
if any form of encryption is in use ... but that would make for
another big reduction in the scope of usefulness of this feature.
Connections on which you'd care about compression are likely to be
across open networks where encryption is an even more pressing need.

regards, tom lane




Re: libpq compression

2021-02-22 Thread Andres Freund
Hi,

On 2021-02-22 14:48:25 -0500, Robert Haas wrote:
> So, if I read these results correctly, on the "pg_restore of IMDB
> database" test, we get 88% of the RX bytes reduction and 99.8% of the
> TX bytes reduction for 90% of the CPU cost. On the "pgbench" test,
> which probably has much smaller packets, chunked compression gives us
> no bandwidth reduction and in fact consumes slightly more network
> bandwidth -- which seems like it has to be an implementation defect,
> since we should always be able to fall back to sending the
> uncompressed packet if the compressed one is larger, or will be after
> adding the wrapper overhead. But with the current code, at least, we
> pay about a 30% CPU tax, and there's no improvement. The permanent
> compression imposes a whopping 90% CPU tax, but we save about 33% on
> TX bytes and about 14% on RX bytes.

It'd be good to fix the bandwidth increase issue, of course. But other
than that I'm not really bothered by transactional workloads like
pgbench not saving much / increasing overhead (within reason) compared
to bulkier operations. With packets as small as the default pgbench
workloads use, it's hard to use generic compression methods and save
space. While we could improve upon that even in the packet oriented
case, it doesn't seem like an important use case to me.

For pgbench-like workloads the main network-level issue is latency - and
compression won't help with that (if anything it is more likely to hurt). If you
instead pipeline queries, compression can of course help significantly -
but then you'd likely get compression benefits in the chunked case,
right?

IMO by *far* the most important use for fe/be compression is compressing
the WAL stream, so it'd probably be good to analyze the performance of
that while running a few different workloads (perhaps just pgbench
initialization, and an r/w workload).


I don't think we're planning to turn compression on by default - it's so
use-case dependent whether network bandwidth or CPU is the scarce
resource - so I think causing *some* unhelpful overhead isn't
prohibitive. It'd be good to improve, but I'd also ok with deferring
that.


> But there's a subtler way in which the permanent compression approach
> could be winning, which is that the compressor can retain state over
> long time periods. In a single pgbench response, there's doubtless
> some opportunity for the compressor to find savings, but an individual
> response doesn't likely include all that much duplication. But just
> think about how much duplication there is from one response to the
> next. The entire RowDescription message is going to be exactly the
> same for every query. If you can represent that in just a couple of
> bytes, I think that figures to be a pretty big win. If I had to
> guess, that's likely why the permanent compression approach seems to
> deliver a significant bandwidth savings even on the pgbench test,
> while the chunked approach doesn't. Likewise in the other direction:
> the query doesn't necessarily contain a lot of internal duplication,
> but it duplicates the previous query to a very large extent. It would
> be interesting to know whether this theory is correct, and whether
> anyone can spot a flaw in my reasoning.

I would assume this is correct.


> If it is, that doesn't necessarily mean we can't use the chunked
> approach, but it certainly makes it less appealing. I can see two ways
> to go. One would be to just accept that it won't get much benefit in
> cases like the pgbench example, and mitigate the downsides as well as
> we can. A version of this patch that caused a 3% CPU overhead in cases
> where it can't compress would be far more appealing than one that
> causes a 30% overhead in such cases (which seems to be where we are
> now).

I personally don't think it's worth caring about pgbench. But there are
obviously also other cases where "stateful" compression could be
better...

Greetings,

Andres Freund




Re: libpq compression

2021-02-22 Thread Robert Haas
On Thu, Feb 11, 2021 at 8:09 AM Daniil Zakhlystov
 wrote:
> [ benchmark results ]

So, if I read these results correctly, on the "pg_restore of IMDB
database" test, we get 88% of the RX bytes reduction and 99.8% of the
TX bytes reduction for 90% of the CPU cost. On the "pgbench" test,
which probably has much smaller packets, chunked compression gives us
no bandwidth reduction and in fact consumes slightly more network
bandwidth -- which seems like it has to be an implementation defect,
since we should always be able to fall back to sending the
uncompressed packet if the compressed one is larger, or will be after
adding the wrapper overhead. But with the current code, at least, we
pay about a 30% CPU tax, and there's no improvement. The permanent
compression imposes a whopping 90% CPU tax, but we save about 33% on
TX bytes and about 14% on RX bytes.

If that's an accurate reading of the results, then I would say the
"pg_restore of IMDB database" test is pretty much a wash. Some people
might prefer to incur the extra CPU cost to get the extra bandwidth
savings, and other people might not. But neither group of people
really has cause for complaint if the other approach is selected,
because the costs and benefits are similar in both cases. But in the
pgbench test case, the chunked compression looks horrible. Sure, it's
less costly from a CPU perspective than the other approach, but since
you don't get any benefit, you'd be far better off disabling
compression altogether than using the chunked approach.

However, I feel like some of this has almost got to be an
implementation deficiency in the "chunked" version of the patch. Now,
I haven't looked at that patch. But, there are certainly a number of
things that it might be failing to do that could make a big
difference:

1. As I mentioned above, we need to fall back to sending the
uncompressed message if compression fails to reduce the size, or if it
doesn't reduce the size by enough to compensate for the header we have
to add to the packet (I assume this is 5 bytes, perhaps 6 if you allow
a byte to mention the compression type).

2. Refining this further, if we notice that we are failing to compress
messages regularly, maybe we should adaptively give up. The simplest
idea would be something like: keep track of what percentage of the
time compression succeeds in reducing the message size. If in the last
100 attempts we got a benefit fewer than 75 times, then conclude the
data isn't very compressible and switch to only attempting to compress
every twentieth packet or so. If the data changes and becomes more
compressible again the statistics will eventually tilt back in favor
of compressing every packet again; if not, we'll only be paying 5% of
the overhead. (See the sketch after this list.)

3. There should be some minimum size before we attempt compression.
pglz gives up right away if the input is less than 32 bytes; I don't
know if that's the right limit, but presumably it'd be very difficult
to save 5 or 6 bytes out of a message smaller than that, and maybe
it's not worth trying even for slightly larger messages.

4. It might be important to compress multiple packets at a time. I can
even imagine having two different compressed protocol messages, one
saying 'here is a compressed messages' and the other saying 'here are
a bunch of compressed messages rolled up into one packet'.
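
A rough sketch of how points 1-3 might fit together (the function name,
thresholds, and 6-byte wrapper below are illustrative assumptions, not
anything from a posted patch):

import zlib
from collections import deque

HEADER_OVERHEAD = 6         # assumed size of the compressed-message wrapper
MIN_COMPRESS_SIZE = 32      # don't even try for tiny messages
recent = deque(maxlen=100)  # 1 = compression helped, 0 = it didn't
skipped = 0

def maybe_compress(payload: bytes):
    """Return (compressed?, bytes to put in the message body)."""
    global skipped
    if len(payload) < MIN_COMPRESS_SIZE:
        return False, payload
    # If fewer than 75 of the last 100 attempts paid off, only try every 20th message.
    if len(recent) == recent.maxlen and sum(recent) < 75:
        skipped += 1
        if skipped % 20:
            return False, payload
    compressed = zlib.compress(payload, 1)
    won = len(compressed) + HEADER_OVERHEAD < len(payload)
    recent.append(won)
    return (True, compressed) if won else (False, payload)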

But there's a subtler way in which the permanent compression approach
could be winning, which is that the compressor can retain state over
long time periods. In a single pgbench response, there's doubtless
some opportunity for the compressor to find savings, but an individual
response doesn't likely include all that much duplication. But just
think about how much duplication there is from one response to the
next. The entire RowDescription message is going to be exactly the
same for every query. If you can represent that in just a couple of
bytes, I think that figures to be a pretty big win. If I had to
guess, that's likely why the permanent compression approach seems to
deliver a significant bandwidth savings even on the pgbench test,
while the chunked approach doesn't. Likewise in the other direction:
the query doesn't necessarily contain a lot of internal duplication,
but it duplicates the previous query to a very large extent. It would
be interesting to know whether this theory is correct, and whether
anyone can spot a flaw in my reasoning.

If it is, that doesn't necessarily mean we can't use the chunked
approach, but it certainly makes it less appealing. I can see two ways
to go. One would be to just accept that it won't get much benefit in
cases like the pgbench example, and mitigate the downsides as well as
we can. A version of this patch that caused a 3% CPU overhead in cases
where it can't compress would be far more appealing than one that
causes a 30% overhead in such cases (which seems to be where we are
now).

Alternatively, we could imagine the compressed-message packets as
carrying a single 

Re: libpq compression

2021-02-21 Thread Craig Ringer
On Thu, 11 Feb 2021, 21:09 Daniil Zakhlystov, 
wrote:

>
> 3. Chunked compression allows compressing only well-compressible messages
> and saving CPU cycles by not compressing the others
> 4. Chunked compression introduces some traffic overhead compared to the
> permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB database dump,
> according to results in my previous message)
> 5. From the protocol point of view, chunked compression seems a little bit
> more flexible:
>  - we can inject some uncompressed messages at any time without the need
> to decompress/compress the compressed data
>  - we can potentially switch the compression algorithm at any time (but I
> think that this might be over-engineering)
>

Chunked compression also potentially makes it easier to handle non-blocking
sockets, because you aren't worrying about yet another layer of buffering
within the compression stream. This is a real pain with SSL, for example.

It simplifies protocol analysis.

It permits compression to be decided on the fly, heuristically, based on
message size and potential compressibility.

It could relatively easily be extended to compress a group of pending small
messages, e.g. by PQflush. That'd help mitigate the downsides with small
messages.
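
As a rough sketch of that flush-time idea (hypothetical names and wrapper,
not the actual libpq buffering code): queue small outgoing messages and
compress the whole batch as one unit when flushing, falling back to the raw
bytes when compression doesn't pay:

import zlib

WRAPPER_OVERHEAD = 6       # assumed cost of the compressed-message wrapper
pending = []               # small messages queued since the last flush

def queue_message(msg: bytes) -> None:
    pending.append(msg)

def flush() -> bytes:
    """Return the bytes to put on the wire for everything queued so far."""
    batch = b"".join(pending)
    pending.clear()
    compressed = zlib.compress(batch, 1)
    if len(compressed) + WRAPPER_OVERHEAD < len(batch):
        # hypothetical wrapper: type byte + 4-byte length + compressed payload
        return b"z" + len(compressed).to_bytes(4, "big") + compressed
    return batch           # not worth it: send the queued messages as-is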

So while stream compression offers better compression ratios, I'm inclined
to suspect we'll want message level compression.



Re: libpq compression

2021-02-11 Thread Konstantin Knizhnik




On 11.02.2021 16:09, Daniil Zakhlystov wrote:

Hi!


On 09.02.2021 09:06, Konstantin Knizhnik wrote:

Sorry, but my interpretation of your results is completely different:
permanent compression is faster than chunked compression (2m15 vs. 2m27)
and consumes less CPU (44 vs 48 sec).
Size of RX data is slightly larger - 0.5Mb but TX size is smaller - 5Mb.
So permanent compression is better from all points of view: it is
faster, consumes less CPU and reduces network traffic!

From my point of view your results just prove my original opinion that
the possibility to control compression on the fly and use different
compression algorithms for TX/RX data
just complicates the implementation and gives no significant advantages.

When I mentioned the lower CPU usage, I was referring to the pgbench test
results in the attached
Google doc, where chunked compression demonstrated lower CPU usage compared to
permanent compression.

I made another (a little bit larger) pgbench test to demonstrate this:

Pgbench test parameters:

Data load
pgbench -i -s 100

Run configuration
pgbench --builtin tpcb-like -t 1500 --jobs=64 --client=600

Pgbench test results:

No compression
latency average = 247.793 ms
tps = 2421.380067 (including connections establishing)
tps = 2421.660942 (excluding connections establishing)

real    6m11.818s
user    1m0.620s
sys     2m41.087s
RX bytes diff, human: 703.9221M
TX bytes diff, human: 772.2580M

Chunked compression (compress only CopyData and DataRow messages)
latency average = 249.123 ms
tps = 2408.451724 (including connections establishing)
tps = 2408.719578 (excluding connections establishing)

real    6m13.819s
user    1m18.800s
sys     2m39.941s
RX bytes diff, human: 707.3872M
TX bytes diff, human: 772.1594M

Permanent compression
latency average = 250.312 ms
tps = 2397.005945 (including connections establishing)
tps = 2397.279338 (excluding connections establishing)

real    6m15.657s
user    1m54.281s
sys     2m37.187s
RX bytes diff, human: 610.6932M
TX bytes diff, human: 513.2225M


As you can see in the above results, user CPU time (1m18.800s vs 1m54.281s) is
significantly smaller with
chunked compression because it doesn’t try to compress all of the packets.

Well, permanent compression provides some (not so large) reduction of
traffic, while
with chunked compression the network traffic is almost the same as with
no compression, but it consumes more CPU.


Definitely pgbench queries are not the case where compression should be
used: both requests and responses are too short to make compression
efficient.

So in this case compression should not be used at all.
From my point of view, "chunked compression" is not a good compromise
between the no-compression and permanent-compression cases;
rather, it combines the drawbacks of both approaches: it doesn't reduce
traffic but consumes more CPU.


Here is the summary from my POV, according to these and previous tests results:

1. Permanent compression always brings the highest compression ratio
2. Permanent compression might not be worthwhile for workloads other than COPY
data / replication / BLOB/JSON queries
3. Chunked compression allows compressing only well-compressible messages and
saving CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared to the 
permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB database dump, 
according to results in my previous message)
5. From the protocol point of view, chunked compression seems a little bit more 
flexible:
  - we can inject some uncompressed messages at any time without the need to 
decompress/compress the compressed data
  - we can potentially switch the compression algorithm at any time (but I 
think that this might be over-engineering)

Given the summary above, I think it’s time to make a decision on which path we 
should take and make the final list of goals that need to be reached in this 
patch to make it committable.

Thanks,

Daniil Zakhlystov






Re: libpq compression

2021-02-11 Thread Daniil Zakhlystov
Hi!

> On 09.02.2021 09:06, Konstantin Knizhnik wrote:
> 
> Sorry, but my interpretation of your results is completely different:
> permanent compression is faster than chunked compression (2m15 vs. 2m27) 
> and consumes less CPU (44 vs 48 sec).
> Size of RX data is slightly larger - 0.5Mb but TX size is smaller - 5Mb.
> So permanent compression is better from all points of view: it is 
> faster, consumes less CPU and reduces network traffic!
> 
> From my point of view your results just prove my original opinion that 
> the possibility to control compression on the fly and use different 
> compression algorithms for TX/RX data
> just complicates the implementation and gives no significant advantages.

When I mentioned the lower CPU usage, I was referring to the pgbench test
results in the attached
Google doc, where chunked compression demonstrated lower CPU usage compared to
permanent compression.

I made another (a little bit larger) pgbench test to demonstrate this:

Pgbench test parameters:

Data load
pgbench -i -s 100

Run configuration
pgbench --builtin tpcb-like -t 1500 --jobs=64 --client=600

Pgbench test results:

No compression
latency average = 247.793 ms
tps = 2421.380067 (including connections establishing)
tps = 2421.660942 (excluding connections establishing)

real    6m11.818s
user    1m0.620s
sys     2m41.087s
RX bytes diff, human: 703.9221M
TX bytes diff, human: 772.2580M

Chunked compression (compress only CopyData and DataRow messages)
latency average = 249.123 ms
tps = 2408.451724 (including connections establishing)
tps = 2408.719578 (excluding connections establishing)

real    6m13.819s
user    1m18.800s
sys     2m39.941s
RX bytes diff, human: 707.3872M
TX bytes diff, human: 772.1594M

Permanent compression
latency average = 250.312 ms
tps = 2397.005945 (including connections establishing)
tps = 2397.279338 (excluding connections establishing)

real    6m15.657s
user    1m54.281s
sys     2m37.187s
RX bytes diff, human: 610.6932M
TX bytes diff, human: 513.2225M


As you can see in the above results, user CPU time (1m18.800s vs 1m54.281s) is
significantly smaller with
chunked compression because it doesn’t try to compress all of the packets.

Here is the summary from my POV, according to these and previous tests results:

1. Permanent compression always brings the highest compression ratio
2. Permanent compression might not be worthwhile for workloads other than COPY
data / replication / BLOB/JSON queries
3. Chunked compression allows compressing only well-compressible messages and
saving CPU cycles by not compressing the others
4. Chunked compression introduces some traffic overhead compared to the 
permanent (1.2810G vs 1.2761G TX data on pg_restore of IMDB database dump, 
according to results in my previous message)
5. From the protocol point of view, chunked compression seems a little bit more 
flexible:
 - we can inject some uncompressed messages at any time without the need to 
decompress/compress the compressed data
 - we can potentially switch the compression algorithm at any time (but I think 
that this might be over-engineering)

Given the summary above, I think it’s time to make a decision on which path we 
should take and make the final list of goals that need to be reached in this 
patch to make it committable. 

Thanks,

Daniil Zakhlystov




Re: libpq compression

2021-02-09 Thread Konstantin Knizhnik




On 08.02.2021 22:23, Daniil Zakhlystov wrote:

Hi everyone,

I’ve been making some experiments with an on-the-fly compression switch lately 
and have some updates.

...
pg_restore of IMDB database test results

Chunked compression with only CopyData or DataRow compression (second approach):
time:
real    2m27.947s
user    0m45.453s
sys     0m3.113s
RX bytes diff, human:  1.8837M
TX bytes diff, human:  1.2810G

Permanent compression:
time:
real    2m15.810s
user    0m42.822s
sys     0m2.022s
RX bytes diff, human:  2.3274M
TX bytes diff, human:  1.2761G

Without compression:
time:
real    2m38.245s
user    0m18.946s
sys     0m2.443s
RX bytes diff, human:  5.6117M
TX bytes diff, human:  3.8227G


Also, I’ve run pgbench tests and measured the CPU load. Since chunked 
compression did not compress any messages
except for CopyData or DataRow, it demonstrated lower CPU usage compared to the 
permanent compression, full report with graphs
is available in the Google doc above.

Pull request with the second approach implemented:
https://github.com/postgrespro/libpq_compression/pull/7

Also, in this pull request, I’ve made the following changes:
- extracted the general-purpose streaming compression API into the separate 
structure (ZStream) so it can be used in other places without tx_func and 
rx_func,
maybe the other compression patches can utilize it?
- made some refactoring of ZpqStream
- moved the SSL and ZPQ buffered read data checks into separate function 
pqReadPending

What do you think of the results above? I think that the implemented approach 
is viable, but maybe I missed something in my tests.


Sorry, but my interpretation of your results is completely different:
permanent compression is faster than chunked compression (2m15 vs. 2m27) 
and consumes less CPU (44 vs 48 sec).

Size of RX data is slightly larger - 0.5Mb but TX size is smaller - 5Mb.
So permanent compression is better from all points of view: it is 
faster, consumes less CPU and reduces network traffic!


From my point of view your results just prove my original opinion that
the possibility to control compression on the fly and use different
compression algorithms for TX/RX data
just complicates the implementation and gives no significant advantages.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: libpq compression

2021-02-08 Thread Daniil Zakhlystov
Hi everyone, 

I’ve been making some experiments with an on-the-fly compression switch lately 
and have some updates.


> On Dec 22, 2020, at 10:42 PM, Robert Haas  wrote:
> 
> 
> Hmm, I assumed that if the compression buffers were flushed on the
> sending side, and if all the data produced on the sending side were
> transmitted to the receiver, the receiving side would then return
> everything up to the point of the flush. However, now that I think
> about it, there's no guarantee that any particular compression library
> would actually behave that way. I wonder what actually happens in
> practice with the libraries we care about?

> I'm not sure about the details, but the general idea seems like it
> might be worth considering. If we choose a compression method that is
> intended for streaming compression and decompression and whose library
> handles compression flushes sensibly, then we might not really need to
> go this way to make it work. But, on the other hand, this method has a
> certain elegance that just compressing everything lacks, and might
> allow some useful flexibility. On the third hand, restarting
> compression for every new set of messages might really hurt the
> compression ratio in some scenarios. I'm not sure what is best.
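
For what it's worth, zlib's streaming API does behave that way: after a sync
flush on the sending side, the receiver can decompress everything submitted so
far without waiting for the stream to end (zstd offers a similar flush
operation, though its behaviour would need the same kind of check). A minimal
check, outside of the patch itself:

import zlib

sender = zlib.compressobj()
receiver = zlib.decompressobj()

# First "message": sync-flush after it, so the receiver can decode it now.
wire = sender.compress(b"first message") + sender.flush(zlib.Z_SYNC_FLUSH)
assert receiver.decompress(wire) == b"first message"

# The stream keeps going; a later flush again exposes everything sent so far.
wire = sender.compress(b"second message") + sender.flush(zlib.Z_SYNC_FLUSH)
assert receiver.decompress(wire) == b"second message"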

Earlier in the thread, we’ve discussed introducing a new message type
(CompressedMessage),
so I came up with two main approaches to sending compressed data:

1. Sending the compressed message type without a message length, followed by
continuous compressed data.
2. Sending the compressed data packed into messages with a specified length
(pretty much like CopyData).

The first approach allows sending raw compressed data without any additional
framing, but it has some downsides:
- to determine the end of the compressed data, the receiver has to decompress
all of it
- in most cases (at least with ZSTD and ZLIB), the compression stream has to
be ended so that the decompressor can detect the end of the compressed data
on the receiving side.  After that, a new compression context has to be
initialized (for example, in the case of ZSTD, a new frame is started), which
may lead to a worse compression ratio.

The second approach has some overhead because it requires framing the
compressed data into messages with a specified length (chunks),
but I see the following advantages:
- CompressedMessage is sent like any other Postgres protocol message, and we
always know the size of the compressed data from the message header, so it is
not necessary to actually decompress the data to find the end of the
compressed data
- this approach does not require resetting the compression context, so
compression can continue even if there are some uncompressed messages between
two CompressedMessage messages

So I’ve implemented the second approach with the following message compression
criterion:
if the message type is CopyData or DataRow, it is compressed; otherwise, the
message is sent uncompressed.
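
A sketch of that framing (the 'z' message type, the zlib context, and the
exact layout below are assumptions for illustration, not the actual patch
format): only CopyData ('d') and DataRow ('D') payloads get wrapped in a
length-prefixed CompressedMessage, while the compression context is kept
across messages so the dictionary keeps building up:

import struct
import zlib

ctx = zlib.compressobj(1)   # one long-lived context shared by all compressed messages

def frame(msg_type: bytes, payload: bytes) -> bytes:
    # Ordinary v3 framing: type byte + int32 length (the length counts itself).
    return msg_type + struct.pack("!I", len(payload) + 4) + payload

def send(msg_type: bytes, payload: bytes) -> bytes:
    if msg_type in (b"d", b"D"):                 # CopyData, DataRow
        data = ctx.compress(frame(msg_type, payload))
        data += ctx.flush(zlib.Z_SYNC_FLUSH)     # make this chunk decodable on arrival
        return frame(b"z", data)                 # hypothetical CompressedMessage
    return frame(msg_type, payload)              # everything else goes out unchanged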

I’ve compared this approach with permanent compression in the following 
scenarios:
- pg_restore of IMDB database 
(https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/2QYZBT)
- pgbench "host=x dbname=testdb port=5432 user=testuser compression=zstd:1" 
--builtin tpcb-like -t 400 --jobs=64 --client=600

The detailed report with CPU/memory/network load is available here:
https://docs.google.com/document/d/13qEUpIjh2NNOOW_8NZOFUohRSEIro0R2xdVs-2Ug8Ts

pg_restore of IMDB database test results

Chunked compression with only CopyData or DataRow compression (second approach):
time:
real    2m27.947s
user    0m45.453s
sys     0m3.113s
RX bytes diff, human:  1.8837M
TX bytes diff, human:  1.2810G

Permanent compression:
time:
real    2m15.810s
user    0m42.822s
sys     0m2.022s
RX bytes diff, human:  2.3274M
TX bytes diff, human:  1.2761G

Without compression:
time:
real    2m38.245s
user    0m18.946s
sys     0m2.443s
RX bytes diff, human:  5.6117M
TX bytes diff, human:  3.8227G


Also, I’ve run pgbench tests and measured the CPU load. Since chunked 
compression did not compress any messages 
except for CopyData or DataRow, it demonstrated lower CPU usage compared to the 
permanent compression, full report with graphs
is available in the Google doc above.

Pull request with the second approach implemented:
https://github.com/postgrespro/libpq_compression/pull/7

Also, in this pull request, I’ve made the following changes:
- extracted the general-purpose streaming compression API into the separate 
structure (ZStream) so it can be used in other places without tx_func and 
rx_func,
maybe the other compression patches can utilize it?
- made some refactoring of ZpqStream
- moved the SSL and ZPQ buffered read data checks into separate function 
pqReadPending

What do you think of the results above? I think that the implemented approach 
is viable, but maybe I missed something in my tests.
Maybe we can choose other compression criteria (for example, compress only

Re: libpq compression

2021-01-12 Thread Andrey Borodin



> On 12 Jan 2021, at 20:47, Konstantin Knizhnik wrote:
> 
>> I think we should come up with a minimal, preliminary 0001 patch which is
>> common between the 3 compression patches (or at least the two using zstd).  
>> The
>> ./configure changes and a compressionlibs struct would also be included.  I'm
>> planning to do something like this with the next revision of my patchset.
>> 
> It would be great to have such a common preliminary patch including zstd support.

+1. I'd also rebase my WAL FPI patch on top of common code.

Best regards, Andrey Borodin.



Re: libpq compression

2021-01-12 Thread Konstantin Knizhnik




On 12.01.2021 18:38, Justin Pryzby wrote:

On Tue, Jan 12, 2021 at 08:44:43AM +0300, Konstantin Knizhnik wrote:

On 11.01.2021 20:38, Tomas Vondra wrote:

1) Fixes the MSVC makefile. The list of files is sorted alphabetically,
so I've added the file at the end.

Thank you

This is still failing the windows build.

I think you need something like this, which I have in my zstd/pg_dump patch.

--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -307,6 +307,7 @@ sub GenerateFiles
 HAVE_LIBXML2=> undef,
 HAVE_LIBXSLT=> undef,
 HAVE_LIBZ   => $self->{options}->{zlib} ? 1 : undef,
+   HAVE_LIBZSTD=> $self->{options}->{zstd} ? 1 : undef,



Thank you.


I think we should come up with a minimal, preliminary 0001 patch which is
common between the 3 compression patches (or at least the two using zstd).  The
./configure changes and a compressionlibs struct would also be included.  I'm
planning to do something like this with the next revision of my patchset.

It would be great to have such a common preliminary patch including zstd
support.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




