On Tue, May 21, 2024 at 8:23 AM Jacob Burroughs
<jburrou...@instructure.com> wrote:
> As currently implemented, the compression only applies to
> CopyData/DataRow/Query messages, none of which should be involved in
> authentication, unless I've really missed something in my
> understanding.

Right, but Robert has argued that we should compress it all, and I'm
responding to that proposal.

Sorry for introducing threads within threads. But I think it's
valuable to pin down both 1) the desired behavior, and 2) how the
current proposal behaves, as two separate things. I'll try to do a
better job of communicating which I'm talking about.

> > Right, I think it's reasonable to let a sufficiently
> > determined/informed user lift the guardrails, but first we have to
> > choose to put guardrails in place... and then we have to somehow
> > sufficiently inform the users when it's okay to lift them.
>
> My thought would be that compression should be opt-in on the client
> side, with documentation around the potential security pitfalls. (I
> could be convinced it should be opt-in on the server side, but overall
> I think opt-in on the client side generally protects against footguns
> without excessively getting in the way

We absolutely have to document the risks and allow clients to be
written safely. But I think server-side controls on risky behavior
have proven to be generally more valuable, because the server
administrator is often in a better spot to see the overall risks to
the system. ("No, you will not use deprecated ciphersuites. No, you
will not access this URL over plaintext. No, I will not compress this
response containing customer credit card numbers, no matter how nicely
you ask.") There are many more clients than servers, so it's less
risky for the server to enforce safety than to hope that every client
is safe.

Does your database and access pattern regularly mingle secrets with
public data? Would auditing correct client use of compression be a
logistical nightmare? Do your app developers keep indicating in
conversations that they don't understand the risks at all? Cool, just
set `encrypted_compression = nope_nope_nope` on the server and sleep
soundly at night. (Ideally we would default to that.)

> and if an attacker controls the
> client, they can just get the information they want directly -- they
> don't need compression sidechannels to get that information.)

Sure, but I don't think that's relevant to the threats being discussed.

> Within SQL-level things, I don't think we can reasonably differentiate
> between private and attacker-controlled information at the
> libpq/server level.

And the IETF line of argument -- or at least the argument I quoted
above -- implies that we really have no business introducing
compression when confidentiality is requested. A stronger approach
would require us to prove, or the user to indicate, safety before
compressing.

Take a look at the security notes for QPACK [1] -- keeping in mind
that they know _more_ about what's going on at the protocol level than
we do, due to the header design. And they still say things like "an
encoder might choose not to index values with low entropy" and "these
criteria ... will evolve over time as new attacks are discovered." A
huge amount is left as an exercise for the reader. This stuff is
really hard.

> We can reasonably differentiate between message
> types that *definitely* are private and ones that could have
> either/both data in them, but that's not nearly as useful.  I think
> not compressing auth-related packets plus giving a mechanism to reset
> the compression stream for clients (plus guidance on the tradeoffs
> involved in turning on compression) is about as good as we can get.

The concept of stream reset seems necessary but insufficient at the
application level, which bleeds over into Jelte's compression_restart
proposal. (At the protocol level, I think it may be sufficient?)
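For context, what a reset buys you at the protocol level can be
sketched with zlib standing in for whatever codec gets negotiated
(the header value here is made up). A full flush discards the
compressor's history window, so a later message can no longer
back-reference a secret from an earlier one:

```python
import zlib

SECRET = b"Authorization: Bearer 9c1d7e2b4f3a"

def stream_len(reset: bool) -> int:
    c = zlib.compressobj(9)
    out = c.compress(SECRET)
    # Z_SYNC_FLUSH keeps the history window; Z_FULL_FLUSH resets the
    # compression state so decompression can restart at this point --
    # and the next message cannot match against the secret.
    out += c.flush(zlib.Z_FULL_FLUSH if reset else zlib.Z_SYNC_FLUSH)
    out += c.compress(SECRET)  # attacker echoes their guess at the secret
    out += c.flush(zlib.Z_FINISH)
    return len(out)

# Without a reset, the echoed guess compresses to a tiny back-reference,
# which is exactly the length signal an attacker wants.
assert stream_len(reset=False) < stream_len(reset=True)
```

So a reset between messages does close the cross-message channel; the
remaining question is who decides where the reset boundaries go.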

If I write a query where one of the WHERE clauses is
attacker-controlled and the other is a secret, I would really like to
not compress that query on the client side. If I join a table of user
IDs against a table of user-provided addresses and a table of
application tokens for that user, compressing even a single row leaks
information about those tokens -- at a _very_ granular level -- and I
would really like the server not to do that.
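The leak is easy to demonstrate. This is a minimal CRIME-style sketch
with an invented field name and token; the point is only that when a
guess and a secret share one compression stream, the compressed length
tells the attacker whether the guess matched:

```python
import zlib

SECRET = "token=d4f7a91c3e08b652"  # secret sharing the stream

def oracle(guess: str) -> int:
    # Attacker-controlled guess and the secret end up in the same
    # compressed payload, as in the joined-row scenario above.
    return len(zlib.compress((guess + "&" + SECRET).encode(), 9))

right = oracle("token=d4f7a91c3e08b652")  # correct guess
wrong = oracle("token=qqwweerrttyyuuii")  # same length, no overlap
assert right < wrong  # the length alone distinguishes them
```

Repeat per guessed character and the token falls out byte by byte,
which is why "compressing even a single row" is not an exaggeration.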

So if I'm building sand castles... I think maybe it'd be nice to mark
tables (and/or individual columns?) as safe for compression under
encryption, whether by row or in aggregate. And maybe libpq and psql
should be able to turn outgoing compression on and off at will.

And I understand those would balloon the scope of the feature. I'm
worried I'm doing the security-person thing and sucking all the air
out of the room. I know not everybody uses transport encryption; for
those people, compress-it-all is probably a pretty winning strategy,
and there's no need to reset the compression context ever. And the
pg_dump-style, "give me everything" use case seems like it could maybe
be okay, but I really don't know how to assess the risk there, at all.

> That said, I *think* the feature is reasonable to be
> reviewed/committed without the reset functionality as long as the
> compressed data already has the mechanism built in (as it does) to
> signal when a decompressor should restart its streaming.  The actual
> signaling protocol mechanism/necessary libpq API can happen in
> followon work.

Well... working out the security minutiae _after_ changing the
protocol is not historically a winning strategy, I think. Better to do
it as a vertical stack.

Thanks,
--Jacob

[1] https://www.rfc-editor.org/rfc/rfc9204.html#name-security-considerations