Re: sandboxing untrusted code

2023-08-31 Thread Jeff Davis
On Thu, 2023-08-31 at 11:25 -0400, Robert Haas wrote:
> As a refresher, the scenario I'm talking about is any one in which one
> user, who I'll call Bob, does something that results in executing code
> provided by another user, who I'll call Alice. The most obvious way
> that this can happen is if Bob performs some operation that targets a
> table owned by Alice. That operation might be DML, like an INSERT or
> UPDATE; or it might be some other kind of maintenance command that can
> cause code execution, like REINDEX, which can evaluate index
> expressions.

REINDEX executes index expressions as the table owner. (You are correct
that INSERT executes index expressions as the inserting user.)
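
To make the distinction concrete, here is a minimal sketch (the names
are made up, and Bob is assumed to have the usual USAGE and INSERT
privileges):

  -- As Alice: a table with an expression index over one of Alice's functions.
  CREATE FUNCTION alice_fold(t TEXT) RETURNS TEXT
    IMMUTABLE LANGUAGE plpgsql AS $$ BEGIN RETURN lower(t); END; $$;
  CREATE TABLE alice_tab (t TEXT);
  CREATE INDEX alice_idx ON alice_tab (alice_fold(t));
  GRANT INSERT ON alice_tab TO bob;

  -- As Bob: the INSERT evaluates alice_fold() with Bob as the current user.
  INSERT INTO alice_tab VALUES ('Hello');

  -- As Alice: REINDEX evaluates the same expression, but as the table owner.
  REINDEX INDEX alice_idx;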

>  The code being executed might be run either as Alice or
> as Bob, depending on how it's been attached to the table and what
> operation is being performed and maybe whether some function or
> procedure that might contain it is SECURITY INVOKER or SECURITY
> DEFINER. Regardless of the details, our concern is that Alice's code
> might do something that Bob does not like. This is a particularly
> lively concern if the code happens to be running with the privileges
> of Bob, because then Alice might try to do something like access
> objects for which Bob has permissions and Alice does not.

Agreed.


> 1. Compute stuff. There's no restriction on the permissible amount of
> compute; if you call untrusted code, nothing prevents it from running
> forever.
> 2. Call other code. This may be done by a function call or a command
> such as CALL or DO, all subject to the usual permissions checks but no
> further restrictions.
> 3. Access the current session state, without modifying it. For
> example, executing SHOW or current_setting() is fine.
> 4. Transiently modify the current session state in ways that are
> necessarily reversed before returning to the caller. For example, an
> EXCEPTION block or a configuration change driven by proconfig is fine.
> 5. Produce messages at any log level. This includes any kind of ERROR.

Nothing in that list really exercises privileges (except #2?). If those
are the allowed set of things a sandboxed function can do, is a
sandboxed function equivalent to a function running with no privileges
at all?

Please explain #2 in a bit more detail. Whose EXECUTE privileges would
be used (I assume it depends on SECURITY DEFINER/INVOKER)? Would the
called code also be sandboxed?

> In general if we have a great big call stack that involves calling a
> whole bunch of functions either as SECURITY INVOKER or as SECURITY
> DEFINER, changing the session state is blocked unless the session user
> trusts the owners of all of those functions.

That clarifies the earlier mechanics you described, thank you.

>  And if we got to any of
> those functions by means of code attached directly to tables, like an
> index expression or default expression, changing the session state is
> blocked unless the session user also trusts the owners of those
> tables.
> 
> I see a few obvious objections to this line of attack that someone
> might raise, and I'd like to address them now. First, somebody might
> argue that this is too hard to implement.

That seems to be a response to my question above: "Isn't that a hard
problem; maybe impossible?".

Let me qualify that: if the function is written by Alice, and the code
is able to really exercise the privileges of the caller (Bob), then it
seems really hard to make it safe for the caller.

If the function is sandboxed such that it's not really using Bob's
privileges (it's just nominally running as Bob) that's a much more
tractable problem.

I believe there's some nuance to your proposal where some of Bob's
privileges could be used safely, but I'm not clear on exactly which
ones. The difficulty of the implementation would depend on these
details.

> Second, somebody might argue that full sandboxing is such a draconian
> set of restrictions that it will inconvenience users greatly or that
> it's pointless to even allow anything to be executed or something
> along those lines. I think that argument has some merit, but I think
> the restrictions sound worse than they actually are in context.

+100. We should make typical cases easy to secure.

> Even if they do something as
> simple as reading from another table, that's not necessarily going to
> dump and restore properly, even if it's secure, because the table
> ordering dependencies won't be clear to pg_dump.

A good point. A lot of these extraordinary cases are either incredibly
fragile or already broken.
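
For instance (a hypothetical sketch of the pg_dump problem): a CHECK
constraint whose function reads another table introduces a data
dependency that pg_dump cannot see, so a restore can fail or silently
differ even though nothing about it is a security problem.

  CREATE TABLE allowed_codes (code TEXT PRIMARY KEY);
  CREATE FUNCTION is_allowed(c TEXT) RETURNS BOOLEAN LANGUAGE sql AS
    $$ SELECT EXISTS (SELECT 1 FROM allowed_codes WHERE code = c) $$;
  CREATE TABLE orders (code TEXT CHECK (is_allowed(code)));
  -- Each row of "orders" is re-checked as it is loaded at restore time;
  -- if "allowed_codes" happens to be restored later, the restore breaks.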

> What if such a function wants to ALTER ROLE ...
> SUPERUSER? I think that's bonkers and should almost certainly be
> categorically denied.

...also agreed, a lot of these extraordinary cases are really just
surface area for attack with no legitimate use case.




One complaint (not an objection, because I don't think we have
the luxury of objecting to viable proposals when it comes to improving
our security model):

Although your proposal sounds like a good security backstop, it feels
like it's missing the point that there are different _kinds_ of
functions. We already have the IMMUTABLE marker and we already have
runtime checks to make sure that immutable functions can't CREATE
TABLE; why not build on that mechanism or create new markers?

Re: sandboxing untrusted code

2023-09-01 Thread Robert Haas
On Thu, Aug 31, 2023 at 8:57 PM Jeff Davis wrote:
> > As a refresher, the scenario I'm talking about is any one in which one
> > user, who I'll call Bob, does something that results in executing code
> > provided by another user, who I'll call Alice. The most obvious way
> > that this can happen is if Bob performs some operation that targets a
> > table owned by Alice. That operation might be DML, like an INSERT or
> > UPDATE; or it might be some other kind of maintenance command that can
> > cause code execution, like REINDEX, which can evaluate index
> > expressions.
>
> REINDEX executes index expressions as the table owner. (You are correct
> that INSERT executes index expressions as the inserting user.)

I was speaking here of who provided the code, rather than whose
credentials were used to execute it. The index expressions are
provided by the table owner no matter who evaluates them in a
particular case.

> > 1. Compute stuff. There's no restriction on the permissible amount of
> > compute; if you call untrusted code, nothing prevents it from running
> > forever.
> > 2. Call other code. This may be done by a function call or a command
> > such as CALL or DO, all subject to the usual permissions checks but no
> > further restrictions.
> > 3. Access the current session state, without modifying it. For
> > example, executing SHOW or current_setting() is fine.
> > 4. Transiently modify the current session state in ways that are
> > necessarily reversed before returning to the caller. For example, an
> > EXCEPTION block or a configuration change driven by proconfig is fine.
> > 5. Produce messages at any log level. This includes any kind of ERROR.
>
> Nothing in that list really exercises privileges (except #2?). If those
> are the allowed set of things a sandboxed function can do, is a
> sandboxed function equivalent to a function running with no privileges
> at all?

Close but not quite. As you say, #2 does exercise privileges. Also,
even if no privileges are exercised, you could still refer to
CURRENT_ROLE, and I think you could also call a function like
has_table_privilege.  Your identity hasn't changed, but you're
restricted from exercising some of your privileges. Really, you still
have them, but they're just not available to you in that situation.
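
For instance, I'd expect something like the following to keep working
inside fully sandboxed code, because it only inspects identity and
privileges rather than exercising them (whether these exact calls end
up on the allowed list is, of course, part of what needs to be worked
out):

  SELECT current_role,
         has_table_privilege('pg_class', 'SELECT'),
         current_setting('work_mem');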

> Please explain #2 in a bit more detail. Whose EXECUTE privileges would
> be used (I assume it depends on SECURITY DEFINER/INVOKER)? Would the
> called code also be sandboxed?

Nothing in this proposed system has any impact on whose privileges are
used in any particular context, so any privilege checks conducted
pursuant to #2 are performed as the same user who would perform them
today. Whether the called code would be sandboxed depends on how the
rules I articulated in the previous email would apply to it. Since
those rules depend on the user IDs, if the called code is owned by the
same user as the calling code and is SECURITY INVOKER, then those
rules apply in the same way and the same level of sandboxing will
apply. But if the called function is owned by a different user or is
SECURITY DEFINER, then the rules might apply differently to the called
code than the calling code. It's possible this isn't quite good enough
and that some adjustments to the rules are necessary; I'm not sure.
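
As a sketch of how I'd expect the rules to carry over (the ownership
and names here are invented for illustration):

  -- Owned by Alice, SECURITY INVOKER (the default): when called from
  -- sandboxed code, the same user IDs are involved, so the same level
  -- of sandboxing applies to the body of this function.
  CREATE FUNCTION alice_invoker() RETURNS INT
    LANGUAGE sql AS $$ SELECT 1 $$;

  -- Owned by Alice, SECURITY DEFINER: the user ID changes on entry, so
  -- the rules are evaluated afresh and may apply differently here.
  CREATE FUNCTION alice_definer() RETURNS INT
    SECURITY DEFINER LANGUAGE sql AS $$ SELECT 1 $$;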

> Let me qualify that: if the function is written by Alice, and the code
> is able to really exercise the privileges of the caller (Bob), then it
> seems really hard to make it safe for the caller.
>
> If the function is sandboxed such that it's not really using Bob's
> privileges (it's just nominally running as Bob) that's a much more
> tractable problem.

Agreed.

> One complaint (not an objection, because I don't think we have
> the luxury of objecting to viable proposals when it comes to improving
> our security model):
>
> Although your proposal sounds like a good security backstop, it feels
> like it's missing the point that there are different _kinds_ of
> functions. We already have the IMMUTABLE marker and we already have
> runtime checks to make sure that immutable functions can't CREATE
> TABLE; why not build on that mechanism or create new markers?

I haven't ruled that out completely, but there's some subtlety here
that doesn't exist in those other cases. If the owner of a function
marks it wrongly in terms of volatility or parallel safety, then they
might make queries run more slowly than they should, or they might
make queries return wrong answers, or error out, or even end up with
messed-up indexes. But none of that threatens the stability of the
system in any very deep way, or the security of the system. It's no
different than putting a CHECK (false) constraint on a table, or
something like that: it might make the system not work, and if that
happens, then you can fix it. Here, however, we can't trust the owners
of functions to label those functions accurately. It won't do for
Alice to create a function and then apply the NICE_AND_SAFE marker to it.
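
To spell out the earlier point that mislabeling only hurts the labeler,
here is a hypothetical sketch of a function falsely marked IMMUTABLE:
the owner can end up with a corrupted index, but nobody gains any
privileges from it.

  -- Falsely IMMUTABLE: the result actually depends on a setting.
  CREATE FUNCTION fold(t TEXT) RETURNS TEXT IMMUTABLE LANGUAGE sql AS
    $$ SELECT CASE current_setting('myapp.fold', true)
              WHEN 'upper' THEN upper(t) ELSE lower(t) END $$;
  CREATE TABLE docs (t TEXT);
  CREATE INDEX docs_fold_idx ON docs (fold(t));
  -- If rows are inserted under different settings, entries computed with
  -- different results coexist in the index and scans can miss rows: a
  -- mess, but only for the owner's own objects.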

Re: sandboxing untrusted code

2023-09-01 Thread Jeff Davis
On Fri, 2023-09-01 at 09:12 -0400, Robert Haas wrote:
> Close but not quite. As you say, #2 does exercise privileges. Also,
> even if no privileges are exercised, you could still refer to
> CURRENT_ROLE, and I think you could also call a function like
> has_table_privilege.  Your identity hasn't changed, but you're
> restricted from exercising some of your privileges. Really, you still
> have them, but they're just not available to you in that situation.

Which privileges are available in a sandboxed environment, exactly? Is
it kind of like masking away all privileges except EXECUTE, or are
other privileges available, like SELECT?

And the distinction that you are drawing, between having privileges
that are (mostly) not available and not having the privileges
at all, is fairly subtle. Some examples showing why that distinction is
important would be helpful.

> 
> > Although your proposal sounds like a good security backstop, it feels
> > like it's missing the point that there are different _kinds_ of
> > functions. We already have the IMMUTABLE marker and we already have
> > runtime checks to make sure that immutable functions can't CREATE
> > TABLE; why not build on that mechanism or create new markers?

...

> Here, however, we can't trust the owners
> of functions to label those functions accurately.

Of course, but observe:

  =# CREATE FUNCTION f(i INT) RETURNS INT IMMUTABLE LANGUAGE plpgsql AS
  $$
  BEGIN
CREATE TABLE x(t TEXT);
RETURN 42 + i;
  END;
  $$;

  =# SELECT f(2);
  ERROR:  CREATE TABLE is not allowed in a non-volatile function
  CONTEXT:  SQL statement "CREATE TABLE x(t TEXT)"
  PL/pgSQL function f(integer) line 3 at SQL statement

The function f() is called at the top level, not as part of any index
expression or other special context. But it fails to CREATE TABLE
simply because that's not an allowed thing for an IMMUTABLE function to
do. That tells me right away that my function isn't going to work, and
I can rewrite it rather than waiting for some other user to say that it
failed when run in a sandbox.

>  It won't do for
> Alice to create a function and then apply the NICE_AND_SAFE marker to
> it.

You can if you always execute NICE_AND_SAFE functions in a sandbox. The
difference is that it's always executed in a sandbox, rather than
sometimes, so it will fail consistently.

> Now, in the case of a C function, things are a bit different. We can't
> inspect the generated machine code and know what the function does,
> because of that pesky halting problem. We could handle that either
> through function labeling, since only superusers can create C
> functions, or by putting checks directly in the C code. I was somewhat
> inclined toward the latter approach, but I'm not completely sure yet
> what makes sense. Thinking about your comments here made me realize
> that there are other procedural languages to worry about, too, like
> PL/python or PL/perl or PL/sh. Whatever we do for the C functions will
> have to be extended to those cases somehow as well. If we label
> functions, then we'll have to allow superusers only to label functions
> in these languages as well and make the default label "this is
> unsafe." If we put checks in the C code then I guess any given PL
> needs to certify that it knows about sandboxing or have all of its
> functions treated as unsafe. I think doing this at the C level would
> be better, strictly speaking, because it's more granular. Imagine a
> function that only conditionally does some prohibited action - it can
> be allowed to work in the cases where it does not attempt the
> prohibited operation, and blocked when it does. Labeling is
> all-or-nothing.

Here I'm getting a little lost in what you mean by "prohibited
operation". Most languages mostly use SPI, and whatever sandboxing
checks you do should work there, too. Are you talking about completely
separate side effects like writing files or opening sockets?

Regards,
Jeff Davis





Re: sandboxing untrusted code

2023-09-05 Thread Robert Haas
On Fri, Sep 1, 2023 at 5:27 PM Jeff Davis wrote:
> Which privileges are available in a sandboxed environment, exactly? Is
> it kind of like masking away all privileges except EXECUTE, or are
> other privileges available, like SELECT?

I think I've more or less answered this already -- fully sandboxed
code can't make reference to external data sources, from which it
follows that it can't exercise SELECT (and most other privileges).
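
So, in terms of statements issued from inside fully sandboxed code, my
expectation (a sketch, not a finished design) is roughly:

  -- Pure computation and read-only inspection of session state: fine.
  SELECT 2 + 2, upper('ok'), current_setting('work_mem');

  -- Any reference to an external data source: blocked, regardless of
  -- whether the caller would otherwise have SELECT on the table.
  SELECT count(*) FROM some_other_table;   -- hypothetical table name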

> And the distinction that you are drawing between having the privileges
> but them (mostly) not being available, versus not having the privileges
> at all, is fairly subtle. Some examples showing why that distinction is
> important would be helpful.

I view it like this: when Bob tries to insert or update or delete
Alice's table, and Alice has some code attached to it, Alice is
effectively asking Bob to execute that code with his own privileges.
In general, I think we can reasonably expect that Bob WILL be willing
to do this: if he didn't want to modify Alice's table, he
wouldn't have executed a DML statement against it, and executing the
code that Alice has attached to that table is a precondition of being
allowed to perform that modification. It's Alice's table and she gets
to set the rules. However, Bob is also allowed to protect himself. If
he's running Alice's code and it wants to do something with which Bob
isn't comfortable, he can change his mind and refuse to execute it
after all.

I always find it helpful to consider real world examples with similar
characteristics. Let's say that Bob is renting a VRBO from Alice.
Alice leaves behind, in the VRBO, a set of rules which Bob must follow
as a condition of being allowed to rent the VRBO. Those rules include
things that Bob must do at checkout time, like washing all of his
dishes. As a matter of routine, Bob will follow Alice's checkout
instructions. But if Alice includes in the checkout instructions
"Leave your driver's license and social security card on the dining
room table after checkout, plus a record of all of your bank account
numbers," the security systems in Bob's brain should activate and
prevent those instructions from getting followed.

A major difference between that situation (a short term rental of
someone else's house) and the in-database case (a DML statement
against someone else's table) is that when Bob is following Alice's
VRBO checkout instructions, he knows exactly what actions he is
performing. When he executes a DML statement against Alice's table,
Bob the human being does not actually know what Alice's triggers or
index expressions or whatever are causing him to do. As I see it, the
purpose of this system is to prevent Bob from doing things that he
didn't intend to do. He's cool with adding 2 and 2 or concatenating
some strings or whatever, but probably not with reading data and
handing it over to Alice, and definitely not handing all of his
privileges over to Alice. Full sandboxing has to block that kind of
stuff, and it needs to do so precisely because *Bob would not allow
those operations if he knew about them*.

Now, it is not going to be possible to get that perfectly right.
PostgreSQL can not know the state of Bob's human mind, and it cannot
be expected to judge with perfect accuracy what actions Bob would or
would not approve. However, it can make some conservative guesses. If
Bob wants to override those guesses by saying "I trust Alice, do
whatever she says" that's fine. This system attempts to prevent Bob
from accidentally giving away his permissions to an adversary who has
buried malicious code in some unexpected place. But, unlike the
regular permissions system, it is not there to prevent Bob from doing
things that he isn't allowed to do. It's there to prevent Bob from
doing things that he didn't intend to do.

And that's where I see the distinction between *having* permissions
and those permissions being *available* in a particular context. Bob
has permission to give Alice an extra $1000 or whatever if he has the
money and wishes to do so. But those permissions are probably not
*available* in the context where Bob is following a set of
instructions from Alice. If Bob's brain spontaneously generated the
idea "let's give Alice a $1000 tip because her vacation home was
absolutely amazing and I am quite rich," he would probably go right
ahead and act on that idea and that is completely fine. But when Bob
encounters that same idea *on a list of instructions provided by
Alice*, the same operation is blocked *because it came from Alice*. If
the list of instructions from Alice said to sweep the parlor, Bob
would just go ahead and do it. Alice has permission to induce Bob to
sweep the parlor, but does not have permission to induce Bob to give
her a bunch of extra money.

And in the database context, I think it's fine if Alice induces Bob to
compute some values or look at the value of work_mem, but I don't
think it's OK if Alice induces Bob to make her a superuser. Unless Bob
declares t

Re: sandboxing untrusted code

2023-09-05 Thread Jeff Davis
On Tue, 2023-09-05 at 12:25 -0400, Robert Haas wrote:
> I think I've more or less answered this already -- fully sandboxed
> code can't make reference to external data sources, from which it
> follows that it can't exercise SELECT (and most other privileges).

By what principle are we allowing EXECUTE but not SELECT? In theory, at
least, a function could hold secrets in the code, e.g.:

  CREATE FUNCTION answer_to_ultimate_question() RETURNS INT
LANGUAGE plpgsql AS $$ BEGIN RETURN 42; END; $$;

Obviously that's a bad idea in plpgsql, because anyone can just read
pg_proc. And maybe C would be handled differently somehow, so maybe it
all works.
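
To be concrete, any user can see the source with something like:

  SELECT prosrc FROM pg_proc WHERE proname = 'answer_to_ultimate_question';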

But it feels like something is wrong there: it's fine to execute the
answer_to_ultimate_question() not because Bob has an EXECUTE privilege,
but because the sandbox renders any security concerns with *anyone*
executing the function moot. So why bother checking the EXECUTE
privilege at all?

> And that's where I see the distinction between *having* permissions
> and those permissions being *available* in a particular context. Bob
> has permission to give Alice an extra $1000 or whatever if he has the
> money and wishes to do so. But those permissions are probably not
> *available* in the context where Bob is following a set of
> instructions from Alice. If Bob's brain spontaneously generated the
> idea "let's give Alice a $1000 tip because her vacation home was
> absolutely amazing and I am quite rich," he would probably go right
> ahead and act on that idea and that is completely fine. But when Bob
> encounters that same idea *on a list of instructions provided by
> Alice*, the same operation is blocked *because it came from Alice*. If
> the list of instructions from Alice said to sweep the parlor, Bob
> would just go ahead and do it. Alice has permission to induce Bob to
> sweep the parlor, but does not have permission to induce Bob to give
> her a bunch of extra money.

In the real world example, sweeping the parlor has a (slight) cost to
the person doing it and it (slightly) matters who does it. In Postgres,
we don't do any CPU accounting per user, and it's all executed under
the same PID, so it really doesn't matter.

So it raises the question: why would we not simply say that this list
of instructions should be executed by the person who wrote it, in which
case the existing privilege mechanism would work just fine?

> And in the database context, I think it's fine if Alice induces Bob to
> compute some values or look at the value of work_mem, but I don't
> think it's OK if Alice induces Bob to make her a superuser.

If all the code can do is compute some values or look at work_mem,
perhaps the function needs no privileges at all (or some minimal
privileges)?

You explained conceptually where you're coming from, but I still don't
see much of a practical difference between having privileges but being
in a context where they won't be used, and dropping the privileges
entirely during that time. I suppose the answer is that the EXECUTE
privilege will still be used, but as I said above, that doesn't
entirely make sense to me, either.

Regards,
Jeff Davis