On Sat, Apr 18, 2026 at 4:04 PM Robert Haas <[email protected]> wrote: > At any rate, it's not time to make a scope decision yet: more research > is needed before committing deeply to any particular course of action. > I'm happy to hear your thoughts but let's reserve judgement until the > research is in.
I've spent a bunch more time working on this and have a few findings and design ideas that I'd like to share. My core design idea is to pass around a Provenances object. Provenance means the ultimate origin of something and can be used, for example when discussing museum objects, to discuss the chain of custody. If for example someone offers our museum what is purported to be the Mona Lisa, we should be able to ask from where they got it, and from where that person got it, and so forth and so on until we get back to someone that everyone agrees once owned the Mona Lisa. If we can't document such a chain of custody, it's probably a fake. The idea here is not an exact analogue, but the concept is similar: it's an attempt to provide an answer to the question "why are we thinking of executing this code?" and the answer is a series of things that happened e.g. the session user X entered a query, and that referenced a view owned by Y, and then that led to calling a function owned by Z, which then led to accessing a table owned by A, which fired a trigger that called a function owned by B, etc. This allows us to reason about whether user X trusts those other users enough to be willing to execute their code, particularly when the execution uses X's permissions. Each Provenances object can store multiple histories with a common prefix. As new code is encountered from new sources, existing histories are extended with new entries to create new and longer histories. For example, in the previous example, when session user X enters a query at top level, we create a Provenances object that contains a single entry which says "this came from the session user". When query rewrite discovers the view owned by Y, an entry is made for it in the same Provenances object, pointing back to the existing entry that led to its discovery. So in this case the entry for the view owned by Y points back to the entry for session user X. In this case, what then happens is that the PlannedStmt ends up with a copy of this Provenances object, and FuncExpr and other nodes in the query carry a "provenance index" by means of which they can say which history explains their presence: either they were present in what the user entered, or they came from the view which was accessed because of what the user entered. At execution time, the PlannedStmt's Provenances object is copied into the EState, which can then discover new provenances during execution. For example, here when the function owned by Z is called, an entry for it will be added to the EState's provenances. Since this function occurred inside the view, it will point back to the existing entry for that view, so the provenances for the function call will be function owned by Z -> view owned by Y -> session user X. Importantly, any queries inside that function are NOT attributed to the session user but just further extend the same history. Now, the good news is that, after spending a bunch of time hacking on this, I really think it can work. The bad news is that it's going to require changing code in LOTS of places. Every place that currently gets an expression tree also needs to get a Provenances object and the executable expression nodes (FuncExpr, OpExpr, etc.) need to be stamped with the correct provenances indexes. There is an exception, which is that in some places we construct Expr trees that we never evaluate but only analyze (e.g. to determine what the return type will be, or to compare two trees to see if they are the same). Such places don't need provenances, but everything that might execute code does, and there are many such places. Query execution is an obvious one, but also query planning -- lots and lots of things feed into eval_const_expressions(). Also various utility commands, like ALTER TABLE, VACUUM, and COPY, and background processes like autovacuum (which runs VACUUM) and logical replication workers (which run code on behalf of a subscription). A completed patch is going to be big, and will be hard to verify, and will complicate back-patching for years to come. One alternative we could consider is something that works more like SECURITY_RESTRICTED_OPERATION: instead of passing provenance information down the call stack, we could try to change state when we enter a dangerous stretch of code and change it back on exit. That could be a valid way forward if, for example, we want to focus on sandboxing by context: flip a bit when you start executing an index expression and flip it back when you finish. Then, you have a way to complain about doing something (directly or indirectly) from inside an index expression that you don't think should be happening there (e.g. ALTER USER). But overall I don't have much confidence in the approach that SECURITY_RESTRICTED_OPERATION takes, for two reasons. First, it's rather vulnerable to bugs of omission. If every function call has to have a provenance chain, you can keep working on the code until they all do, and use things like Assert() to find cases where they don't yet. But if you need to set a "watch out here" flag and then clear it again and you forget, it's just silently not as secure as you wanted it to be. Second, I don't really see how the SECURITY_RESTRICTED_OPERATION approach, which could also be called a run-time stack approach, can deal with the disconnect between planning and execution. For example, let's say that the session user A calls a function owned by B which prepares a query. Later, the session user makes use of that prepared query. Provenances can tell us that the statement we're thinking of executing came from user B, but a run-time flag set while that query was being planned can't tell us anything. Provenances can also tell us which of the things in that query came from what sources, and I see no way to really know that with a run-time stack. It's data that is generated by planning and rewriting which has to somehow be preserved for execution if you want to have it. One thing I haven't yet fully figured out is what the enforcement model looks like. But as an example of a policy that I think is sound, suppose we have a rule that you always refuse to use your privileges to execute DDL unless you trust the entire provenances chain. That rule isn't strong enough to be the only rule, but by itself it blocks a lot of obvious skullduggery. If I say ALTER USER andres SUPERUSER then that's fine; I can do that if I want. But if I run a query which references a view which calls a function which blah blah blah eventually runs that command, it should only be allowed to go through unchecked if either I own all the intermediate objects or I fully trust the users who do. Otherwise I get something conceptually like ERROR: trust violation although I think the actual wording should probably be something more like ERROR: user "andres" may not induce user "rhaas" to execute "ALTER USER", because somebody seeing the former message is likely to assume it's some kind of false positive, whereas somebody seeing the latter message is likely (I think) to be appropriately alarmed -- or if it is a false positive, to understand where things went wrong. I also have some ideas about printing out a DETAIL message that can pinpoint exactly where "andres" is seen to have been involved in triggering the ALTER USER call, which seems important for troubleshooting. All that said, I maintain that knowing the provenances of a query, statement, function call, etc. is very powerful and will let us complain about a very wide range of potentially-dangerous things. The difficulty (to me) seems mostly getting the plumbing fed through everywhere that it needs to go, with the exact enforcement rules being something we can workshop, with lots of different policies being possible once we have the infrastructure. Assorted random but possibly-interesting notes: - The JSON code is quite awkward in terms of being able to thread provenances through everywhere that matters. I have some patches to improve this which also have the nice effect of improving performance and simplifying the code. I haven't quite gotten around to polishing those for posting, but if anyone is interested in adopting them, I would be thrilled. - Type output functions are very widely called and we might want to consider feeding in a "fake" provenances list that just says "yeah, you have to trust this" rather than threading true provenances all the way down to all call sites. This amounts to deciding that all type output functions must be trustworthy, so we need to decide that's OK. Note that we can't assume that type *input* functions are trustworthy, because of domain_in(). - There's also decent number of places where provenance traces through operator classes and operator families. I haven't sorted all of this out yet. It would be defensible to treat these as fully-trusted infrastructure because they can only be owned by superusers or ex-superusers, but the "ex" might be an important caveat. For now, I am assuming that it's a good idea to include these in provenances. Granted, everyone has to trust the superuser, but if the system blocks something, seeing those included in the traceback might be helpful for understanding what happened. For instance, if you say "ORDER BY foo" and the traceback just says that the query called the < operator, that's not going to be real clear compared to saying that the query called the opclass which called the < operator. A related problem is that the owner of an operator need not match the function it points to; tricking somebody into calling a function by pointing an operator at it is a plausible attack. This is all very much work-in-progress, so if you have concerns, criticisms, or suggestions, I'd rather hear them now than in six months. The one request I have is to think twice before saying "we don't really need all that complexity, we can just <whatever>". I'd be happy to have a simpler solution here, but I don't currently believe that there is a substantially simpler solution that provides anywhere near the same bad-actor detection power. If I've overcomplicated it, it would be good to simplify, but if I haven't, then we need to either bite the bullet and change a lot of code, or accept that this is never going to be improved in any substantial way. Even though the former looks like one heck of a slog, I don't feel good about the latter. -- Robert Haas EDB: http://www.enterprisedb.com
