Re: ExecTypeSetColNames is fundamentally broken
Aleksander Alekseev writes: > I understand the concern expressed by Robert in respect of backward > compatibility. From the user's perspective, personally I would prefer > a fixed bug over backward compatibility. Especially if we consider the > fact that the current behaviour of cases like `select row_to_json(i) > from int8_tbl i(x,y)` is not necessarily the correct one, at least it > doesn't look right to me. It's debatable anyway. I'd be less worried about changing this behavior if we didn't have to back-patch it ... but I see no good alternative. > So unless anyone has strong objections against the proposed fix or can > propose a better patch, I would suggest merging it. Done now. regards, tom lane
Re: ExecTypeSetColNames is fundamentally broken
Hi hackers, > Anyone else have thoughts? I came across this thread while looking for the patches that need review. My understanding of the code is limited, but I can say that I don't see anything particularly wrong with it. I can also confirm that it fixes the problem reported by the user while passing the rest of the tests. I understand the concern expressed by Robert in respect of backward compatibility. From the user's perspective, personally I would prefer a fixed bug over backward compatibility. Especially if we consider the fact that the current behaviour of cases like `select row_to_json(i) from int8_tbl i(x,y)` is not necessarily the correct one, at least it doesn't look right to me. So unless anyone has strong objections against the proposed fix or can propose a better patch, I would suggest merging it. -- Best regards, Aleksander Alekseev
Re: ExecTypeSetColNames is fundamentally broken
On Tue, Dec 7, 2021 at 1:19 PM Tom Lane wrote: > So the alternatives I see are to revert what bf7ca1587 tried to do > here, or to try to make it work that way across-the-board, which > implies (a) a very much larger amount of work, and (b) breaking > important behaviors that are decades older than that commit. > It's not even entirely clear that we could get to complete > consistency if we went down that path. Continuing my tour through the "bug fixes" section of the CommitFest, I came upon this thread. Unfortunately there's not that much I can do to progress it, because I've already expressed all the opinions that I have on this thread. If we back-patch Tom's originally proposed fix, i expect we might get a complaint or too, but the current behavior of being able to create unreadable tables is indisputably poor, and I'm not in a position to tell Tom that he has to go write a different fix instead, or even that such a fix is possible. Unless somebody else wants to comment, which IMHO would be good, I think it's up to Tom to make a decision here on how he'd like to proceed and then, probably, just do it. Anyone else have thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
Re: ExecTypeSetColNames is fundamentally broken
Robert Haas writes: > On Tue, Dec 7, 2021 at 12:30 PM Tom Lane wrote: >> If we consider that the alias renames the columns "for all purposes", >> how is it okay for f() to select the "a" column? > I'd say it isn't. In a green field I'd probably agree with you, but IMO that will break far too much existing SQL code. It'd cause problems for us too, not only end-users. As an example, ruleutils.c would have to avoid attaching new column aliases to tables that are referenced as whole-row Vars. I'm not very sure that that's even possible without creating insurmountable ambiguity issues. There are also fun issues around what happens to a stored query after a table column rename. Right now the query acts as though it uses the old name as a column alias, and that introduces no semantic problem; but that behavior would no longer be acceptable. So the alternatives I see are to revert what bf7ca1587 tried to do here, or to try to make it work that way across-the-board, which implies (a) a very much larger amount of work, and (b) breaking important behaviors that are decades older than that commit. It's not even entirely clear that we could get to complete consistency if we went down that path. regards, tom lane
Re: ExecTypeSetColNames is fundamentally broken
On Tue, Dec 7, 2021 at 12:30 PM Tom Lane wrote: > If we consider that the alias renames the columns "for all purposes", > how is it okay for f() to select the "a" column? I'd say it isn't. > Another way to phrase the issue is that the column names seen > by f() are currently different from those seen by row_to_json(): > > regression=# select row_to_json(t) from t as t(x,y); >row_to_json > - > {"x":11,"y":12} > (1 row) > > and that seems hard to justify. Yeah, I agree. The problem I have here is that, with your proposed fix, it still won't be very consistent. True, row_to_json() and f() will both see the unaliased column names ... but a straight select * from t as t(x,y) will show the aliased names. That's unarguably better than corrupting your data, but it seems "astonishing" in the POLA sense. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ExecTypeSetColNames is fundamentally broken
Robert Haas writes: > On Mon, Dec 6, 2021 at 4:05 PM Tom Lane wrote: >> select f(t) from t(x,y); >> >> If we adopt the "rename for all purposes" interpretation, then >> the second SELECT must fail, because what f() is being passed is >> no longer of type t. > For me, the second SELECT does fail: > rhaas=# select f(t) from t(x,y); > ERROR: column "x" does not exist Ah, sorry, I fat-fingered the alias syntax. Here's a tested example: regression=# create table t (a int, b int); CREATE TABLE regression=# insert into t values(11,12); INSERT 0 1 regression=# create function f(t) returns int as 'select $1.a' language sql; CREATE FUNCTION regression=# select f(t) from t as t(x,y); f 11 (1 row) If we consider that the alias renames the columns "for all purposes", how is it okay for f() to select the "a" column? Another way to phrase the issue is that the column names seen by f() are currently different from those seen by row_to_json(): regression=# select row_to_json(t) from t as t(x,y); row_to_json - {"x":11,"y":12} (1 row) and that seems hard to justify. regards, tom lane
Re: ExecTypeSetColNames is fundamentally broken
On Mon, Dec 6, 2021 at 4:05 PM Tom Lane wrote: > Well, that was what I thought when I wrote bf7ca1587, but it leads > to logical contradictions. Consider > > create table t (a int, b int); > > create function f(t) returns ... ; > > select f(t) from t; > > select f(t) from t(x,y); > > If we adopt the "rename for all purposes" interpretation, then > the second SELECT must fail, because what f() is being passed is > no longer of type t. If you ask me, that'll be a bigger problem > for users than the change I'm proposing (quite independently of > how hard it might be to implement). It certainly will break > a behavior that goes back much further than bf7ca1587. For me, the second SELECT does fail: rhaas=# select f(t) from t(x,y); ERROR: column "x" does not exist LINE 1: select f(t) from t(x,y); ^ If it didn't fail, what would the behavior be? I suppose you could make an argument for trying to match up the columns by position, but if so this ought to work: rhaas=# create table u(a int, b int); CREATE TABLE rhaas=# select f(u) from u; ERROR: function f(u) does not exist rhaas=# select f(u::t) from u; ERROR: cannot cast type u to t Matching columns by name can't work because the names don't match. Matching columns by position does not work. So if that example succeeds, the only real explanation is that we magically know that we've still got a value of type t despite the user's best attempt to decree otherwise. I know PostgreSQL sometimes ... does things like that. I have no idea why anyone would consider it a desirable behavior, though. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ExecTypeSetColNames is fundamentally broken
Robert Haas writes: > I don't understand the code so I can't comment on the code, but I find > the regression test changes pretty suspect. Attaching any alias list > to the RTE ought to rename the output columns for all purposes, not > just the ones we as implementers find convenient. Well, that was what I thought when I wrote bf7ca1587, but it leads to logical contradictions. Consider create table t (a int, b int); create function f(t) returns ... ; select f(t) from t; select f(t) from t(x,y); If we adopt the "rename for all purposes" interpretation, then the second SELECT must fail, because what f() is being passed is no longer of type t. If you ask me, that'll be a bigger problem for users than the change I'm proposing (quite independently of how hard it might be to implement). It certainly will break a behavior that goes back much further than bf7ca1587. regards, tom lane
Re: ExecTypeSetColNames is fundamentally broken
On Sun, Dec 5, 2021 at 1:46 PM Tom Lane wrote: > So 0001 attached fixes this by revoking the decision to apply > ExecTypeSetColNames in cases where a Var or RowExpr is declared > to return a named composite type. This causes a couple of regression > test results to change, but they are ones that were specifically > added to exercise this behavior that we now see to be invalid. I don't understand the code so I can't comment on the code, but I find the regression test changes pretty suspect. Attaching any alias list to the RTE ought to rename the output columns for all purposes, not just the ones we as implementers find convenient. I understand that we have to do *something* here and that the present behavior is buggy and unacceptable ... but I'm not sure I accept that the only possible fix is the one you propose here. -- Robert Haas EDB: http://www.enterprisedb.com
ExecTypeSetColNames is fundamentally broken
Changing Fcc: +inbox I looked into the failure reported at [1]. Basically what's happening there is that we're allowing a composite datum of type RECORD to get stored into a table, whereupon other backends can't make sense of it since they lack the appropriate typcache entry. The reason the datum is marked as type RECORD is that ExecTypeSetColNames set things up that way, after observing that the tuple descriptor obtained from the current table definition didn't have the column names it thought it should have. Now, in the test case as given, ExecTypeSetColNames is in error to think that, because it fails to account for the possibility that the tupdesc contains dropped columns that weren't dropped when the relevant RTE was made. However, if the test case is modified so that we just rename rather than drop some pre-existing column, then even with a fix for that problem ExecTypeSetColNames would do the wrong thing. I made several attempts to work around this problem, but I've now concluded that changing the type OID in ExecTypeSetColNames is just fundamentally broken. It can't be okay to decide that a Var that claims to emit type A actually emits type B, and especially not to do so as late as executor startup. I speculated in the other thread that we could do so during planning instead, but that turns out to just move the problems around. I think this must be so, because the whole idea is bogus. For example, if we have a function that is declared to take type "ct", it can't be okay in general to pass it type "record" instead. We've mistakenly thought that we could fuzz this as long as the two types are physically compatible --- but how can we think that the column names of a composite type aren't a basic part of its definition? So 0001 attached fixes this by revoking the decision to apply ExecTypeSetColNames in cases where a Var or RowExpr is declared to return a named composite type. This causes a couple of regression test results to change, but they are ones that were specifically added to exercise this behavior that we now see to be invalid. (In passing, this also adjusts ExecEvalWholeRowVar to fail if the Var claims to be of a domain-over-composite. I'm not sure what I was thinking when I changed it to allow that; the case should not arise, and if it did, it'd be questionable because we're not checking domain constraints here.) 0002 is some inessential mop-up that avoids storing useless column name lists in RowExprs where they won't be used. I'm not really thrilled about back-patching a behavioral change of this sort, but I don't see any good alternative. Corrupting people's tables is not okay. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CAOFAq3BeawPiw9pc3bVGZ%3DRint2txWEBCeDC2wNAhtCZoo2ZqA%40mail.gmail.com diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 892b4e17e0..f0f3b4ffb0 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -1913,16 +1913,16 @@ ExecInitExprRec(Expr *node, ExprState *state, { /* generic record, use types of given expressions */ tupdesc = ExecTypeFromExprList(rowexpr->args); + /* ... but adopt RowExpr's column aliases */ + ExecTypeSetColNames(tupdesc, rowexpr->colnames); + /* Bless the tupdesc so it can be looked up later */ + BlessTupleDesc(tupdesc); } else { /* it's been cast to a named type, use that */ tupdesc = lookup_rowtype_tupdesc_copy(rowexpr->row_typeid, -1); } -/* In either case, adopt RowExpr's column aliases */ -ExecTypeSetColNames(tupdesc, rowexpr->colnames); -/* Bless the tupdesc in case it's now of type RECORD */ -BlessTupleDesc(tupdesc); /* * In the named-type case, the tupdesc could have more columns diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index eb49817cee..b7b79e0b75 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -4021,12 +4021,8 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext) * generates an INT4 NULL regardless of the dropped column type). * If we find a dropped column and cannot verify that case (1) * holds, we have to use the slow path to check (2) for each row. - * - * If vartype is a domain over composite, just look through that - * to the base composite type. */ - var_tupdesc = lookup_rowtype_tupdesc_domain(variable->vartype, - -1, false); + var_tupdesc = lookup_rowtype_tupdesc(variable->vartype, -1); slot_tupdesc = slot->tts_tupleDescriptor; @@ -4063,9 +4059,8 @@ ExecEvalWholeRowVar(ExprState *state, ExprEvalStep *op, ExprContext *econtext) /* * Use the variable's declared rowtype as the descriptor for the - * output values, modulo possibly assigning new column names -