Hi, while hacking up nodeAgg.c to use a dedicated slot/desc for the hashtable, an assert I placed brought up a weird case:
regression[29535][1]=# EXPLAIN VERBOSE update bar set f2 = f2 + 100 where f1 in (select f1 from foo);
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                                      QUERY PLAN                                        │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ Update on public.bar  (cost=42.75..109.25 rows=1130 width=20)                          │
│   ->  Hash Join  (cost=42.75..109.25 rows=1130 width=20)                               │
│         Output: bar.f1, (bar.f2 + 100), bar.ctid, foo.ctid                             │
│         Hash Cond: (bar.f1 = foo.f1)                                                   │
│         ->  Seq Scan on public.bar  (cost=0.00..32.60 rows=2260 width=14)              │
│               Output: bar.f1, bar.f2, bar.ctid                                         │
│         ->  Hash  (cost=40.25..40.25 rows=200 width=10)                                │
│               Output: foo.ctid, foo.f1                                                 │
│               ->  HashAggregate  (cost=38.25..40.25 rows=200 width=10)                 │
│                     Output: foo.ctid, foo.f1                                           │
│                     Group Key: foo.f1                                                  │
│                     ->  Seq Scan on public.foo  (cost=0.00..32.60 rows=2260 width=10)  │
│                           Output: foo.ctid, foo.f1                                     │
└───────────────────────────────────────────────────────────────────────────────────────┘
(13 rows)

This doesn't look right. The ctid shouldn't be in the aggregate output - after all it's pretty much meaningless here.

Note that in this case find_hash_columns(aggstate) will return two columns, but node->numCols will be 1 (and thus node->grpOperators only has one element). So it seems we're doing something wrong here.

Casting a wider net: find_hash_columns() and its subroutines seem like pretty much dead code, including outdated comments (look for "SQL99 semantics"). The main explanation says:

/*
 * Create a list of the tuple columns that actually need to be stored in
 * hashtable entries. The incoming tuples from the child plan node will
 * contain grouping columns, other columns referenced in our targetlist and
 * qual, columns used to compute the aggregate functions, and perhaps just
 * junk columns we don't use at all. Only columns of the first two types
 * need to be stored in the hashtable, and getting rid of the others can
 * make the table entries significantly smaller. To avoid messing up Var
 * numbering, we keep the same tuple descriptor for hashtable entries as the
 * incoming tuples have, but set unwanted columns to NULL in the tuples that
 * go into the table.

but that seems bogus - when would we ever reference "other columns referenced in our targetlist and qual" from the representative tuple if they aren't also being grouped on? This method would pick a pretty arbitrary value for those: since we don't hash them, it'd just be whatever value happened to be in the first tuple of each group?

Greetings,

Andres Freund
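
PS: to make the mismatch above concrete, this is roughly the kind of cross-check that fires on the plan shown (sketch only, reusing the existing nodeAgg.c names; the local variable and the exact Assert are just illustrative, not the literal code from the WIP patch):

    /*
     * Sketch: with a dedicated hashtable descriptor one would expect the
     * columns chosen by find_hash_columns() to be exactly the grouping
     * columns, i.e. node->numCols of them.  For the plan above the returned
     * list additionally includes the column carrying foo.ctid, so this
     * fires.
     */
    Agg    *node = (Agg *) aggstate->ss.ps.plan;
    List   *hashcols = find_hash_columns(aggstate);

    Assert(list_length(hashcols) == node->numCols);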