Re: Recording foreign key relationships for the system catalogs
On Wed, Feb 3, 2021, at 21:41, Joel Jacobson wrote: >Otherwise I think it would be more natural to change both is_array and is_opt >to boolean[] with the same cardinality as fkcols and pkcols, >to allow unnest()ing of them as well. Another option would perhaps be to add a new system view in src/backend/catalog/system_views.sql I see there are other cases with a slightly more complex view using a function with a similar name, such as the pg_stat_activity using pg_stat_get_activity(). Similar to this, maybe we could add a pg_catalog_foreign_keys view using the output from pg_get_catalog_foreign_keys(): Example usage: SELECT * FROM pg_catalog_foreign_keys WHERE fktable = 'pg_constraint'::regclass AND pktable = 'pg_attribute'::regclass; fkid |fktable| fkcol | pktable| pkcol | is_array | is_opt | ordinal_position --+---+---+--+--+--++-- 48 | pg_constraint | conkey| pg_attribute | attnum | t| t |1 48 | pg_constraint | conrelid | pg_attribute | attrelid | f| f |2 49 | pg_constraint | confkey | pg_attribute | attnum | t| f |1 49 | pg_constraint | confrelid | pg_attribute | attrelid | f| f |2 (4 rows) The point of this would be to avoid unnecessary increase of data model complexity, which I agree is not needed, since we only need single booleans as of today, but to provide a more information_schema-like system view, i.e. with columns on separate rows, with ordinal_position. Since we don't have any "constraint_name" for these, we need to enumerate the fks first, to let ordinal_position be the position within each such fkid. Here is my proposal on how to implement: CREATE VIEW pg_catalog_foreign_keys AS WITH enumerate_fks AS ( SELECT *, ROW_NUMBER() OVER () AS fkid FROM pg_catalog.pg_get_catalog_foreign_keys() ), unnest_cols AS ( SELECT C.fkid, C.fktable, unnest(C.fkcols) AS fkcol, C.pktable, unnest(C.pkcols) AS pkcol, unnest( CASE cardinality(fkcols) WHEN 1 THEN ARRAY[C.is_array] WHEN 2 THEN ARRAY[FALSE,C.is_array] END ) AS is_array, unnest( CASE cardinality(fkcols) WHEN 1 THEN ARRAY[C.is_opt] WHEN 2 THEN ARRAY[FALSE,C.is_opt] END ) AS is_opt FROM enumerate_fks AS C ) SELECT *, ROW_NUMBER() OVER ( PARTITION BY U.fkid ORDER BY U.fkcol, U.pkcol ) AS ordinal_position FROM unnest_cols AS U; I think both the pg_get_catalog_foreign_keys() function and this view are useful in different ways, so it's good to provide both. Only providing pg_get_catalog_foreign_keys() will arguably mean some users of the function will need to implement something like the same as above on their own, if they need the is_array and is_opt value for a specific fkcol. /Joel
Re: Recording foreign key relationships for the system catalogs
On Mon, Feb 1, 2021, at 21:03, Tom Lane wrote: >"Joel Jacobson" writes: >> The is_array OUT parameter doesn't say which of the possibly many fkcols >> that is the array column. > >Yeah, I didn't write the sgml docs yet, but the comments explain that >the array is always the last fkcol. Maybe someday that won't be >general enough, but we can cross that bridge when we come to it. I've now fully migrated to using pg_get_catalog_foreign_keys() instead of my own lookup tables, and have some additional hands-on experiences to share with you. I struggle to come up with a clean way to make use of is_array, without being forced to introduce some CASE logic to figure out if the fkcol is an array or not. The alternative to join information_schema.columns and check data_type='ARRAY' is almost simpler, but that seems wrong, since we now have is_array, and using it should be simpler than joining information_schema.columns. The best approach I've come up with so far is the CASE logic below: WITH foreign_keys AS ( SELECT fktable::text AS table_name, unnest(fkcols) AS column_name, pktable::text AS ref_table_name, unnest(pkcols) AS ref_column_name, -- -- is_array refers to the last fkcols column -- unnest ( CASE cardinality(fkcols) WHEN 1 THEN ARRAY[is_array] WHEN 2 THEN ARRAY[FALSE,is_array] END ) AS is_array FROM pg_get_catalog_foreign_keys() ) If is_array would instead have been an boolean[], the query could have been written: WITH foreign_keys AS ( SELECT fktable::text AS table_name, unnest(fkcols) AS column_name, pktable::text AS ref_table_name, unnest(pkcols) AS ref_column_name, unnest(is_array) AS is_array FROM pg_get_catalog_foreign_keys() ) Maybe this can be written in a simpler way already. Otherwise I think it would be more natural to change both is_array and is_opt to boolean[] with the same cardinality as fkcols and pkcols, to allow unnest()ing of them as well. This would also be a more future proof solution, and wouldn't require a code change to code using pg_get_catalog_foreign_keys(), if we would ever add more complex cases in the future. But even without increased future complexity, I think the example above demonstrates a problem already today. Maybe there is a simpler way to achieve what I'm trying to do, i.e. to figure out if a specific fkcol is an array or not, using some other simpler clever trick than the CASE variant above? /Joel
Re: Recording foreign key relationships for the system catalogs
I wrote: > * It would now be possible to remove the PGNSP and PGUID kluges > entirely in favor of plain BKI_LOOKUP references to pg_namespace > and pg_authid. The catalog header usage would get a little > more verbose: BKI_DEFAULT(PGNSP) becomes BKI_DEFAULT(pg_catalog) > and BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES). I'm a bit > inclined to do it, simply to remove one bit of mechanism that has > to be documented; but it's material for a separate patch perhaps. Here's a patch for that part. I think this is probably a good idea not only because it removes magic, but because now that we have various predefined roles it's becoming more and more likely that some of those will need to be cross-referenced in other catalogs' initial data. With this change, nothing special will be needed for that. Multiple built-in schemas also become more feasible than they were. regards, tom lane diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml index 6d3c5be67f..db1b3d5e9a 100644 --- a/doc/src/sgml/bki.sgml +++ b/doc/src/sgml/bki.sgml @@ -540,17 +540,6 @@ expected to be in the pg_catalog schema. - - - - In addition to the generic lookup mechanisms, there is a special - convention that PGNSP is replaced by the OID of - the pg_catalog schema, - and PGUID is replaced by the OID of the bootstrap - superuser role. These usages are somewhat historical but so far - there hasn't been a need to generalize them. - - diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 5bdc7adc44..b159958112 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -184,15 +184,9 @@ my $GenbkiNextOid = $FirstGenbkiObjectId; # within a given Postgres release, such as fixed OIDs. Do not substitute # anything that could depend on platform or configuration. (The right place # to handle those sorts of things is in initdb.c's bootstrap_template1().) -my $BOOTSTRAP_SUPERUSERID = - Catalog::FindDefinedSymbolFromData($catalog_data{pg_authid}, - 'BOOTSTRAP_SUPERUSERID'); my $C_COLLATION_OID = Catalog::FindDefinedSymbolFromData($catalog_data{pg_collation}, 'C_COLLATION_OID'); -my $PG_CATALOG_NAMESPACE = - Catalog::FindDefinedSymbolFromData($catalog_data{pg_namespace}, - 'PG_CATALOG_NAMESPACE'); # Fill in pg_class.relnatts by looking at the referenced catalog's schema. @@ -213,11 +207,12 @@ foreach my $row (@{ $catalog_data{pg_am} }) $amoids{ $row->{amname} } = $row->{oid}; } -# There is only one authid at bootstrap time, and we handle it specially: -# the usually-defaulted symbol PGUID becomes the bootstrap superuser's OID. -# (We could drop this in favor of writing out BKI_DEFAULT(POSTGRES) ...) +# role OID lookup my %authidoids; -$authidoids{'PGUID'} = $BOOTSTRAP_SUPERUSERID; +foreach my $row (@{ $catalog_data{pg_authid} }) +{ + $authidoids{ $row->{rolname} } = $row->{oid}; +} # class (relation) OID lookup (note this only covers bootstrap catalogs!) my %classoids; @@ -240,11 +235,12 @@ foreach my $row (@{ $catalog_data{pg_language} }) $langoids{ $row->{lanname} } = $row->{oid}; } -# There is only one namespace at bootstrap time, and we handle it specially: -# the usually-defaulted symbol PGNSP becomes the pg_catalog namespace's OID. -# (We could drop this in favor of writing out BKI_DEFAULT(pg_catalog) ...) +# namespace (schema) OID lookup my %namespaceoids; -$namespaceoids{'PGNSP'} = $PG_CATALOG_NAMESPACE; +foreach my $row (@{ $catalog_data{pg_namespace} }) +{ + $namespaceoids{ $row->{nspname} } = $row->{oid}; +} # opclass OID lookup my %opcoids; diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat index a643a09588..87d917ffc3 100644 --- a/src/include/catalog/pg_authid.dat +++ b/src/include/catalog/pg_authid.dat @@ -15,6 +15,10 @@ # The C code typically refers to these roles using the #define symbols, # so make sure every entry has an oid_symbol value. +# The bootstrap superuser is named POSTGRES according to this data and +# according to BKI_DEFAULT entries in other catalogs. However, initdb +# will replace that at database initialization time. + { oid => '10', oid_symbol => 'BOOTSTRAP_SUPERUSERID', rolname => 'POSTGRES', rolsuper => 't', rolinherit => 't', rolcreaterole => 't', rolcreatedb => 't', rolcanlogin => 't', diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h index bb6938caa2..3e37729436 100644 --- a/src/include/catalog/pg_class.h +++ b/src/include/catalog/pg_class.h @@ -38,7 +38,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat NameData relname; /* OID of namespace containing this class */ - Oid relnamespace BKI_DEFAULT(PGNSP) BKI_LOOKUP(pg_namespace); + Oid relnamespace BKI_DEFAULT(pg_catalog) BKI_LOOKUP(pg_namespace); /* OID of entry in pg_type for relation's implicit row type, if any */ Oid reltype
Re: Recording foreign key relationships for the system catalogs
"Joel Jacobson" writes: > On Tue, Feb 2, 2021, at 17:00, Tom Lane wrote: >> Appreciate the review! Please confirm if you agree with above >> analysis. > Yes, I agree with the analysis. Cool, thanks. I've pushed it now. regards, tom lane
Re: Recording foreign key relationships for the system catalogs
On Tue, Feb 2, 2021, at 17:00, Tom Lane wrote: >No, I think it's correct as-is (and this is one reason that I chose to >have two separate FK entries for cases like this). confrelid can be >zero in rows that are not FK constraints. However, such a row must >also have empty confkey. The above entry states that for each element >of confkey, the pair (confrelid,confkey[i]) must be nonzero and have >a match in pg_attribute. It creates no constraint if confkey is empty. Thanks for explaining, I get it now. >Appreciate the review! Please confirm if you agree with above >analysis. Yes, I agree with the analysis. /Joel
Re: Recording foreign key relationships for the system catalogs
"Joel Jacobson" writes: > I could only find one minor error, > found by running the regression-tests, > and then using the query below to compare "is_opt" > with my own "zero_values" in my tool > that derives its value from pg_catalog content. > ... > I therefore think is_opt should be changed to true for this row: > fktable| fkcols| pktable| pkcols | > is_array | is_opt > ---+-+--+---+--+ > pg_constraint | {confrelid,confkey} | pg_attribute | {attrelid,attnum} | t > | f No, I think it's correct as-is (and this is one reason that I chose to have two separate FK entries for cases like this). confrelid can be zero in rows that are not FK constraints. However, such a row must also have empty confkey. The above entry states that for each element of confkey, the pair (confrelid,confkey[i]) must be nonzero and have a match in pg_attribute. It creates no constraint if confkey is empty. > If this is fixed, I also agree this is ready to be committed. Appreciate the review! Please confirm if you agree with above analysis. regards, tom lane
Re: Recording foreign key relationships for the system catalogs
On Tue, Feb 2, 2021, at 04:27, Tom Lane wrote: >Attachments: >add-catalog-foreign-key-info-2.patch Very nice. I could only find one minor error, found by running the regression-tests, and then using the query below to compare "is_opt" with my own "zero_values" in my tool that derives its value from pg_catalog content. -- -- Are there any observed oid columns with zero values -- that are also marked as NOT is_opt by pg_get_catalog_foreign_keys()? -- regression=# SELECT table_name, column_name FROM pit.oid_columns WHERE zero_values INTERSECT SELECT fktable::text, unnest(fkcols) FROM pg_get_catalog_foreign_keys() WHERE NOT is_opt; Expected to return no rows but: table_name | column_name ---+- pg_constraint | confrelid (1 row) regression=# SELECT * FROM pg_get_catalog_foreign_keys() WHERE 'confrelid' = ANY(fkcols); fktable| fkcols| pktable| pkcols | is_array | is_opt ---+-+--+---+--+ pg_constraint | {confrelid} | pg_class | {oid} | f | t pg_constraint | {confrelid,confkey} | pg_attribute | {attrelid,attnum} | t | f (2 rows) Reading the new documentation, I interpret "is_opt=false" to be a negation of "the referencing column(s) are allowed to contain zeroes instead of a valid reference" i.e. that none of the referencing columns (fkcols) are allowed to contain zeroes, but since "confrelid" apparently can contain zeroes: regression=# select * from pg_constraint where confrelid = 0 limit 1; -[ RECORD 1 ]-+-- oid | 12111 conname | pg_proc_oid_index connamespace | 11 contype | p condeferrable | f condeferred | f convalidated | t conrelid | 1255 contypid | 0 conindid | 2690 conparentid | 0 confrelid | 0 confupdtype | confdeltype | confmatchtype | conislocal| t coninhcount | 0 connoinherit | t conkey| {1} confkey | confreftype | conpfeqop | conppeqop | conffeqop | conexclop | conbin| I therefore think is_opt should be changed to true for this row: fktable| fkcols| pktable| pkcols | is_array | is_opt ---+-+--+---+--+ pg_constraint | {confrelid,confkey} | pg_attribute | {attrelid,attnum} | t | f If this is fixed, I also agree this is ready to be committed. /Joel
Re: Recording foreign key relationships for the system catalogs
"Joel Jacobson" writes: > The is_array OUT parameter doesn't say which of the possibly many fkcols that > is the array column. Yeah, I didn't write the sgml docs yet, but the comments explain that the array is always the last fkcol. Maybe someday that won't be general enough, but we can cross that bridge when we come to it. regards, tom lane
Re: Recording foreign key relationships for the system catalogs
On Mon, Feb 1, 2021, at 20:33, Joel Jacobson wrote: >Suggestions on how to fix: > >* Make is_array an boolean[], and let each element represent the is_array >value for each fkcols element. > >* Change interface to be more like information_schema, and add a >"ordinal_position" column, and return each column on a separate row. > >I think I prefer the latter since it's more information_schema-conformant, but >any works. Ops. I see a problem with just adding a "ordinal_position", since then one would also need enumerate the "foreing key" constraints or give them names like "constraint_name" in information_schema.table_constraints (since there can be multiple foreign keys per fktable, so there would be multiple e.g. ordinal_position=1 per fktable). With this into consideration, I think the easiest and cleanest solution is to make is_array a boolean[]. I like the usage of arrays, it makes it much easier to understand the output, as it's the visually easy to see what groups of columns that references what group of columns. /Joel
Re: Recording foreign key relationships for the system catalogs
Hi again, After trying to use pg_get_catalog_foreign_keys() to replace what I had before, I notice one ambiguity which I think is a serious problem in the machine-readable context. The is_array OUT parameter doesn't say which of the possibly many fkcols that is the array column. One example: fktable|fkcols | pktable| pkcols | is_array --+---+--+---+-- pg_constraint| {conrelid,conkey} | pg_attribute | {attrelid,attnum} | t Is the array "conrelid" or is it "conkey"? As a human, I know it's "conkey", but for a machine to figure out, one would need to join information_schema.columns and check the data_type or something similar. Suggestions on how to fix: * Make is_array an boolean[], and let each element represent the is_array value for each fkcols element. * Change interface to be more like information_schema, and add a "ordinal_position" column, and return each column on a separate row. I think I prefer the latter since it's more information_schema-conformant, but any works. /Joel
Re: Recording foreign key relationships for the system catalogs
Could it be an idea to also add OUT can_be_zero boolean to pg_get_catalog_foreign_keys()'s out parameters? This information is useful to know if one should be doing an INNER JOIN or a LEFT JOIN on the foreign keys. The information is mostly available in the documentation already, but not quite accurate, which I've proposed a patch [1] to fix. [1] https://www.postgresql.org/message-id/4ed9a372-7bf9-479a-926c-ae8e77471...@www.fastmail.com
Re: Recording foreign key relationships for the system catalogs
Very nice. Thanks to this patch, I can get rid of my own parse-catalogs.pl hack and use pg_get_catalog_foreign_keys() instead. +1 I can with high confidence assert the correctness of pg_get_catalog_foreign_keys()'s output, as it matches the lookup tables for the tool I'm hacking on precisely: -- -- verify single column foreign keys -- WITH a AS ( SELECT fktable::text, fkcols[1]::text, pktable::text, pkcols[1]::text FROM pg_get_catalog_foreign_keys() WHERE cardinality(fkcols) = 1 ), b AS ( SELECT table_name, column_name, ref_table_name, ref_column_name FROM pit.oid_joins ) SELECT (SELECT COUNT(*) FROM (SELECT * FROM a EXCEPT SELECT * FROM b) AS x) AS except_b, (SELECT COUNT(*) FROM (SELECT * FROM b EXCEPT SELECT * FROM a) AS x) AS except_a, (SELECT COUNT(*) FROM (SELECT * FROM b INTERSECT SELECT * FROM a) AS x) AS a_intersect_b ; except_b | except_a | a_intersect_b --+--+--- 0 |0 | 209 (1 row) -- -- verify multi-column foreign keys -- WITH a AS ( SELECT fktable::text, fkcols, pktable::text, pkcols FROM pg_get_catalog_foreign_keys() WHERE cardinality(fkcols) > 1 ), b AS ( SELECT table_name, ARRAY[rel_column_name,attnum_column_name], 'pg_attribute', '{attrelid,attnum}'::text[] FROM pit.attnum_joins ) SELECT (SELECT COUNT(*) FROM (SELECT * FROM a EXCEPT SELECT * FROM b) AS x) AS except_b, (SELECT COUNT(*) FROM (SELECT * FROM b EXCEPT SELECT * FROM a) AS x) AS except_a, (SELECT COUNT(*) FROM (SELECT * FROM b INTERSECT SELECT * FROM a) AS x) AS a_intersect_b ; except_b | except_a | a_intersect_b --+--+--- 0 |0 | 8 (1 row) /Joel On Sun, Jan 31, 2021, at 22:39, Tom Lane wrote: > Now that dfb75e478 is in, I looked into whether we can have some > machine-readable representation of the catalogs' foreign key > relationships. As per the previous discussion [1], it's not > practical to have "real" SQL foreign key constraints, because > the semantics we use aren't quite right (i.e., using 0 instead > of NULL in rows with no reference). Nonetheless it would be > nice to have the knowledge available in some form, and we agreed > that a set-returning function returning the data would be useful. > > The attached patch creates that function, and rewrites the oidjoins.sql > regression test to use it, in place of the very incomplete info that's > reverse-engineered by findoidjoins. It's mostly straightforward. > > My original thought had been to add DECLARE_FOREIGN_KEY() macros > for all references, but I soon realized that in a large majority of > the cases, that's redundant with the BKI_LOOKUP() annotations we > already have. So I taught genbki.pl to extract FK data from > BKI_LOOKUP() as well as the explicit DECLARE macros. That didn't > remove the work entirely, because it turned out that we hadn't > bothered to apply BKI_LOOKUP() labels to most of the catalogs that > have no hand-made data. A big chunk of the patch consists in > adding those as needed. Also, I had to make the BKI_LOOKUP() > mechanism a little more complete, because it failed on pg_namespace > and pg_authid references. (It will still fail on some other > cases such as BKI_LOOKUP(pg_foreign_server), but I think there's > no need to fill that in until/unless we have some built-in data > that needs it.) > > There are various loose ends yet to be cleaned up: > > * I'm unsure whether it's better for the SRF to return the > column names as textual names, or as column numbers. Names was > a bit easier for all the parts of the current patch so I did > it that way, but maybe there's a case for the other way. > Actually the whole API for the SRF is just spur-of-the-moment, > so maybe a different API would be better. > > * It would now be possible to remove the PGNSP and PGUID kluges > entirely in favor of plain BKI_LOOKUP references to pg_namespace > and pg_authid. The catalog header usage would get a little > more verbose: BKI_DEFAULT(PGNSP) becomes BKI_DEFAULT(pg_catalog) > and BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES). I'm a bit > inclined to do it, simply to remove one bit of mechanism that has > to be documented; but it's material for a separate patch perhaps. > > * src/tools/findoidjoins should be nuked entirely, AFAICS. > Again, that could be a follow-on patch. > > * I've not touched the SGML docs. Certainly > pg_get_catalog_foreign_keys() should be documented, and some > adjustments in bki.sgml might be appropriate. > > regards, tom lane > > [1] > https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98%402ndquadrant.com > > > > *Attachments:* > * add-catalog-foreign-key-info-1.patch Kind regards, Joel