Re: Recording foreign key relationships for the system catalogs

2021-02-03 Thread Joel Jacobson
On Wed, Feb 3, 2021, at 21:41, Joel Jacobson wrote:
>Otherwise I think it would be more natural to change both is_array and is_opt
>to boolean[] with the same cardinality as fkcols and pkcols,
>to allow unnest()ing of them as well.

Another option would perhaps be to add a new
system view in src/backend/catalog/system_views.sql

I see there are other cases with a slightly more complex view
using a function with a similar name, such as
the pg_stat_activity using pg_stat_get_activity().

Similar to this, maybe we could add a pg_catalog_foreign_keys view
using the output from pg_get_catalog_foreign_keys():

Example usage:

SELECT * FROM pg_catalog_foreign_keys
WHERE fktable = 'pg_constraint'::regclass
AND pktable = 'pg_attribute'::regclass;

fkid |fktable|   fkcol   |   pktable|  pkcol   | is_array | is_opt 
| ordinal_position
--+---+---+--+--+--++--
   48 | pg_constraint | conkey| pg_attribute | attnum   | t| t  
|1
   48 | pg_constraint | conrelid  | pg_attribute | attrelid | f| f  
|2
   49 | pg_constraint | confkey   | pg_attribute | attnum   | t| f  
|1
   49 | pg_constraint | confrelid | pg_attribute | attrelid | f| f  
|2
(4 rows)

The point of this would be to avoid unnecessary increase of data model 
complexity,
which I agree is not needed, since we only need single booleans as of today,
but to provide a more information_schema-like system view,
i.e. with columns on separate rows, with ordinal_position.

Since we don't have any "constraint_name" for these,
we need to enumerate the fks first, to let ordinal_position
be the position within each such fkid.

Here is my proposal on how to implement:

CREATE VIEW pg_catalog_foreign_keys AS
WITH
enumerate_fks AS (
SELECT
*,
ROW_NUMBER() OVER () AS fkid
FROM pg_catalog.pg_get_catalog_foreign_keys()
),
unnest_cols AS (
SELECT
C.fkid,
C.fktable,
unnest(C.fkcols) AS fkcol,
C.pktable,
unnest(C.pkcols) AS pkcol,
unnest(
CASE cardinality(fkcols)
WHEN 1 THEN ARRAY[C.is_array]
WHEN 2 THEN ARRAY[FALSE,C.is_array]
END
) AS is_array,
unnest(
CASE cardinality(fkcols)
WHEN 1 THEN ARRAY[C.is_opt]
WHEN 2 THEN ARRAY[FALSE,C.is_opt]
END
) AS is_opt
FROM enumerate_fks AS C
)
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY U.fkid
ORDER BY U.fkcol, U.pkcol
) AS ordinal_position
FROM unnest_cols AS U;

I think both the pg_get_catalog_foreign_keys() function
and this view are useful in different ways,
so it's good to provide both.

Only providing pg_get_catalog_foreign_keys() will
arguably mean some users of the function will need to implement
something like the same as above on their own, if they need the is_array and 
is_opt
value for a specific fkcol.

/Joel

Re: Recording foreign key relationships for the system catalogs

2021-02-03 Thread Joel Jacobson
On Mon, Feb 1, 2021, at 21:03, Tom Lane wrote:
>"Joel Jacobson"  writes:
>> The is_array OUT parameter doesn't say which of the possibly many fkcols 
>> that is the array column.
>
>Yeah, I didn't write the sgml docs yet, but the comments explain that
>the array is always the last fkcol.  Maybe someday that won't be
>general enough, but we can cross that bridge when we come to it.

I've now fully migrated to using pg_get_catalog_foreign_keys()
instead of my own lookup tables, and have some additional hands-on experiences
to share with you.

I struggle to come up with a clean way to make use of is_array,
without being forced to introduce some CASE logic to figure out
if the fkcol is an array or not.

The alternative to join information_schema.columns and check data_type='ARRAY' 
is almost simpler,
but that seems wrong, since we now have is_array, and using it should be 
simpler than
joining information_schema.columns.

The best approach I've come up with so far is the CASE logic below:

WITH
foreign_keys AS
(
  SELECT
fktable::text AS table_name,
unnest(fkcols) AS column_name,
pktable::text AS ref_table_name,
unnest(pkcols) AS ref_column_name,
--
-- is_array refers to the last fkcols column
--
unnest
(
  CASE cardinality(fkcols)
  WHEN 1 THEN ARRAY[is_array]
  WHEN 2 THEN ARRAY[FALSE,is_array]
  END
) AS is_array
  FROM pg_get_catalog_foreign_keys()
)

If is_array would instead have been an boolean[], the query could have been 
written:

WITH
foreign_keys AS
(
  SELECT
fktable::text AS table_name,
unnest(fkcols) AS column_name,
pktable::text AS ref_table_name,
unnest(pkcols) AS ref_column_name,
unnest(is_array) AS is_array
  FROM pg_get_catalog_foreign_keys()
)

Maybe this can be written in a simpler way already.

Otherwise I think it would be more natural to change both is_array and is_opt
to boolean[] with the same cardinality as fkcols and pkcols,
to allow unnest()ing of them as well.

This would also be a more future proof solution,
and wouldn't require a code change to code using pg_get_catalog_foreign_keys(),
if we would ever add more complex cases in the future.

But even without increased future complexity,
I think the example above demonstrates a problem already today.

Maybe there is a simpler way to achieve what I'm trying to do,
i.e. to figure out if a specific fkcol is an array or not,
using some other simpler clever trick than the CASE variant above?

/Joel



Re: Recording foreign key relationships for the system catalogs

2021-02-02 Thread Tom Lane
I wrote:
> * It would now be possible to remove the PGNSP and PGUID kluges
> entirely in favor of plain BKI_LOOKUP references to pg_namespace
> and pg_authid.  The catalog header usage would get a little
> more verbose: BKI_DEFAULT(PGNSP) becomes BKI_DEFAULT(pg_catalog)
> and BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES).  I'm a bit
> inclined to do it, simply to remove one bit of mechanism that has
> to be documented; but it's material for a separate patch perhaps.

Here's a patch for that part.  I think this is probably a good
idea not only because it removes magic, but because now that we
have various predefined roles it's becoming more and more likely
that some of those will need to be cross-referenced in other
catalogs' initial data.  With this change, nothing special
will be needed for that.  Multiple built-in schemas also become
more feasible than they were.

regards, tom lane

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 6d3c5be67f..db1b3d5e9a 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -540,17 +540,6 @@
   expected to be in the pg_catalog schema.
  
 
-
-
- 
-  In addition to the generic lookup mechanisms, there is a special
-  convention that PGNSP is replaced by the OID of
-  the pg_catalog schema,
-  and PGUID is replaced by the OID of the bootstrap
-  superuser role.  These usages are somewhat historical but so far
-  there hasn't been a need to generalize them.
- 
-

 

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 5bdc7adc44..b159958112 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -184,15 +184,9 @@ my $GenbkiNextOid = $FirstGenbkiObjectId;
 # within a given Postgres release, such as fixed OIDs.  Do not substitute
 # anything that could depend on platform or configuration.  (The right place
 # to handle those sorts of things is in initdb.c's bootstrap_template1().)
-my $BOOTSTRAP_SUPERUSERID =
-  Catalog::FindDefinedSymbolFromData($catalog_data{pg_authid},
-	'BOOTSTRAP_SUPERUSERID');
 my $C_COLLATION_OID =
   Catalog::FindDefinedSymbolFromData($catalog_data{pg_collation},
 	'C_COLLATION_OID');
-my $PG_CATALOG_NAMESPACE =
-  Catalog::FindDefinedSymbolFromData($catalog_data{pg_namespace},
-	'PG_CATALOG_NAMESPACE');
 
 
 # Fill in pg_class.relnatts by looking at the referenced catalog's schema.
@@ -213,11 +207,12 @@ foreach my $row (@{ $catalog_data{pg_am} })
 	$amoids{ $row->{amname} } = $row->{oid};
 }
 
-# There is only one authid at bootstrap time, and we handle it specially:
-# the usually-defaulted symbol PGUID becomes the bootstrap superuser's OID.
-# (We could drop this in favor of writing out BKI_DEFAULT(POSTGRES) ...)
+# role OID lookup
 my %authidoids;
-$authidoids{'PGUID'} = $BOOTSTRAP_SUPERUSERID;
+foreach my $row (@{ $catalog_data{pg_authid} })
+{
+	$authidoids{ $row->{rolname} } = $row->{oid};
+}
 
 # class (relation) OID lookup (note this only covers bootstrap catalogs!)
 my %classoids;
@@ -240,11 +235,12 @@ foreach my $row (@{ $catalog_data{pg_language} })
 	$langoids{ $row->{lanname} } = $row->{oid};
 }
 
-# There is only one namespace at bootstrap time, and we handle it specially:
-# the usually-defaulted symbol PGNSP becomes the pg_catalog namespace's OID.
-# (We could drop this in favor of writing out BKI_DEFAULT(pg_catalog) ...)
+# namespace (schema) OID lookup
 my %namespaceoids;
-$namespaceoids{'PGNSP'} = $PG_CATALOG_NAMESPACE;
+foreach my $row (@{ $catalog_data{pg_namespace} })
+{
+	$namespaceoids{ $row->{nspname} } = $row->{oid};
+}
 
 # opclass OID lookup
 my %opcoids;
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index a643a09588..87d917ffc3 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -15,6 +15,10 @@
 # The C code typically refers to these roles using the #define symbols,
 # so make sure every entry has an oid_symbol value.
 
+# The bootstrap superuser is named POSTGRES according to this data and
+# according to BKI_DEFAULT entries in other catalogs.  However, initdb
+# will replace that at database initialization time.
+
 { oid => '10', oid_symbol => 'BOOTSTRAP_SUPERUSERID',
   rolname => 'POSTGRES', rolsuper => 't', rolinherit => 't',
   rolcreaterole => 't', rolcreatedb => 't', rolcanlogin => 't',
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index bb6938caa2..3e37729436 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -38,7 +38,7 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	NameData	relname;
 
 	/* OID of namespace containing this class */
-	Oid			relnamespace BKI_DEFAULT(PGNSP) BKI_LOOKUP(pg_namespace);
+	Oid			relnamespace BKI_DEFAULT(pg_catalog) BKI_LOOKUP(pg_namespace);
 
 	/* OID of entry in pg_type for relation's implicit row type, if any */
 	Oid			reltype 

Re: Recording foreign key relationships for the system catalogs

2021-02-02 Thread Tom Lane
"Joel Jacobson"  writes:
> On Tue, Feb 2, 2021, at 17:00, Tom Lane wrote:
>> Appreciate the review!  Please confirm if you agree with above
>> analysis.

> Yes, I agree with the analysis.

Cool, thanks.  I've pushed it now.

regards, tom lane




Re: Recording foreign key relationships for the system catalogs

2021-02-02 Thread Joel Jacobson
On Tue, Feb 2, 2021, at 17:00, Tom Lane wrote:
>No, I think it's correct as-is (and this is one reason that I chose to
>have two separate FK entries for cases like this).  confrelid can be
>zero in rows that are not FK constraints.  However, such a row must
>also have empty confkey.  The above entry states that for each element
>of confkey, the pair (confrelid,confkey[i]) must be nonzero and have
>a match in pg_attribute.  It creates no constraint if confkey is empty.

Thanks for explaining, I get it now.

>Appreciate the review!  Please confirm if you agree with above
>analysis.

Yes, I agree with the analysis.

/Joel

Re: Recording foreign key relationships for the system catalogs

2021-02-02 Thread Tom Lane
"Joel Jacobson"  writes:
> I could only find one minor error,
> found by running the regression-tests,
> and then using the query below to compare "is_opt"
> with my own "zero_values" in my tool
> that derives its value from pg_catalog content.
> ...
> I therefore think is_opt should be changed to true for this row:
> fktable|   fkcols|   pktable|  pkcols   | 
> is_array | is_opt
> ---+-+--+---+--+
> pg_constraint | {confrelid,confkey} | pg_attribute | {attrelid,attnum} | t
> | f

No, I think it's correct as-is (and this is one reason that I chose to
have two separate FK entries for cases like this).  confrelid can be
zero in rows that are not FK constraints.  However, such a row must
also have empty confkey.  The above entry states that for each element
of confkey, the pair (confrelid,confkey[i]) must be nonzero and have
a match in pg_attribute.  It creates no constraint if confkey is empty.

> If this is fixed, I also agree this is ready to be committed.

Appreciate the review!  Please confirm if you agree with above
analysis.

regards, tom lane




Re: Recording foreign key relationships for the system catalogs

2021-02-01 Thread Joel Jacobson
On Tue, Feb 2, 2021, at 04:27, Tom Lane wrote:
>Attachments:
>add-catalog-foreign-key-info-2.patch

Very nice.

I could only find one minor error,
found by running the regression-tests,
and then using the query below to compare "is_opt"
with my own "zero_values" in my tool
that derives its value from pg_catalog content.

--
-- Are there any observed oid columns with zero values
-- that are also marked as NOT is_opt by pg_get_catalog_foreign_keys()?
--
regression=# SELECT
  table_name,
  column_name
FROM pit.oid_columns
WHERE zero_values
INTERSECT
SELECT
  fktable::text,
  unnest(fkcols)
FROM pg_get_catalog_foreign_keys()
WHERE NOT is_opt;

Expected to return no rows but:

  table_name   | column_name
---+-
pg_constraint | confrelid
(1 row)

regression=# SELECT * FROM pg_get_catalog_foreign_keys() WHERE 'confrelid' = 
ANY(fkcols);
fktable|   fkcols|   pktable|  pkcols   | 
is_array | is_opt
---+-+--+---+--+
pg_constraint | {confrelid} | pg_class | {oid} | f  
  | t
pg_constraint | {confrelid,confkey} | pg_attribute | {attrelid,attnum} | t  
  | f
(2 rows)

Reading the new documentation, I interpret "is_opt=false" to be a negation of

   "the referencing column(s) are allowed to contain zeroes instead of a valid 
reference"

i.e. that none of the referencing columns (fkcols) are allowed to contain 
zeroes,
but since "confrelid" apparently can contain zeroes:

regression=# select * from pg_constraint where confrelid = 0 limit 1;
-[ RECORD 1 ]-+--
oid   | 12111
conname   | pg_proc_oid_index
connamespace  | 11
contype   | p
condeferrable | f
condeferred   | f
convalidated  | t
conrelid  | 1255
contypid  | 0
conindid  | 2690
conparentid   | 0
confrelid | 0
confupdtype   |
confdeltype   |
confmatchtype |
conislocal| t
coninhcount   | 0
connoinherit  | t
conkey| {1}
confkey   |
confreftype   |
conpfeqop |
conppeqop |
conffeqop |
conexclop |
conbin|

I therefore think is_opt should be changed to true for this row:
fktable|   fkcols|   pktable|  pkcols   | 
is_array | is_opt
---+-+--+---+--+
pg_constraint | {confrelid,confkey} | pg_attribute | {attrelid,attnum} | t  
  | f

If this is fixed, I also agree this is ready to be committed.

/Joel

Re: Recording foreign key relationships for the system catalogs

2021-02-01 Thread Tom Lane
"Joel Jacobson"  writes:
> The is_array OUT parameter doesn't say which of the possibly many fkcols that 
> is the array column.

Yeah, I didn't write the sgml docs yet, but the comments explain that
the array is always the last fkcol.  Maybe someday that won't be
general enough, but we can cross that bridge when we come to it.

regards, tom lane




Re: Recording foreign key relationships for the system catalogs

2021-02-01 Thread Joel Jacobson
On Mon, Feb 1, 2021, at 20:33, Joel Jacobson wrote:
>Suggestions on how to fix:
>
>* Make is_array an boolean[], and let each element represent the is_array 
>value for each fkcols element.
>
>* Change interface to be more like information_schema, and add a 
>"ordinal_position" column, and return each column on a separate row.
>
>I think I prefer the latter since it's more information_schema-conformant, but 
>any works.

Ops. I see a problem with just adding a "ordinal_position", since then one 
would also need enumerate the "foreing key" constraints or give them names like 
"constraint_name" in information_schema.table_constraints (since there can be 
multiple foreign keys per fktable, so there would be multiple e.g. 
ordinal_position=1 per fktable).

With this into consideration, I think the easiest and cleanest solution is to 
make is_array a boolean[].

I like the usage of arrays, it makes it much easier to understand the output,
as it's the visually easy to see what groups of columns that references what 
group of columns.

/Joel

Re: Recording foreign key relationships for the system catalogs

2021-02-01 Thread Joel Jacobson
Hi again,

After trying to use pg_get_catalog_foreign_keys() to replace what I had before,
I notice one ambiguity which I think is a serious problem in the 
machine-readable context.

The is_array OUT parameter doesn't say which of the possibly many fkcols that 
is the array column.

One example:

   fktable|fkcols |   pktable|  pkcols  
 | is_array
--+---+--+---+--
pg_constraint| {conrelid,conkey} | pg_attribute | {attrelid,attnum} 
| t

Is the array "conrelid" or is it "conkey"? As a human, I know it's "conkey", 
but for a machine to figure out, one would need to join 
information_schema.columns and check the data_type or something similar.

Suggestions on how to fix:

* Make is_array an boolean[], and let each element represent the is_array value 
for each fkcols element.

* Change interface to be more like information_schema, and add a 
"ordinal_position" column, and return each column on a separate row.

I think I prefer the latter since it's more information_schema-conformant, but 
any works.

/Joel


Re: Recording foreign key relationships for the system catalogs

2021-02-01 Thread Joel Jacobson
Could it be an idea to also add

   OUT can_be_zero boolean

to pg_get_catalog_foreign_keys()'s out parameters?

This information is useful to know if one should be doing an INNER JOIN or a 
LEFT JOIN on the foreign keys.

The information is mostly available in the documentation already,
but not quite accurate, which I've proposed a patch [1] to fix.

[1] 
https://www.postgresql.org/message-id/4ed9a372-7bf9-479a-926c-ae8e77471...@www.fastmail.com


Re: Recording foreign key relationships for the system catalogs

2021-02-01 Thread Joel Jacobson
Very nice. Thanks to this patch, I can get rid of my own parse-catalogs.pl hack 
and use pg_get_catalog_foreign_keys() instead.

+1

I can with high confidence assert the correctness of 
pg_get_catalog_foreign_keys()'s output,
as it matches the lookup tables for the tool I'm hacking on precisely:

--
-- verify single column foreign keys
--
WITH
a AS (
  SELECT
fktable::text,
fkcols[1]::text,
pktable::text,
pkcols[1]::text
  FROM pg_get_catalog_foreign_keys()
  WHERE cardinality(fkcols) = 1
),
b AS (
  SELECT
table_name,
column_name,
ref_table_name,
ref_column_name
  FROM pit.oid_joins
)
SELECT
  (SELECT COUNT(*) FROM (SELECT * FROM a EXCEPT SELECT * FROM b) AS x) AS 
except_b,
  (SELECT COUNT(*) FROM (SELECT * FROM b EXCEPT SELECT * FROM a) AS x) AS 
except_a,
  (SELECT COUNT(*) FROM (SELECT * FROM b INTERSECT SELECT * FROM a) AS x) AS 
a_intersect_b
;

except_b | except_a | a_intersect_b
--+--+---
0 |0 |   209
(1 row)

--
-- verify multi-column foreign keys
--
WITH
a AS (
  SELECT
fktable::text,
fkcols,
pktable::text,
pkcols
  FROM pg_get_catalog_foreign_keys()
  WHERE cardinality(fkcols) > 1
),
b AS (
  SELECT
table_name,
ARRAY[rel_column_name,attnum_column_name],
'pg_attribute',
'{attrelid,attnum}'::text[]
  FROM pit.attnum_joins
)
SELECT
  (SELECT COUNT(*) FROM (SELECT * FROM a EXCEPT SELECT * FROM b) AS x) AS 
except_b,
  (SELECT COUNT(*) FROM (SELECT * FROM b EXCEPT SELECT * FROM a) AS x) AS 
except_a,
  (SELECT COUNT(*) FROM (SELECT * FROM b INTERSECT SELECT * FROM a) AS x) AS 
a_intersect_b
;

except_b | except_a | a_intersect_b
--+--+---
0 |0 | 8
(1 row)

/Joel

On Sun, Jan 31, 2021, at 22:39, Tom Lane wrote:
> Now that dfb75e478 is in, I looked into whether we can have some
> machine-readable representation of the catalogs' foreign key
> relationships.  As per the previous discussion [1], it's not
> practical to have "real" SQL foreign key constraints, because
> the semantics we use aren't quite right (i.e., using 0 instead
> of NULL in rows with no reference).  Nonetheless it would be
> nice to have the knowledge available in some form, and we agreed
> that a set-returning function returning the data would be useful.
> 
> The attached patch creates that function, and rewrites the oidjoins.sql
> regression test to use it, in place of the very incomplete info that's
> reverse-engineered by findoidjoins.  It's mostly straightforward.
> 
> My original thought had been to add DECLARE_FOREIGN_KEY() macros
> for all references, but I soon realized that in a large majority of
> the cases, that's redundant with the BKI_LOOKUP() annotations we
> already have.  So I taught genbki.pl to extract FK data from
> BKI_LOOKUP() as well as the explicit DECLARE macros.  That didn't
> remove the work entirely, because it turned out that we hadn't
> bothered to apply BKI_LOOKUP() labels to most of the catalogs that
> have no hand-made data.  A big chunk of the patch consists in
> adding those as needed.  Also, I had to make the BKI_LOOKUP()
> mechanism a little more complete, because it failed on pg_namespace
> and pg_authid references.  (It will still fail on some other
> cases such as BKI_LOOKUP(pg_foreign_server), but I think there's
> no need to fill that in until/unless we have some built-in data
> that needs it.)
> 
> There are various loose ends yet to be cleaned up:
> 
> * I'm unsure whether it's better for the SRF to return the
> column names as textual names, or as column numbers.  Names was
> a bit easier for all the parts of the current patch so I did
> it that way, but maybe there's a case for the other way.
> Actually the whole API for the SRF is just spur-of-the-moment,
> so maybe a different API would be better.
> 
> * It would now be possible to remove the PGNSP and PGUID kluges
> entirely in favor of plain BKI_LOOKUP references to pg_namespace
> and pg_authid.  The catalog header usage would get a little
> more verbose: BKI_DEFAULT(PGNSP) becomes BKI_DEFAULT(pg_catalog)
> and BKI_DEFAULT(PGUID) becomes BKI_DEFAULT(POSTGRES).  I'm a bit
> inclined to do it, simply to remove one bit of mechanism that has
> to be documented; but it's material for a separate patch perhaps.
> 
> * src/tools/findoidjoins should be nuked entirely, AFAICS.
> Again, that could be a follow-on patch.
> 
> * I've not touched the SGML docs.  Certainly
> pg_get_catalog_foreign_keys() should be documented, and some
> adjustments in bki.sgml might be appropriate.
> 
> regards, tom lane
> 
> [1] 
> https://www.postgresql.org/message-id/flat/dc5f44d9-5ec1-a596-0251-dadadcdede98%402ndquadrant.com
> 
> 
> 
> *Attachments:*
>  * add-catalog-foreign-key-info-1.patch

Kind regards,

Joel