subject:"WIP\: a way forward on bootstrap data"

Re: WIP: a way forward on bootstrap data

2018-03-02 Thread Tom Lane

John Naylor  writes:
> Version 8, rebased against 76b6aa41f41d.

I took a preliminary look through this, without yet attempting to execute
the script against HEAD.  I have a few thoughts:

* I'm inclined not to commit the conversion scripts to the repo.  I doubt
there are third parties out there with a need for them, and if they do
need them they can get 'em out of this thread in the mailing list
archives.  (If anyone has a different opinion about that, speak up!)

* I notice you have a few "preliminary cleanup" changes here and there
in the series, such as fixing the inconsistent spelling of
Anum_pg_init_privs_initprivs.  These could be applied before we start
the main conversion process, and I'm inclined to do that since we could
get 'em out of the way early.  Ideally, the main conversion would only
touch the header files and related scripts/makefiles.

* I'm a little disturbed by the fact that 0002 has to "re-doublequote
values that are macros expanded by initdb.c".  I see that there are only
a small number of affected places, so maybe it's not really worth working
harder, but I worry that something might get missed.  Is there any way to
include this consideration in the automated conversion, or at least to
verify that we found all the places to quote?  Or, seeing that 0004 seems
to be introducing some quoting-related hacks to genbki.pl anyway, maybe
we could take care of the issue there?

* In 0003, I'd recommend leaving the re-indentation to happen in the next
perltidy run (assuming perltidy would fix that, which I think is true but
I might be wrong).  It's just creating more review work to do it here.
In any case, the patch summary line is pretty misleading since it's
*not* just reindenting, but also refactoring genbki.pl.  (BTW, if that
refactoring would work on the script as it is, maybe that's another
thing we could do early?  The more we can do before "flag day", the
better IMO.)

* In 0006, I'm not very pleased with the introduction of
"Makefile.headers".  I'd keep those macros where they are in
catalog/Makefile.  backend/Makefile doesn't need to know about that,
especially since it's doing an unconditional invocation of
catalog/Makefile anyway.  It could just do something like

submake-schemapg:
$(MAKE) -C catalog generated-headers

and leave it to catalog/Makefile to know what needs to happen for
both schemapg.h and the other generated files.

Overall, though, this is looking pretty promising.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-03 Thread John Naylor

Thanks for taking a look.

On 3/3/18, Tom Lane  wrote:
> John Naylor  writes:
>> Version 8, rebased against 76b6aa41f41d.
>
> I took a preliminary look through this, without yet attempting to execute
> the script against HEAD.  I have a few thoughts:
>
> * I'm inclined not to commit the conversion scripts to the repo.  I doubt
> there are third parties out there with a need for them, and if they do
> need them they can get 'em out of this thread in the mailing list
> archives.  (If anyone has a different opinion about that, speak up!)

If no one feels strongly otherwise, I'll just attach the conversion
script along with the other patches for next version. To be clear, the
rewrite script is intended be committed, both to enforce formatting
and as a springboard for bulk editing. Now, whether that belongs in
/src/include/catalog or /src/tools is debatable.

> * I'm a little disturbed by the fact that 0002 has to "re-doublequote
> values that are macros expanded by initdb.c".  I see that there are only
> a small number of affected places, so maybe it's not really worth working
> harder, but I worry that something might get missed.  Is there any way to
> include this consideration in the automated conversion, or at least to
> verify that we found all the places to quote?  Or, seeing that 0004 seems
> to be introducing some quoting-related hacks to genbki.pl anyway, maybe
> we could take care of the issue there?

The quoting hacks are really to keep the postgres.bki diff as small as
possible (attached). The simplest and most air-tight way to address
your concern would be to double-quote everything when writing the bki
file. That could be done last as a follow-up.

> * I notice you have a few "preliminary cleanup" changes here and there
> in the series, such as fixing the inconsistent spelling of
> Anum_pg_init_privs_initprivs.  These could be applied before we start
> the main conversion process, and I'm inclined to do that since we could
> get 'em out of the way early.  Ideally, the main conversion would only
> touch the header files and related scripts/makefiles.
...
> * In 0003, I'd recommend leaving the re-indentation to happen in the next
> perltidy run (assuming perltidy would fix that, which I think is true but
> I might be wrong).  It's just creating more review work to do it here.
> In any case, the patch summary line is pretty misleading since it's
> *not* just reindenting, but also refactoring genbki.pl.  (BTW, if that
> refactoring would work on the script as it is, maybe that's another
> thing we could do early?  The more we can do before "flag day", the
> better IMO.)

I tested perltidy 20090616 and it handles it fine. I'll submit a
preliminary patch soon to get some of those items out of the way.

> * In 0006, I'm not very pleased with the introduction of
> "Makefile.headers".  I'd keep those macros where they are in
> catalog/Makefile.  backend/Makefile doesn't need to know about that,
> especially since it's doing an unconditional invocation of
> catalog/Makefile anyway.  It could just do something like
>
> submake-schemapg:
>   $(MAKE) -C catalog generated-headers
>
> and leave it to catalog/Makefile to know what needs to happen for
> both schemapg.h and the other generated files.

I wasn't happy with it either, but I couldn't get it to build
otherwise. The sticking point was the symlinks in
$(builddir)/src/include/catalog.  $(MAKE) -C catalog doesn't handle
that. The makefile in /src/common relies on the backend makefile to
know what to invoke for a given header. IIRC, relpath.c includes
pg_tablespace.h, which now requires pg_tablespace_d.h to be built.

Perhaps /src/common/Makefile could invoke the catalog makefile
directly, and the pg_*_d.h headers could be written to
$(builddir)/src/include/catalog directly? I'll hack on it some more.

> Overall, though, this is looking pretty promising.
>
>   regards, tom lane
>

Glad to hear it.

-John Naylor
--- postgres.bki.orig   2018-02-25 15:58:50.082261557 +0700
+++ postgres.bki.noquotes   2018-02-25 16:12:39.967498748 +0700
@@ -5651,9 +5651,9 @@
  lanacl = aclitem[]
  )
 open pg_language
-insert OID = 12 ( "internal" 10 f f 0 0 2246 _null_ )
-insert OID = 13 ( "c" 10 f f 0 0 2247 _null_ )
-insert OID = 14 ( "sql" 10 f t 0 0 2248 _null_ )
+insert OID = 12 ( internal 10 f f 0 0 2246 _null_ )
+insert OID = 13 ( c 10 f f 0 0 2247 _null_ )
+insert OID = 14 ( sql 10 f t 0 0 2248 _null_ )
 close pg_language
 create pg_largeobject_metadata 2995
  (
@@ -5753,8 +5753,8 @@
 insert ( 2798 n 0 2796 - 2796 - - - - - f f r r 2799 27 0 0 0 _null_ _null_ )
 insert ( 3527 n 0 3524 - 3524 - - - - - f f r r 3518 3500 0 0 0 _null_ _null_ )
 insert ( 3565 n 0 3563 - 3563 - - - - - f f r r 1203 869 0 0 0 _null_ _null_ )
-insert ( 2147 n 0 2804 - 463 - - 2804 3547 - f f r r 0 20 0 20 0 "0" "0" )
-insert ( 2803 n 0 1219 - 463 - - 1219 3546 - f f r r 0 20 0 20 0 "0" "0" )
+insert ( 2147 n 0 2804 - 463 - - 2804 3547 - f f r r 0 20 0 20 0 0 0 )
+in

Re: WIP: a way forward on bootstrap data

2018-03-03 Thread John Naylor

I wrote:

> I'll submit a
> preliminary patch soon to get some of those items out of the way.

I've attached a patch that takes care of these cleanups so they don't
clutter the patch set.

-John Naylor
From d97a8b2e5fa4977e656d1aca7ee7bf1289ecbd40 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 3 Mar 2018 16:56:10 +0700
Subject: [PATCH] Some non-functional cleanups in preparation for the upcoming
 bootstrap data conversion:

Arrange pg_tablespace.h OID symbols so they are immediately after the relevant
DATA() line.

Separate out the pg_attribute logic of genbki.pl into its own function
and skip checking if the data is defined. This both narrows and shortens
the data writing loop of the script.

Correct spelling of some macros.
---
 src/backend/catalog/aclchk.c  |   6 +-
 src/backend/catalog/genbki.pl | 272 ++
 src/include/catalog/pg_init_privs.h   |   2 +-
 src/include/catalog/pg_tablespace.h   |   3 +-
 src/interfaces/ecpg/ecpglib/execute.c |   2 +-
 src/interfaces/ecpg/ecpglib/pg_type.h |   2 +-
 6 files changed, 149 insertions(+), 138 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 3f2c629..0648539 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -5969,8 +5969,8 @@ recordExtensionInitPrivWorker(Oid objoid, Oid classoid, int objsubid, Acl *new_a
 			MemSet(nulls, false, sizeof(nulls));
 			MemSet(replace, false, sizeof(replace));
 
-			values[Anum_pg_init_privs_privs - 1] = PointerGetDatum(new_acl);
-			replace[Anum_pg_init_privs_privs - 1] = true;
+			values[Anum_pg_init_privs_initprivs - 1] = PointerGetDatum(new_acl);
+			replace[Anum_pg_init_privs_initprivs - 1] = true;
 
 			oldtuple = heap_modify_tuple(oldtuple, RelationGetDescr(relation),
 		 values, nulls, replace);
@@ -6007,7 +6007,7 @@ recordExtensionInitPrivWorker(Oid objoid, Oid classoid, int objsubid, Acl *new_a
 			values[Anum_pg_init_privs_privtype - 1] =
 CharGetDatum(INITPRIVS_EXTENSION);
 
-			values[Anum_pg_init_privs_privs - 1] = PointerGetDatum(new_acl);
+			values[Anum_pg_init_privs_initprivs - 1] = PointerGetDatum(new_acl);
 
 			tuple = heap_form_tuple(RelationGetDescr(relation), values, nulls);
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index ed90a02..8d740c3 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -156,154 +156,85 @@ foreach my $catname (@{ $catalogs->{names} })
 		print $bki "open $catname\n";
 	}
 
-	if (defined $catalog->{data})
+	# For pg_attribute.h, we generate data entries ourselves.
+	# NB: pg_type.h must come before pg_attribute.h in the input list
+	# of catalog names, since we use info from pg_type.h here.
+	if ($catname eq 'pg_attribute')
 	{
+		gen_pg_attribute($schema);
+	}
 
-		# Ordinary catalog with DATA line(s)
-		foreach my $row (@{ $catalog->{data} })
-		{
-
-			# Split line into tokens without interpreting their meaning.
-			my %bki_values;
-			@bki_values{@attnames} =
-			  Catalog::SplitDataLine($row->{bki_values});
-
-			# Perform required substitutions on fields
-			foreach my $column (@$schema)
-			{
-my $attname = $column->{name};
-my $atttype = $column->{type};
-
-# Substitute constant values we acquired above.
-# (It's intentional that this can apply to parts of a field).
-$bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
-$bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
-
-# Replace regproc columns' values with OIDs.
-# If we don't have a unique value to substitute,
-# just do nothing (regprocin will complain).
-if ($atttype eq 'regproc')
-{
-	my $procoid = $regprocoids{ $bki_values{$attname} };
-	$bki_values{$attname} = $procoid
-	  if defined($procoid) && $procoid ne 'MULTIPLE';
-}
-			}
+	# Ordinary catalog with DATA line(s)
+	foreach my $row (@{ $catalog->{data} })
+	{
 
-			# Save pg_proc oids for use in later regproc substitutions.
-			# This relies on the order we process the files in!
-			if ($catname eq 'pg_proc')
-			{
-if (defined($regprocoids{ $bki_values{proname} }))
-{
-	$regprocoids{ $bki_values{proname} } = 'MULTIPLE';
-}
-else
-{
-	$regprocoids{ $bki_values{proname} } = $row->{oid};
-}
-			}
+		# Split line into tokens without interpreting their meaning.
+		my %bki_values;
+		@bki_values{@attnames} =
+		  Catalog::SplitDataLine($row->{bki_values});
 
-			# Save pg_type info for pg_attribute processing below
-			if ($catname eq 'pg_type')
+		# Perform required substitutions on fields
+		foreach my $column (@$schema)
+		{
+			my $attname = $column->{name};
+			my $atttype = $column->{type};
+
+			# Substitute constant values we acquired above.
+			# (It's intentional that this can apply to parts of a field).
+			$bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
+			$bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
+
+			# Replace

Re: WIP: a way forward on bootstrap data

2018-03-03 Thread Tom Lane

John Naylor  writes:
> On 3/3/18, Tom Lane  wrote:
>> * I'm a little disturbed by the fact that 0002 has to "re-doublequote
>> values that are macros expanded by initdb.c".  I see that there are only
>> a small number of affected places, so maybe it's not really worth working
>> harder, but I worry that something might get missed.  Is there any way to
>> include this consideration in the automated conversion, or at least to
>> verify that we found all the places to quote?  Or, seeing that 0004 seems
>> to be introducing some quoting-related hacks to genbki.pl anyway, maybe
>> we could take care of the issue there?

> The quoting hacks are really to keep the postgres.bki diff as small as
> possible (attached). The simplest and most air-tight way to address
> your concern would be to double-quote everything when writing the bki
> file. That could be done last as a follow-up.

Oh, if you're cross-checking by diff'ing the produced .bki file, then
that's sufficient to address my concern here.  No need to do more.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-03 Thread Tom Lane

John Naylor  writes:
> I've attached a patch that takes care of these cleanups so they don't
> clutter the patch set.

Pushed.  I made a couple of cosmetic changes in genbki.pl.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-03 Thread Tom Lane

John Naylor  writes:
> On 3/3/18, Tom Lane  wrote:
>> * In 0006, I'm not very pleased with the introduction of
>> "Makefile.headers".

> I wasn't happy with it either, but I couldn't get it to build
> otherwise. The sticking point was the symlinks in
> $(builddir)/src/include/catalog.  $(MAKE) -C catalog doesn't handle
> that. The makefile in /src/common relies on the backend makefile to
> know what to invoke for a given header. IIRC, relpath.c includes
> pg_tablespace.h, which now requires pg_tablespace_d.h to be built.

I'm not following.  AFAICS, what you put in src/common/Makefile was just

+.PHONY: generated-headers
+
+generated-headers:
+   $(MAKE) -C ../backend generated-headers

which doesn't appear to care whether backend/Makefile knows anything
about specific generated headers or not.  I think all we need to do
is consider that the *_d.h files ought to be built as another consequence
of invoking the generated-headers target.

BTW, there's already a submake-generated-headers target in
Makefile.global, which you should use in preference to rolling your own.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-14 Thread Tom Lane

John Naylor  writes:
> It didn't take that long to rebase the remaining parts of the
> patchset, so despite what I said above I went ahead and put them in
> version 10 (attached), this time via scripted bulk editing rather than
> as large patches.

Starting to look into this version now, but a small suggestion while
it's still fresh in mind: it might be easier, in future rounds, to
put all the files in a tarball and attach 'em as one big attachment.
At least with my mail setup, it's way easier to save off a tarball
and "tar xf" it than it is to individually save a dozen attachments.
I suspect that way might be easier on your end, too.

There's some value in posting a patchset as separate attachments
when it's possible to just apply the patches in series; Munro's patch
tester knows what to do with that, but not with a tarball AFAIK.
But in this case, there's little hope that the patch tester would
get it right anyhow.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-15 Thread John Naylor

On 3/15/18, Tom Lane  wrote:
> John Naylor  writes:
>> It didn't take that long to rebase the remaining parts of the
>> patchset, so despite what I said above I went ahead and put them in
>> version 10 (attached), this time via scripted bulk editing rather than
>> as large patches.
>
> Starting to look into this version now, but a small suggestion while
> it's still fresh in mind: it might be easier, in future rounds, to
> put all the files in a tarball and attach 'em as one big attachment.

Sure thing. I've done so here for version 11, which is just a rebase
over the removal of pg_class.relhaspkey.

-John Naylor


v11-bootstrap-data-conversion.tar.gz
Description: GNU Zip compressed data

Re: WIP: a way forward on bootstrap data

2018-03-21 Thread Tom Lane

John Naylor  writes:
> [ v11-bootstrap-data-conversion.tar.gz ]

I've done a round of review work on this, focusing on the Makefile
infrastructure.  I found a bunch of problems with parallel builds and
VPATH builds, which are addressed in the attached incremental patch.

The parallel-build issues are a bit of a mess really: it's surprising
we've not had problems there earlier.  For instance, catalog/Makefile has

postgres.description: postgres.bki ;
postgres.shdescription: postgres.bki ;
schemapg.h: postgres.bki ;

However, genbki.pl doesn't make any particular guarantee that postgres.bki
will be written sooner than its other output files, which means that in
principle make might think it needs to rebuild these other files on every
subsequent check.  That was somewhat harmless given the empty update rule,
but it's not really the right thing.  Your patch extended this to make
all the generated headers dependent on postgres.bki, and those are
definitely written before postgres.bki, meaning make *will* think they
are out of date.  Worse, it'll also think the symlinks to them are out
of date.  So I was running into problems with different parallel make
sub-tasks removing and recreating the symlinks.  VPATH builds didn't
work well either, because out-of-date-ness ties into whether make will
accept a file in the source dir as a valid replacement target.

I resolved this mess by setting up a couple of stamp files, which is
a technique we also use elsewhere.  bki-stamp is a single file
representing the action of having run genbki.pl, and header-stamp
likewise represents the action of having made the symlinks to the
generated headers.  By depending on those rather than individual
files, we avoid questions of exactly what the timestamps on the
individual output files are.

In the attached, I've also done some more work on the README file
and cleaned up a few other little things.

I've not really looked at the MSVC build code at all.  Personally,
I'm willing to just commit this (when the time comes) and let the
buildfarm see if the MSVC code works ... but if anyone else wants
to check that part beforehand, please do.

I also have not spent much time yet looking at the end-product .h and .dat
files.  I did note a bit of distressing inconsistency in the formatting of
the catalog struct declarations, some of which predates this patch but it
seems like you've introduced more.  I think what we ought to standardize
on is a format similar to this in pg_opclass.h:

CATALOG(pg_opclass,2616)
{
/* index access method opclass is for */
Oid opcmethod   BKI_LOOKUP(pg_am);

/* name of this opclass */
NameDataopcname;

/* namespace of this opclass */
Oid opcnamespaceBKI_DEFAULT(PGNSP);

/* opclass owner */
Oid opcownerBKI_DEFAULT(PGUID);

The former convention used in some places, of field descriptions in
same-line comments, clearly won't work anymore if we're sticking
BKI_DEFAULT annotations there.  I also don't like the format used in, eg,
pg_aggregate.h of putting field descriptions in a separate comment block
before the struct proper.  Bitter experience has shown that there are a
lot of people on this project who won't update comments that are more than
about two lines away from the code they change; so the style in
pg_aggregate.h is just inviting maintenance oversights.

I've got mixed feelings about the whitespace lines between fields.  They
seem like they are mostly bulking up the code and we could do without 'em.
On the other hand, pgindent will insist on putting one before any
multi-line field comment, and so that would create inconsistent formatting
if we don't use 'em elsewhere.  Thoughts?

Speaking of pgindent, those prettily aligned BKI annotations are a waste
of effort, because when pgindent gets done with the code it will look
like

regproc aggfnoid;
charaggkind BKI_DEFAULT(n);
int16   aggnumdirectargs BKI_DEFAULT(0);
regproc aggtransfn BKI_LOOKUP(pg_proc);
regproc aggfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggcombinefn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggdeserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

I'm not sure if there's anything much to be done about this.  BKI_DEFAULT
isn't so bad, but additional annotations get unreadable fast.  Maybe
BKI_LOOKUP was a bad idea after all, and we should just invent more
Oid-equivalent typedef names.

The attached is just one incremental patch on top of your v11 series.
I couldn't think of an easy way to migrate the changes back into the
most relevant diffs of your series, so I didn'

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread John Naylor

On 3/21/18, Tom Lane  wrote:
> John Naylor  writes:
>> [ v11-bootstrap-data-conversion.tar.gz ]
>
> I've done a round of review work on this, focusing on the Makefile
> infrastructure.  I found a bunch of problems with parallel builds and
> VPATH builds, which are addressed in the attached incremental patch.
>
[explanation of Make issues and stamp files]

> In the attached, I've also done some more work on the README file
> and cleaned up a few other little things.

Thanks for pulling my attempt at makefile hackery across the finish
line. It sounds like there are no more obvious structural issues
remaining (fingers crossed). Your other improvements make sense.

>  I did note a bit of distressing inconsistency in the formatting of
> the catalog struct declarations, some of which predates this patch but it
> seems like you've introduced more.  I think what we ought to standardize
> on is a format similar to this in pg_opclass.h:
>
> CATALOG(pg_opclass,2616)
> {
> /* index access method opclass is for */
> Oid opcmethod   BKI_LOOKUP(pg_am);
>
[snip]

That is the most sensible format. Did you mean all 62 catalog headers
for future-proofing, or just the ones with annotations now?

> The former convention used in some places, of field descriptions in
> same-line comments, clearly won't work anymore if we're sticking
> BKI_DEFAULT annotations there.

Yeah.

> I also don't like the format used in, eg,
> pg_aggregate.h of putting field descriptions in a separate comment block
> before the struct proper.  Bitter experience has shown that there are a
> lot of people on this project who won't update comments that are more than
> about two lines away from the code they change; so the style in
> pg_aggregate.h is just inviting maintenance oversights.

Okay.

> I've got mixed feelings about the whitespace lines between fields.  They
> seem like they are mostly bulking up the code and we could do without 'em.
> On the other hand, pgindent will insist on putting one before any
> multi-line field comment, and so that would create inconsistent formatting
> if we don't use 'em elsewhere.  Thoughts?

I'll do it both ways for one header and post the results for people to look at.

> Speaking of pgindent, those prettily aligned BKI annotations are a waste
> of effort, because when pgindent gets done with the code it will look
> like
>
[snip]
>   regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>   regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>   regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

> I'm not sure if there's anything much to be done about this.  BKI_DEFAULT
> isn't so bad, but additional annotations get unreadable fast.  Maybe
> BKI_LOOKUP was a bad idea after all, and we should just invent more
> Oid-equivalent typedef names.

Well, until version 7, I used "fake" type aliases that only genbki.pl
could see. The C compiler and postgres.bki could only see the actual
Oid/oid type. Perhaps it was a mistake to model their appearance after
regproc (regtype, etc), because that was misleading. Maybe something
more obviously transient like 'lookup_typeoid? I'm leaning towards
this idea.

Another possibility is to teach the pgindent pre_/post_indent
functions to preserve annotation formatting, but I'd rather not add
yet another regex to that script. Plus, over the next 10+ years, I
could see people adding several more BKI_* macros, leading to
readability issues regardless of formatting, so maybe we should nip
this one in the bud.

> The attached is just one incremental patch on top of your v11 series.
> I couldn't think of an easy way to migrate the changes back into the
> most relevant diffs of your series, so I didn't try.

I've done that quite a few times while developing this patch series,
so I'm used to it. I'll incorporate your changes soon and also rebase
over the new pg_class column that landed recently. I'll have a new
version by this weekend, assuming we conclude the formatting
discussion, so if you or others have any more comments by then, I'll
include them.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread Tom Lane

John Naylor  writes:
> On 3/21/18, Tom Lane  wrote:
>>  regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>>  regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>>  regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

>> I'm not sure if there's anything much to be done about this.  BKI_DEFAULT
>> isn't so bad, but additional annotations get unreadable fast.  Maybe
>> BKI_LOOKUP was a bad idea after all, and we should just invent more
>> Oid-equivalent typedef names.

> Well, until version 7, I used "fake" type aliases that only genbki.pl
> could see. The C compiler and postgres.bki could only see the actual
> Oid/oid type. Perhaps it was a mistake to model their appearance after
> regproc (regtype, etc), because that was misleading. Maybe something
> more obviously transient like 'lookup_typeoid? I'm leaning towards
> this idea.

Looking at this again, I think a big chunk of the readability problem here
is just from the fact that we have long, similar-looking lines tightly
packed.  If it were reformatted to have comment lines and whitespace
between, it might not look nearly as bad.

> Another possibility is to teach the pgindent pre_/post_indent
> functions to preserve annotation formatting, but I'd rather not add
> yet another regex to that script. Plus, over the next 10+ years, I
> could see people adding several more BKI_* macros, leading to
> readability issues regardless of formatting, so maybe we should nip
> this one in the bud.

Well, whether or not we invent BKI_LOOKUP, the need for other kinds
of annotations isn't likely to be lessened.

I wondered whether we could somehow convert the format into multiple
lines, say

regproc aggmfinalfn
BKI_DEFAULT(-)
BKI_LOOKUP(pg_proc);

but some quick experimentation was discouraging: either Emacs' C
syntax mode, or pgindent, or both would make a hash of it.  It
wasn't great from the vertical-space-consumption standpoint either.

Anyway, for the moment I'd stick with BKI_LOOKUP rather than undoing
that work.  I think it's a more transparent way of saying what we
want than the magic-OID-typedefs approach.  The formatting issue is
just a mild annoyance, and it's not really BKI_LOOKUP's fault anyway.

While I'm thinking of it --- I noticed one or two places where you
had "BKI_DEFAULT(\0)".  That coding scares me a bit --- gcc seems to
tolerate it, but other C compilers might feel that \0 is not a valid
preprocessing token, or it might confuse some editors' syntax highlight
rules.  I'd rather write cases like this as "BKI_DEFAULT('\0')".  IOW,
the argument should be a valid C identifier, number, or string literal.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread John Naylor

I wrote:

> On 3/21/18, Tom Lane  wrote:
>> I've got mixed feelings about the whitespace lines between fields.  They
>> seem like they are mostly bulking up the code and we could do without
>> 'em.
>> On the other hand, pgindent will insist on putting one before any
>> multi-line field comment, and so that would create inconsistent
>> formatting
>> if we don't use 'em elsewhere.  Thoughts?
>
> I'll do it both ways for one header and post the results for people to look
> at.

I've attached an earlier version of pg_proc.h with both formats as I
understand them. I turned a couple comments into multi-line comments
to demonstrate. I think without spaces it's just as hard to read as
with multiple annotations. I'd vote for spaces, but then again I'm not
the one who has to read these things very often.

-John Naylor
/*-
 *
 * pg_proc.h
 *	  definition of the system "procedure" relation (pg_proc)
 *	  along with the relation's initial contents.
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/catalog/pg_proc.h
 *
 * NOTES
 *	  The Catalog.pm module reads this file and derives schema
 *	  information.
 *
 *-
 */
#ifndef PG_PROC_H
#define PG_PROC_H

#include "catalog/genbki.h"

/* 
 *		pg_proc definition.  cpp turns this into
 *		typedef struct FormData_pg_proc
 * 
 */
#define ProcedureRelationId  1255
#define ProcedureRelation_Rowtype_Id  81

CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
{
	/* procedure name */
	NameData	proname;
	/* OID of namespace containing this proc */
	Oid			pronamespace BKI_DEFAULT(PGNSP);
	/* procedure owner */
	Oid			proowner BKI_DEFAULT(PGUID);
	/* OID of pg_language entry */
	Oid			prolang BKI_DEFAULT(12);
	/* estimated execution cost */
	float4		procost BKI_DEFAULT(1);

	/* estimated # of 
	 * rows out (if proretset) */
	float4		prorows BKI_DEFAULT(0);
	/* element type of variadic array, or 0 */
	Oid			provariadic BKI_DEFAULT(0);
	/* transforms calls to it during planning */
	regproc		protransform BKI_DEFAULT(0);
	/* is it an aggregate? */
	bool		proisagg BKI_DEFAULT(f);
	/* is it a window function? */
	bool		proiswindow BKI_DEFAULT(f);
	/* security definer */
	bool		prosecdef BKI_DEFAULT(f);
	/* is it a leak-proof function? */
	bool		proleakproof BKI_DEFAULT(f);
	/* strict with respect to NULLs? */
	bool		proisstrict BKI_DEFAULT(f);
	/* returns a set? */
	bool		proretset BKI_DEFAULT(f);
	/* see PROVOLATILE_ categories below */
	char		provolatile BKI_DEFAULT(v);
	/* see PROPARALLEL_ categories below */
	char		proparallel BKI_DEFAULT(u);
	/* number of arguments */
	int16		pronargs;
	/* number of arguments with defaults */
	int16		pronargdefaults BKI_DEFAULT(0);
	/* OID of result type */
	Oid			prorettype;

	/*
	 * variable-length fields start here, but we allow direct access to
	 * proargtypes
	 */

	/* parameter types (excludes OUT params) */
	oidvector	proargtypes;

#ifdef CATALOG_VARLEN

	/* all param types (NULL if IN only) */
	Oid			proallargtypes[1] BKI_DEFAULT(_null_);
	/* parameter modes (NULL if IN only) */
	char		proargmodes[1] BKI_DEFAULT(_null_);
	/* parameter names (NULL if no names) */
	text		proargnames[1] BKI_DEFAULT(_null_);

	/* list of expression trees 
	 * for argument defaults (NULL if none) */
	pg_node_tree proargdefaults BKI_DEFAULT(_null_);
	/* types for which to apply transforms */
	Oid			protrftypes[1] BKI_DEFAULT(_null_);
	/* procedure source text */
	text		prosrc BKI_FORCE_NOT_NULL;
	/* secondary procedure info (can be NULL) */
	text		probin BKI_DEFAULT(_null_);
	/* procedure-local GUC settings */
	text		proconfig[1] BKI_DEFAULT(_null_);
	/* access permissions */
	aclitem		proacl[1] BKI_DEFAULT(_null_);
#endif
} FormData_pg_proc;

/* 
 *		Form_pg_proc corresponds to a pointer to a tuple with
 *		the format of pg_proc relation.
 * 
 */
typedef FormData_pg_proc *Form_pg_proc;

/* 
 *		compiler constants for pg_proc
 * 
 */
#define Natts_pg_proc	29
#define Anum_pg_proc_proname			1
#define Anum_pg_proc_pronamespace		2
#define Anum_pg_proc_proowner			3
#define Anum_pg_proc_prolang			4
#define Anum_pg_proc_procost			5
#define Anum_pg_proc_prorows			6
#define Anum_pg_proc_provariadic		7
#define Anum_pg_proc_protransform		8
#define Anum_pg_proc_proisagg			9
#define Anum_pg_proc_proiswindow		10
#define Anum_pg_proc_prosecdef			11
#define Anum_pg_proc_proleakproof		12
#define Anum_pg_proc_proisstrict		13
#define Anum_pg_proc_proretset			14
#define Anum_pg_proc_provolatile		15
#define Anum_pg_proc_proparallel		16
#define Anum_pg_proc_pronargs			17
#define Anum_pg_proc_pronargdefaults	18
#define Anum_pg_proc_prorettype			19
#define Anum_pg_proc_proargtypes		20
#define Anum_pg_proc_proallargty

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread John Naylor

On 3/22/18, Tom Lane  wrote:
> Looking at this again, I think a big chunk of the readability problem here
> is just from the fact that we have long, similar-looking lines tightly
> packed.  If it were reformatted to have comment lines and whitespace
> between, it might not look nearly as bad.
>
...
> Anyway, for the moment I'd stick with BKI_LOOKUP rather than undoing
> that work.  I think it's a more transparent way of saying what we
> want than the magic-OID-typedefs approach.  The formatting issue is
> just a mild annoyance, and it's not really BKI_LOOKUP's fault anyway.

Okay, I'll do it with comments and whitespace.

> While I'm thinking of it --- I noticed one or two places where you
> had "BKI_DEFAULT(\0)".  That coding scares me a bit --- gcc seems to
> tolerate it, but other C compilers might feel that \0 is not a valid
> preprocessing token, or it might confuse some editors' syntax highlight
> rules.  I'd rather write cases like this as "BKI_DEFAULT('\0')".  IOW,
> the argument should be a valid C identifier, number, or string literal.

Hmm, I only see this octal in pg_type.h

chartypdelim BKI_DEFAULT(\054);

Which I hope is fine. Were you thinking of this comment in
pg_attribute.h? We use the double-quoted empty string for postgres.bki
and change it to  '\0' for schemapg.h.

/* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
charattidentity BKI_DEFAULT("");


-John Naylor

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread Alvaro Herrera

John Naylor wrote:

> I've attached an earlier version of pg_proc.h with both formats as I
> understand them. I turned a couple comments into multi-line comments
> to demonstrate. I think without spaces it's just as hard to read as
> with multiple annotations. I'd vote for spaces, but then again I'm not
> the one who has to read these things very often.

how about letting the line go long, with the comment at the right of
each definition, with one blank line between struct members, as in the
sample below?  You normally don't care that these lines are too long
since you seldom edit them -- one mostly adds or remove entire lines
instead, so there's not as much need for side-by-side diffs as with
regular code.  (One issue with this proposal is how to convince pgindent
to leave the long lines alone.)

To me, an important property of these structs is fitting as much as
possible (vertically) in a screenful; the other proposed modes end up
with too many lines.

CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
{
NameDataproname;/* procedure name */

Oid pronamespace BKI_DEFAULT(PGNSP); /* OID of namespace containing 
this proc */

Oid proowner BKI_DEFAULT(PGUID); /* procedure owner */

Oid prolang BKI_DEFAULT(12); /* OID of pg_language entry */

float4  procost BKI_DEFAULT(1); /* estimated execution cost */

float4  prorows BKI_DEFAULT(0); /* estimated # of rows out (if 
proretset) */

Oid provariadic BKI_DEFAULT(0); /* element type of variadic array, 
or 0 */

regproc protransform BKI_DEFAULT(0); /* transforms calls to it during 
planning */

boolproisagg BKI_DEFAULT(f); /* is it an aggregate? */

boolproiswindow BKI_DEFAULT(f); /* is it a window function? */

boolprosecdef BKI_DEFAULT(f); /* security definer */

boolproleakproof BKI_DEFAULT(f); /* is it a leak-proof function? */

boolproisstrict BKI_DEFAULT(f); /* strict with respect to NULLs? */

boolproretset BKI_DEFAULT(f); /* returns a set? */

charprovolatile BKI_DEFAULT(v); /* see PROVOLATILE_ categories 
below */

charproparallel BKI_DEFAULT(u); /* see PROPARALLEL_ categories 
below */

int16   pronargs; /* number of arguments */

int16   pronargdefaults BKI_DEFAULT(0); /* number of arguments with 
defaults */

Oid prorettype; /* OID of result type */

/*
 * variable-length fields start here, but we allow direct access to
 * proargtypes
 */

oidvectorproargtypes; /* parameter types (excludes OUT params) */

#ifdef CATALOG_VARLEN

Oid proallargtypes[1] BKI_DEFAULT(_null_); /* all param types (NULL 
if IN only) */

charproargmodes[1] BKI_DEFAULT(_null_); /* parameter modes (NULL if 
IN only) */

textproargnames[1] BKI_DEFAULT(_null_); /* parameter names (NULL if 
no names) */

pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* list of expression 
trees for argument defaults (NULL if none) */

Oid protrftypes[1] BKI_DEFAULT(_null_); /* types for which to apply 
transforms */

textprosrc BKI_FORCE_NOT_NULL; /* procedure source text */

textprobin BKI_DEFAULT(_null_); /* secondary procedure info (can be 
NULL) */

textproconfig[1] BKI_DEFAULT(_null_); /* procedure-local GUC 
settings */

aclitem proacl[1] BKI_DEFAULT(_null_); /* access permissions */
#endif
} FormData_pg_proc;

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread Tom Lane

John Naylor  writes:
> On 3/22/18, Tom Lane  wrote:
>> While I'm thinking of it --- I noticed one or two places where you
>> had "BKI_DEFAULT(\0)".

> Hmm, I only see this octal in pg_type.h
> chartypdelim BKI_DEFAULT(\054);

Sorry, I was going by memory rather than looking at the code.

> Which I hope is fine.

I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054').

> Were you thinking of this comment in
> pg_attribute.h? We use the double-quoted empty string for postgres.bki
> and change it to  '\0' for schemapg.h.

> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
> char  attidentity BKI_DEFAULT("");

That definitely seems like a hack --- why not BKI_DEFAULT('\0') ?

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread Tom Lane

John Naylor  writes:
>> On 3/21/18, Tom Lane  wrote:
>>> I've got mixed feelings about the whitespace lines between fields.

> I've attached an earlier version of pg_proc.h with both formats as I
> understand them. I turned a couple comments into multi-line comments
> to demonstrate. I think without spaces it's just as hard to read as
> with multiple annotations. I'd vote for spaces, but then again I'm not
> the one who has to read these things very often.

Yeah, after looking at this example I agree --- it's too tight without
the blank lines.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread Tom Lane

Alvaro Herrera  writes:
> how about letting the line go long, with the comment at the right of
> each definition, with one blank line between struct members, as in the
> sample below?

> NameDataproname;/* procedure name */

> Oid pronamespace BKI_DEFAULT(PGNSP); /* OID of namespace 
> containing this proc */

> Oid proowner BKI_DEFAULT(PGUID); /* procedure owner */

I don't think this is going to work: pgindent is going to wrap most of
these comments, ending up with something that's ugly *and* consumes
just as much vertical space as if we'd given the comments their own
lines.  The problem is that in the headers where we were using
same-line comments, the comments were written to fit in the space
available without this extra annotation.  (For my money, having spent
lots of time shaving a character or two off such comments to make 'em
fit, I'd much prefer the luxury of having a whole line to write in.)

We could go with some scheme that preserves the old formatting of the
struct definition proper and puts the added info somewhere else, ie

Oid pronamespace;  /* OID of namespace containing this proc */

Oid prolang;   /* OID of pg_language entry */

then after the struct:

BKI_DEFAULT(pronamespace, PGNSP);
BKI_DEFAULT(prolang, 12);

but on the whole I don't think that's an improvement.  I'd rather keep
the info about a field together.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread John Naylor

On 3/22/18, Alvaro Herrera  wrote:
> how about letting the line go long, with the comment at the right of
> each definition, with one blank line between struct members, as in the
> sample below?  You normally don't care that these lines are too long
> since you seldom edit them -- one mostly adds or remove entire lines
> instead, so there's not as much need for side-by-side diffs as with
> regular code.  (One issue with this proposal is how to convince pgindent
> to leave the long lines alone.)

Yeah, it seems when perltidy or pgindent mangle things badly, it's to
try and shoehorn a long line into a smaller number of characters.  If
memory serves, I've come across things like this:

pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* list of expression

  trees for argument

  defaults (NULL if none) */

And thought "only a machine could be so precisely awkward"

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-03-22 Thread John Naylor

On 3/22/18, Tom Lane  wrote:
> I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054').

Okay, I'll do that.

>> Were you thinking of this comment in
>> pg_attribute.h? We use the double-quoted empty string for postgres.bki
>> and change it to  '\0' for schemapg.h.
>
>> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
>> char attidentity BKI_DEFAULT("");
>
> That definitely seems like a hack --- why not BKI_DEFAULT('\0') ?

Hmm, yes, the way I had it, the comment is a mystery. I'll switch it around.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-03-25 Thread John Naylor

On 3/21/18, Tom Lane  wrote:
> The attached is just one incremental patch on top of your v11 series.
> I couldn't think of an easy way to migrate the changes back into the
> most relevant diffs of your series, so I didn't try.

I've applied your changes to the v12 patch series (attached), but I
hope you'll allow two nit-picky adjustments:

-s/pg_XXX.h/pg_xxx.h/ in the README. There seems to be greater
precedent for the lower-case spelling if the rest of the word is lower
case.
-I shortened the data example in the README so it would comfortably
fit on two lines. Spreading it out over three lines doesn't match
what's in the data files. It's valid syntax, but real data is
formatted to at most two lines (See rewrite_dat.pl. Hmm, maybe I
should make that more explicit elsewhere in the README)

> I also have not spent much time yet looking at the end-product .h and .dat
> files.  I did note a bit of distressing inconsistency in the formatting of
> the catalog struct declarations, some of which predates this patch but it
> seems like you've introduced more.  I think what we ought to standardize
> on is a format similar to this in pg_opclass.h:
>
> CATALOG(pg_opclass,2616)
> {
> /* index access method opclass is for */
> Oid opcmethod   BKI_LOOKUP(pg_am);
>

Done, with blank lines interspersed. I put most changes of this sort
in with the other cleanups in patch 0004. I neglected to do this
separately for couple of tiny tables that have lookups, but no default
values. I don't think it impacts the readability of patch 0007 much.

On 3/22/18, Tom Lane  wrote:
> I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054').
[snip]
>> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
>> char attidentity BKI_DEFAULT("");
>
> That definitely seems like a hack --- why not BKI_DEFAULT('\0') ?

Done (patch 0006).


Other changes:
-A README note about OID macros (patch 0007).
-A couple minor cosmetic rearrangements and comment/commit message edits.

Open items:
-Test MSVC.
-Arrange for rewrite_dat.pl to run when perltidy does.
-I was a bit cavalier about when to use =/:= in the Makefiles. Not
sure if there's a preferred project style for when the choice doesn't
really matter.
-Maybe document examples of how to do bulk-editing of data files?


-John Naylor


v12-bootstrap-data-conversion.tar.gz
Description: GNU Zip compressed data

Re: WIP: a way forward on bootstrap data

2018-03-25 Thread Tom Lane

John Naylor  writes:
> -I shortened the data example in the README so it would comfortably
> fit on two lines. Spreading it out over three lines doesn't match
> what's in the data files. It's valid syntax, but real data is
> formatted to at most two lines (See rewrite_dat.pl. Hmm, maybe I
> should make that more explicit elsewhere in the README)

Well, as I said, I hadn't really reviewed the .dat files, but if that's
what you're doing I'm going to request a change.  Project style is to
fit in 80 columns as much as possible.  I do not see a reason to exempt
the .dat files from that, especially not since it would presumably be a
trivial change in rewrite_dat.pl to insert extra newlines between fields
when needed.  (Obviously, if a field value is so wide it runs past 80
columns on its own, it's not rewrite_dat.pl's charter to fix that.)

> Open items:
> -Test MSVC.

Again, while I'd be happy if someone did that manually, I'm prepared
to let the buildfarm do it.

> -Arrange for rewrite_dat.pl to run when perltidy does.

What I was thinking we should have is a convenience target in
include/Makefile to do this, say "make reformat-dat-files".
I'm not that excited about bundling it into pgindent runs.

> -Maybe document examples of how to do bulk-editing of data files?

+1.  In the end, that's the reason we're doing all this work, so showing
people how to benefit seems like a good thing.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-26 Thread John Naylor

On 3/25/18, Tom Lane  wrote:
> Well, as I said, I hadn't really reviewed the .dat files, but if that's
> what you're doing I'm going to request a change.  Project style is to
> fit in 80 columns as much as possible.  I do not see a reason to exempt
> the .dat files from that, especially not since it would presumably be a
> trivial change in rewrite_dat.pl to insert extra newlines between fields
> when needed.  (Obviously, if a field value is so wide it runs past 80
> columns on its own, it's not rewrite_dat.pl's charter to fix that.)

This feature is working now. I've attached a 100-line sample of all
the catalogs' files for viewing. Note, this is pretty raw output,
without the clean-up step from patch 0002. In the most of the original
DATA() lines, there was no spacing between entries except in some
cases to separate groups (often with a comment to describe the group).
My clean-up patch tried to make that more consistent. For this sample,
it would add blank lines before the comments in pg_amop, and remove
blank lines from the first few entries in pg_type. If you wanted to
opine on that before I rework that patch, I'd be grateful.

Also, these data entries have default values removed, but they don't
have human-readable OID macros. (I'll have to adjust that script to
the 80-column limit as well).

>> -Arrange for rewrite_dat.pl to run when perltidy does.
>
> What I was thinking we should have is a convenience target in
> include/Makefile to do this, say "make reformat-dat-files".
> I'm not that excited about bundling it into pgindent runs.

I've attached a draft patch for this. If it's okay, I'll incorporate
it into the series. I think reformat_dat_files.pl also works as a
better script name.

>> -Maybe document examples of how to do bulk-editing of data files?
>
> +1.  In the end, that's the reason we're doing all this work, so showing
> people how to benefit seems like a good thing.

It seems like with that, it'd be good to split off the data-format
section of the README into a new file, maybe README.data, which will
contain code snippets and some example scenarios. I'll include the
example pg_proc.prokind merger among those.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-03-26 Thread John Naylor

With the attachments this time.

-John Naylor
#--
#
# pg_aggregate.dat
#Initial contents of the pg_aggregate system relation.
#
# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
# Portions Copyright (c) 1994, Regents of the University of California
#
# src/include/catalog/pg_aggregate.dat
#
#--

[

# avg
{ aggfnoid => '2100', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => '2281',
  aggtransspace => '48', aggmtranstype => '2281', aggmtransspace => '48' },
{ aggfnoid => '2101', aggtransfn => 'int4_avg_accum', aggfinalfn => 'int8_avg',
  aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int4_avg_accum',
  aggminvtransfn => 'int4_avg_accum_inv', aggmfinalfn => 'int8_avg',
  aggtranstype => '1016', aggmtranstype => '1016', agginitval => '{0,0}',
  aggminitval => '{0,0}' },
{ aggfnoid => '2102', aggtransfn => 'int2_avg_accum', aggfinalfn => 'int8_avg',
  aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int2_avg_accum',
  aggminvtransfn => 'int2_avg_accum_inv', aggmfinalfn => 'int8_avg',
  aggtranstype => '1016', aggmtranstype => '1016', agginitval => '{0,0}',
  aggminitval => '{0,0}' },
{ aggfnoid => '2103', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => '2281', aggtransspace => '128',
  aggmtranstype => '2281', aggmtransspace => '128' },
{ aggfnoid => '2104', aggtransfn => 'float4_accum', aggfinalfn => 'float8_avg',
  aggcombinefn => 'float8_combine', aggtranstype => '1022',
  agginitval => '{0,0,0}' },
{ aggfnoid => '2105', aggtransfn => 'float8_accum', aggfinalfn => 'float8_avg',
  aggcombinefn => 'float8_combine', aggtranstype => '1022',
  agginitval => '{0,0,0}' },
{ aggfnoid => '2106', aggtransfn => 'interval_accum',
  aggfinalfn => 'interval_avg', aggcombinefn => 'interval_combine',
  aggmtransfn => 'interval_accum', aggminvtransfn => 'interval_accum_inv',
  aggmfinalfn => 'interval_avg', aggtranstype => '1187',
  aggmtranstype => '1187', agginitval => '{0 second,0 second}',
  aggminitval => '{0 second,0 second}' },

# sum
{ aggfnoid => '2107', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_sum', aggtranstype => '2281',
  aggtransspace => '48', aggmtranstype => '2281', aggmtransspace => '48' },
{ aggfnoid => '2108', aggtransfn => 'int4_sum', aggcombinefn => 'int8pl',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int2int4_sum', aggtranstype => '20', aggmtranstype => '1016',
  aggminitval => '{0,0}' },
{ aggfnoid => '2109', aggtransfn => 'int2_sum', aggcombinefn => 'int8pl',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int2int4_sum', aggtranstype => '20', aggmtranstype => '1016',
  aggminitval => '{0,0}' },
{ aggfnoid => '2110', aggtransfn => 'float4pl', aggcombinefn => 'float4pl',
  aggtranstype => '700' },
{ aggfnoid => '2111', aggtransfn => 'float8pl', aggcombinefn => 'float8pl',
  aggtranstype => '701' },
{ aggfnoid => '2112', aggtransfn => 'cash_pl', aggcombinefn => 'cash_pl',
  aggmtransfn => 'cash_pl', aggminvtransfn => 'cash_mi', aggtranstype => '790',
  aggmtranstype => '790' },
{ aggfnoid => '2113', aggtransfn => 'interval_pl',
  aggcombinefn => 'interval_pl', aggmtransfn => 'interval_pl',
  aggminvtransfn => 'interval_mi', aggtranstype => '1186',
  aggmtranstype => '1186' },
{ aggfnoid => '2114', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_sum', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_sum', aggtranstype => '2281', aggtransspace => '128',
  aggmtranstype => '2281', aggmtransspace => '128' },

# max
{ aggfnoid => '2115', aggtransfn => 'int8larger', aggcombinefn => 'int8larger',
  aggsortop => '413', aggtranstype => '20' },
{ aggfnoid => '2116', aggtransfn => 'int4larger', aggcombinefn => 'int4larger',
  aggsortop => '521', aggtranstype => '23' },
{ aggfnoid => '2117', aggtransfn => 'int2larger', aggcombinefn => 'int2larger',
  aggsortop => '520', aggtranstype => '21' },
{ agg

Re: WIP: a way forward on bootstrap data

2018-03-26 Thread Tom Lane

John Naylor  writes:
> With the attachments this time.

Layout of .dat files seems generally reasonable, but I don't understand
the proposed make rule:

+reformat-dat-files:
+   $(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog 
catalog/pg_*.dat

This rule has no prerequisite, so what's $< supposed to be?  Also, I think
the rule probably ought to be located in src/include/catalog/Makefile,
because that's typically where you'd be cd'd to when messing with the
.dat files, I'd think.  (Hm, I see no such makefile, but maybe it's time
for one.  A convenience rule located one level up doesn't seem very
convenient.)

> My clean-up patch tried to make that more consistent. For this sample,
> it would add blank lines before the comments in pg_amop, and remove
> blank lines from the first few entries in pg_type. If you wanted to
> opine on that before I rework that patch, I'd be grateful.

No particular objection to either.

>>> -Maybe document examples of how to do bulk-editing of data files?

>> +1.  In the end, that's the reason we're doing all this work, so showing
>> people how to benefit seems like a good thing.

> It seems like with that, it'd be good to split off the data-format
> section of the README into a new file, maybe README.data, which will
> contain code snippets and some example scenarios. I'll include the
> example pg_proc.prokind merger among those.

It would be more work, but maybe we should move this into the main
SGML docs.  It seems rather silly to have SGML documentation for the
.BKI file format, which now will be an internal matter that hardly
any developers need worry about, but not for the .DAT file format.
But I understand if that seems a bridge too far for today --- certainly
a README file is way better than nothing.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-03-26 Thread John Naylor



> On Mar 26, 2018, at 10:44 PM, Tom Lane  wrote

> Layout of .dat files seems generally reasonable, but I don't understand
> the proposed make rule:
> 
> +reformat-dat-files:
> +$(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog 
> catalog/pg_*.dat
> 
> This rule has no prerequisite, so what's $< supposed to be?  Also, I think
> the rule probably ought to be located in src/include/catalog/Makefile,
> because that's typically where you'd be cd'd to when messing with the
> .dat files, I'd think.  (Hm, I see no such makefile, but maybe it's time
> for one.  A convenience rule located one level up doesn't seem very
> convenient.)
> 

Oops, copy-pasto. And I’ll see about a new Makefile.

>> It seems like with that, it'd be good to split off the data-format
>> section of the README into a new file, maybe README.data, which will
>> contain code snippets and some example scenarios. I'll include the
>> example pg_proc.prokind merger among those.
> 
> It would be more work, but maybe we should move this into the main
> SGML docs.  It seems rather silly to have SGML documentation for the
> .BKI file format, which now will be an internal matter that hardly
> any developers need worry about, but not for the .DAT file format.
> But I understand if that seems a bridge too far for today --- certainly
> a README file is way better than nothing.

Makes sense on all points. I’m not optimistic about creating a new sgml doc on 
time, but I’ll keep it in mind.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-03-27 Thread John Naylor

Tom Lane wrote:

>> -Maybe document examples of how to do bulk-editing of data files?
>
> +1.  In the end, that's the reason we're doing all this work, so showing
> people how to benefit seems like a good thing.

I'll hold off on posting a new patchset until I add this to the
documentation, but I wanted to report on a couple of other things:

While adjusting to the 80-column limit, I encountered a separation of
concerns violation between Catalog.pm and reformat_dat_files.pl that I
hadn't noticed before. Fixing that made things easier to read, with
fewer lines of code.

Speaking of bulk editing, that would be done via adopting
reformat_dat_files.pl to the task at hand. I did this myself for two
of the conversion helper scripts. However, enough bitrot has now
occurred that to make the relationship murky. Since I had to adopt
them to the 80-column limit as well, I shaved all the irrelevant
differences away, and now they're just a small diff away from the
reformat script. I also added block comments to help developers find
where they need to edit the script. Since reformat_dat_files.pl has
been substantially altered, I'll attach it here, along with the diffs
to the the helper scripts.

I wrote:

> I’ll see about a new Makefile.

I've attached a draft of this. I thought about adding a call to
duplicate_oids here, but this won't run unless you've run configure
first, and if you've done that, you've likely built already, running
duplicate_oids in the process.

I think I'll consolidate all documentation patches into one, at the
end of the series for maximum flexibility. I liked the idea of
spreading the doc changes over the patches, but there is not a huge
amount of time left.

-John Naylor
--- /home/john/pgdev/postgresql/src/include/catalog/reformat_dat_files.pl   
2018-03-27 18:04:54.698464144 +0700
+++ remove_pg_type_oid_symbols.pl   2018-03-27 18:13:42.270611897 +0700
@@ -1,18 +1,12 @@
 #!/usr/bin/perl -w
 #--
 #
-# reformat_dat_files.pl
-#Perl script that reads in a catalog data file and writes out
-#a functionally equivalent file in a standard format.
-#
-#Metadata entries (if any) come first, with normal attributes
-#starting on the following line, in the same order they would be in
-#the actual table.
+# remove_pg_type_oid_symbols.pl
 #
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 #
-# /src/include/catalog/reformat_dat_files.pl
+# /src/include/catalog/remove_pg_type_oid_symbols.pl
 #
 #--

@@ -85,22 +79,6 @@
$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 1);
 }

-
-# At this point, we have read all the data. If you are modifying this
-# script for bulk editing, this is a good place to build lookup tables,
-# if you need to. In the following example, the "next if !ref $row"
-# check below is a hack to filter out non-hash objects. This is because
-# we build the lookup tables from data that we read using the
-# "preserve_formatting" parameter.
-#
-##Index access method lookup.
-#my %amnames;
-#foreach my $row (@{ $catalog_data{pg_am} })
-#{
-#  next if !ref $row;
-#  $amnames{$row->{oid}} = $row->{amname};
-#}
-

 # Write the data.
 foreach my $catname (@catnames)
@@ -131,10 +109,15 @@
my %values = %$data;

-   # At this point we have the full tuple in memory as a 
hash
-   # and can do any operations we want. As written, it only
-   # removes default values, but this script can be 
adopted to
-   # do one-off bulk-editing.
+   # Remove pg_type OID symbols if they can match the rule
+   # we use to generate them.
+   if ($catname eq 'pg_type' and exists 
$values{oid_symbol})
+   {
+   my $symbol = 
form_pg_type_symbol($values{typname});
+   delete $values{oid_symbol}
+ if defined $symbol
+   and $values{oid_symbol} eq $symbol;
+   }

if (!$full_tuples)
@@ -181,6 +164,26 @@
}
 }

+
+# Determine canonical pg_type OID #define symbol from the type name.
+sub form_pg_type_symbol
+{
+   my $typename = shift;
+
+   # Skip for rowtypes of bootstrap tables.
+   return
+

Re: WIP: a way forward on bootstrap data

2018-03-31 Thread John Naylor

Attached is v13, rebased against b0c90c85fc.

Patch 0001:
-The data files are formatted to at most 80 columns wide.
-Rename rewrite_dat.pl to reformat_dat_file.pl.
-Refactor Catalog.pm and reformat_dat_file.pl to have better
separation of concerns.
-Add src/include/catalog/Makefile with convenience targets for
rewriting data files.

Patch 0002:
-Some adjustments to the post-conversion cleanup of data files.

Patch 0005:
-I made a stub version of Solution.pm to simulate testing the MSVC
build. This found one bug, and also allowed me to bring in some of the
more pedantic dependencies I added to utils/Makefile.

Patch 0009:
-New patch that puts all doc changes in one patch, for flexibility.
-Split the parts of catalog/README having to do with data into a new
README.data file. Add recipes for how to edit data, with code
examples.

On 3/26/18, Tom Lane  wrote:
> It would be more work, but maybe we should move this into the main
> SGML docs.  It seems rather silly to have SGML documentation for the
> .BKI file format, which now will be an internal matter that hardly
> any developers need worry about, but not for the .DAT file format.
> But I understand if that seems a bridge too far for today --- certainly
> a README file is way better than nothing.

I had an idea to make it less silly without doing as much work: Get
rid of the SGML docs for the BKI format, and turn them into
bootstrap/README. Thoughts?

And in the department of second thoughts, it occurred to me that the
only reason that the .dat files are in include/catalog is because
that's where the DATA() statements were. Since they are separate now,
one could make the case that they actually belong in backend/catalog.
One trivial advantage here is that there is already an existing
Makefile in which to put convenience targets for formatting. On the
other hand, it kind of makes sense to have the files describing the
schema (.h) and the contents (.dat) in the same directory. I'm
inclined to leave things as they are for that reason.

-John Naylor

v13-bootstrap-data-conversion.tar.gz
Description: GNU Zip compressed data

Re: WIP: a way forward on bootstrap data

2018-03-31 Thread Tom Lane

John Naylor  writes:
> And in the department of second thoughts, it occurred to me that the
> only reason that the .dat files are in include/catalog is because
> that's where the DATA() statements were. Since they are separate now,
> one could make the case that they actually belong in backend/catalog.
> One trivial advantage here is that there is already an existing
> Makefile in which to put convenience targets for formatting. On the
> other hand, it kind of makes sense to have the files describing the
> schema (.h) and the contents (.dat) in the same directory. I'm
> inclined to leave things as they are for that reason.

Yeah.  The fact that, eg, both the .h and .dat files are inputs to
duplicate_oids and unused_oids makes me think it's better to keep
them together.

I'd actually been thinking of something that's about the reverse:
instead of building the derived .h files in backend/catalog and
then symlinking them into include/catalog, it'd be saner to build
them in include/catalog to begin with.  However, that would mean
that the Perl scripts need to produce output in two different
places, so maybe it'd end up more complicated not less so.
In any case, that seems like something to leave for another day.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-04 Thread Tom Lane

I'm starting to look through v13 seriously, and one thing struck
me that could use some general discussion: what is our policy
going to be for choosing the default values for catalog columns?
In particular, I noticed that you have for pg_proc

boolproisstrict BKI_DEFAULT(f);

charprovolatile BKI_DEFAULT(v);

charproparallel BKI_DEFAULT(u);

which do not comport at all with the most common values in those
columns.  As of HEAD, I see

postgres=# select proisstrict, count(*) from pg_proc group by 1;
 proisstrict | count 
-+---
 f   |   312
 t   |  2640
(2 rows)

postgres=# select provolatile, count(*) from pg_proc group by 1;
 provolatile | count 
-+---
 i   |  2080
 s   |   570
 v   |   302
(3 rows)

postgres=# select proparallel, count(*) from pg_proc group by 1;
 proparallel | count 
-+---
 r   |   139
 s   |  2722
 u   |91
(3 rows)

(Since this is from the final initdb state, this overstates the number
of .bki entries for pg_proc a bit, but not by much.)

I think there's no question that the default for proisstrict ought
to be "true" --- not only is that by far the more common choice,
but it's actually the safer choice.  A C function that needs to be
marked strict and isn't will at best do the wrong thing, and quite
likely will crash, if passed a NULL value.

The defaults for provolatile and proparallel maybe require more thought
though.  What you've chosen corresponds to the default assumptions of
CREATE FUNCTION, which are what we need for user-defined functions that
we don't know anything about; but I'm not sure that makes them the best
defaults for built-in functions.  I'm inclined to go with the majority
values here, in part because that will make the outliers stand out when
looking at pg_proc.dat.  I don't think it's great that we'll have 2800+
entries explicitly marked with proparallel 'i' or 's', but the less-than-
100 with proparallel 'u' will be so only implicitly because the rewrite
script will strip out any field entries that match the default.  That's
really the worst of all worlds: it'd be better to have no default
in this column at all, I think, than to behave like that.

In short, I'm tempted to say that when there's a clear majority of
entries that would use a particular default, that's the default we
should use, whether or not it's "surprising" or "unsafe" according
to the semantics.  It's clearly not "surprising" for a C function
to be marked proparallel 's'; the other cases are more so.

I'm not seeing any other BKI_DEFAULT choices that I'm inclined to
question, so maybe it's a mistake to try to derive any general
policy choices from such a small number of cases.  But anyway
I'm inclined to change these cases.

Comments anyone?

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-04 Thread Andres Freund

Hi,

On 2018-04-04 18:29:31 -0400, Tom Lane wrote:
> I'm starting to look through v13 seriously, and one thing struck
> me that could use some general discussion: what is our policy
> going to be for choosing the default values for catalog columns?
>
> [...]
>
> In short, I'm tempted to say that when there's a clear majority of
> entries that would use a particular default, that's the default we
> should use, whether or not it's "surprising" or "unsafe" according
> to the semantics.  It's clearly not "surprising" for a C function
> to be marked proparallel 's'; the other cases are more so.
>
> [...]
>
> I'm not seeing any other BKI_DEFAULT choices that I'm inclined to
> question, so maybe it's a mistake to try to derive any general
> policy choices from such a small number of cases.  But anyway
> I'm inclined to change these cases.
> 
> Comments anyone?

I think choosing SQL defaults is defensible, but so is choosing the most
common value as default to make uncommon stand out more, and so is
choosing the safest values. In short, I don't think it matters terribly
much, we just should try to be reasonably consistent about.

Greetings,

Andres Freund

Re: WIP: a way forward on bootstrap data

2018-04-04 Thread Tom Lane

Here are the results of an evening's desultory hacking on v13.

I was dissatisfied with the fact that we still had several
function-referencing columns that had numeric instead of symbolic
contents, for instance pg_aggregate.aggfnoid.  Of course, the main reason
is that those are declared regproc but reference functions with overloaded
names, which regproc can't handle.  Now that the lookups are being done in
genbki.pl there's no reason why we have to live with that limitation.
In the attached, I've generalized the BKI_LOOKUP(pg_proc) code so that
you can use either regproc-like or regprocedure-like notation, and then
applied that to relevant columns.

I did not like the hard-wired handling of proargtypes and proallargtypes
in genbki.pl; it hardly seems impossible that we'll want similar
conversions for other array-of-OID columns in future.  After a bit of
thought, it seemed like we could allow

oidvectorproargtypes BKI_LOOKUP(pg_type);

Oid  proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);

and just teach genbki.pl that if a lookup rule is attached to
an oidvector or Oid[] column, it means to apply the rule to
each array element individually.

I also changed genbki.pl so that it'd warn about entries that aren't
recognized by the lookup rules.  This seems like a good idea for
catching errors, such as (ahem) applying BKI_LOOKUP to a column
that isn't even an OID.

bootstrap-v13-delta.patch is a diff atop your patch series for the
in-tree files, and convert_oid2name.patch adjusts that script to
make use of the additional conversion capability.

regards, tom lane

diff --git a/src/backend/catalog/README.data b/src/backend/catalog/README.data
index b7c680c..22ad0f2 100644
*** a/src/backend/catalog/README.data
--- b/src/backend/catalog/README.data
*** teach Catalog::ParseData() how to expand
*** 62,71 
  representation.
  
  - To aid readability, some values that are references to other catalog
! entries are represented by macros rather than numeric OIDs. This is
! the case for index access methods, opclasses, operators, opfamilies,
! and types. This is done for functions as well, but only if the proname
! is unique.
  
  Bootstrap Data Conventions
  ==
--- 62,103 
  representation.
  
  - To aid readability, some values that are references to other catalog
! entries are represented by names rather than numeric OIDs.  Currently
! this is the case for access methods, functions, operators, opclasses,
! opfamilies, and types.  The rules are as follows:
! 
! * Use of names rather than numbers is enabled for a particular catalog
! column by attaching BKI_LOOKUP(lookuprule) to the column's definition,
! where "lookuprule" is pg_am, pg_proc, pg_operator, pg_opclass,
! pg_opfamily, or pg_type.
! 
! * In a name-lookup column, all entries must use the name format except
! when writing "0" for InvalidOid.  (If the column is declared regproc,
! you can optionally write "-" instead of "0".)  genbki.pl will warn
! about unrecognized names.
! 
! * Access methods are just represented by their names, as are types.
! Type names must match the referenced pg_type entry's typname; you
! do not get to use any aliases such as "integer" for "int4".
! 
! * A function can be represented by its proname, if that is unique among
! the pg_proc.dat entries (this works like regproc input).  Otherwise,
! write it as "proname(argtypename,argtypename,...)", like regprocedure.
! The argument type names must be spelled exactly as they are in the
! pg_proc.dat entry's proargtypes field.  Do not insert any spaces.
! 
! * Operators are represented by "oprname(lefttype,righttype)", writing the
! type names exactly as they appear in the pg_operator.dat entry's oprleft
! and oprright fields.  (Write 0 for the omitted operand of a unary
! operator.)
! 
! * The names of opclasses and opfamilies are only unique within an access
! method, so they are represented by "access_method_name/object_name".
! 
! In none of these cases is there any provision for schema-qualification;
! all objects created during bootstrap are expected to be in the pg_catalog
! schema.
! 
  
  Bootstrap Data Conventions
  ==
*** You can also use the duplicate_oids scri
*** 105,111 
  build if a duplicate is found.)
  
  - The OID counter starts at 1 at bootstrap.  If a catalog row is
! in a table that requires OIDs, but no OID was preassigned by an "OID ="
  clause, then it will receive an OID of 1 or above.
  
  
--- 137,143 
  build if a duplicate is found.)
  
  - The OID counter starts at 1 at bootstrap.  If a catalog row is
! in a table that requires OIDs, but no OID was preassigned by an "oid =>"
  clause, then it will receive an OID of 1 or above.
  
  
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 27494d9..f6be50a 100644
*** a/src/backend/catalog/genbki.pl
--- b/src/backend/catalog/genbk

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor

On 4/5/18, Tom Lane  wrote:
> Here are the results of an evening's desultory hacking on v13.
>
> [numeric function oids with overloaded name]

Thank you for the detailed review and for improving the function
references (not to mention the type references I somehow left on the
table). I was also not quite satisfied with just the regproc columns.

> I did not like the hard-wired handling of proargtypes and proallargtypes
> in genbki.pl; it hardly seems impossible that we'll want similar
> conversions for other array-of-OID columns in future.  After a bit of
> thought, it seemed like we could allow
>
> oidvectorproargtypes BKI_LOOKUP(pg_type);
>
> Oid  proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);
>
> and just teach genbki.pl that if a lookup rule is attached to
> an oidvector or Oid[] column, it means to apply the rule to
> each array element individually.

I think that's a good idea. I went an extra step and extracted the
common logic into a function (attached draft patch to be applied on
top of yours). It treats all lookups as operating on arrays. The
common case is that we pass a single-element array. That may seem
awkward, but I think it's clear. The code is slimmer, and the lines
now fit within 80 characters.

> I also changed genbki.pl so that it'd warn about entries that aren't
> recognized by the lookup rules.  This seems like a good idea for
> catching errors, such as (ahem) applying BKI_LOOKUP to a column
> that isn't even an OID.

Yikes, I must have fat-fingered that during the comment reformatting.

Unrelated, I noticed my quoting of defaults that contain back-slashes
was half-baked, so I'll include that fix in the next patchset. I'll
put out a new one in a couple days, to give a chance for further
review and discussion of the defaults. I didn't feel the need to
respond to the other messages, but yours and Andres' points are well
taken.

-John Naylor
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index f6be50a..8d47109 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -343,34 +343,21 @@ EOM
 			#
 			# If the column type is oidvector or oid[], we have to replace
 			# each element of the array as per the lookup rule.
-			#
-			# If we don't have a unique value to substitute, warn and
-			# leave the entry unchanged.
 			if ($column->{lookup})
 			{
 my $lookup = $lookup_kind{ $column->{lookup} };
+my @lookupnames;
+my @lookupoids;
 
 die "unrecognized BKI_LOOKUP type " . $column->{lookup}
   if !defined($lookup);
 
 if ($atttype eq 'oidvector')
 {
-	my @lookupnames = split /\s+/, $bki_values{$attname};
-	my @lookupoids;
-	foreach my $lookupname (@lookupnames)
-	{
-		my $lookupoid = $lookup->{ $lookupname };
-		if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
-		{
-			$lookupname = $lookupoid;
-		}
-		else
-		{
-			warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values))
-if $lookupname ne '-' && $lookupname ne '0';
-		}
-		push @lookupoids, $lookupname;
-	}
+	@lookupnames = split /\s+/, $bki_values{$attname};
+	@lookupoids = lookup_oids($lookup, $catname,
+\%bki_values, @lookupnames);
+
 	$bki_values{$attname} = join(' ', @lookupoids);
 }
 elsif ($atttype eq 'oid[]')
@@ -378,39 +365,21 @@ EOM
 	if ($bki_values{$attname} ne '_null_')
 	{
 		$bki_values{$attname} =~ s/[{}]//g;
-		my @lookupnames = split /,/, $bki_values{$attname};
-		my @lookupoids;
-		foreach my $lookupname (@lookupnames)
-		{
-			my $lookupoid = $lookup->{ $lookupname };
-			if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
-			{
-$lookupname = $lookupoid;
-			}
-			else
-			{
-warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values))
-	if $lookupname ne '-' && $lookupname ne '0';
-			}
-			push @lookupoids, $lookupname;
-		}
+		@lookupnames = split /,/, $bki_values{$attname};
+		@lookupoids = lookup_oids($lookup, $catname,
+	\%bki_values, @lookupnames);
+
 		$bki_values{$attname} =
 			sprintf "{%s}", join(',', @lookupoids);
 	}
 }
 else
 {
-	my $lookupname = $bki_values{$attname};
-	my $lookupoid = $lookup->{ $lookupname };
-	if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
-	{
-		$bki_values{$attname} = $lookupoid;
-	}
-	else
-	{
-		warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values))
-			if $lookupname ne '-' && $lookupname ne '0';
-	}
+	$lookupnames[0] = $bki_values{$attname};
+	@lookupoids = lookup_oids($lookup, $catname,
+\%bki_values, @lookupnames);
+
+	$bki_values{$attname} = $lookupoids[0];
 }
 			}
 		}
@@ -759,6 +728,32 @@ sub morph_row_for_schemapg
 	}
 }
 
+# Perfo

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane

John Naylor  writes:
> On 4/5/18, Tom Lane  wrote:
>> I did not like the hard-wired handling of proargtypes and proallargtypes
>> in genbki.pl; it hardly seems impossible that we'll want similar
>> conversions for other array-of-OID columns in future.  After a bit of
>> thought, it seemed like we could allow
>>  oidvectorproargtypes BKI_LOOKUP(pg_type);
>>  Oid  proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);
>> and just teach genbki.pl that if a lookup rule is attached to
>> an oidvector or Oid[] column, it means to apply the rule to
>> each array element individually.

> I think that's a good idea. I went an extra step and extracted the
> common logic into a function (attached draft patch to be applied on
> top of yours). It treats all lookups as operating on arrays. The
> common case is that we pass a single-element array. That may seem
> awkward, but I think it's clear. The code is slimmer, and the lines
> now fit within 80 characters.

Sounds good.  I too was bothered by the duplication of code, but
I'm not a good enough Perl programmer to have thought of that solution.

Something that bothered me a bit while writing the warning-producing code
is that showing %bki_values isn't actually that great a way of identifying
the trouble spot.  By this point we've expanded out defaults and possibly
replaced some other macros, so it doesn't look that much like what was
in the .dat file.  I think what would be ideal, both here and in some
other places like AddDefaultValues, is to be able to finger the location
of the bad tuple by filename and line number, but I have no idea whether
it's practical to annotate the tuples with that while reading the .dat
files.  Any thoughts?

(Obviously, better error messages could be a future improvement; it's not
something we have to get done before the conversion.)

> Unrelated, I noticed my quoting of defaults that contain back-slashes
> was half-baked, so I'll include that fix in the next patchset. I'll
> put out a new one in a couple days, to give a chance for further
> review and discussion of the defaults. I didn't feel the need to
> respond to the other messages, but yours and Andres' points are well
> taken.

We're getting down to the wire here --- I think the plan is to close
the CF on Saturday or Sunday, and then push the bootstrap changes right
after that.  So please turn around whatever you're planning to do ASAP.
I'm buckling down to a final review today and tomorrow.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane

I experimented with converting all frontend code to include just the
catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
proposed new policy.  I soon found that we'd overlooked one thing:
some clients expect to see the relation OID macros, eg
LargeObjectRelationId.  Attached is a patch that changes things around
so that those appear in the _d files instead of the master files.
This is cleaner anyway because it removes duplication of the OIDs in
the master files, with attendant risk of error.  For example we
have this change in pg_aggregate.h:

-#define AggregateRelationId  2600
-
-CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
+CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS

Some of the CATALOG lines spill well past 80 characters with this,
although many of the affected ones already were overlength, eg

-#define DatabaseRelationId 1262
-#define DatabaseRelation_Rowtype_Id  1248
-
-CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) 
BKI_SCHEMA_MACRO
+CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION 
BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

I thought about improving that by removing the restriction that these
BKI annotations appear on the same line as the CATALOG macro, so that
we could break the above into several lines.  I think the original key
reason for the restriction was to avoid accidentally taking some bit
of a DATA line as a BKI annotation.  With the DATA lines gone from these
files, that's no longer a significant hazard (although passing references
to BKI keywords in comments might still be hazards for the Perl scripts).
However, if we try to format things like

CATALOG(pg_database,1262,DatabaseRelationId)
BKI_SHARED_RELATION
BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id)
BKI_SCHEMA_MACRO
{
fields...
}

I'm afraid that neither pgindent nor a lot of common editors would indent
that very nicely.  So at least for the moment I'm inclined to just keep
it all on one line ... we know how that behaves, anyway.

regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 519247e..fb3d62a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -104,18 +104,29 @@ sub ParseHeader
 			{
 push @{ $catalog{indexing} }, "build indices\n";
 			}
-			elsif (/^CATALOG\(([^,]*),(\d+)\)/)
+			elsif (/^CATALOG\(([^,]*),(\d+),(\w+)\)/)
 			{
 $catalog{catname} = $1;
 $catalog{relation_oid} = $2;
+$catalog{relation_oid_macro} = $3;
 
 $catalog{bootstrap} = /BKI_BOOTSTRAP/ ? ' bootstrap' : '';
 $catalog{shared_relation} =
   /BKI_SHARED_RELATION/ ? ' shared_relation' : '';
 $catalog{without_oids} =
   /BKI_WITHOUT_OIDS/ ? ' without_oids' : '';
-$catalog{rowtype_oid} =
-  /BKI_ROWTYPE_OID\((\d+)\)/ ? " rowtype_oid $1" : '';
+if (/BKI_ROWTYPE_OID\((\d+),(\w+)\)/)
+{
+	$catalog{rowtype_oid} = $1;
+	$catalog{rowtype_oid_clause} = " rowtype_oid $1";
+	$catalog{rowtype_oid_macro} = $2;
+}
+else
+{
+	$catalog{rowtype_oid} = '';
+	$catalog{rowtype_oid_clause} = '';
+	$catalog{rowtype_oid_macro} = '';
+}
 $catalog{schema_macro} = /BKI_SCHEMA_MACRO/ ? 1 : 0;
 $declaring_attributes = 1;
 			}
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index f6be50a..fe8c3ca 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -228,6 +228,7 @@ my @tables_needing_macros;
 # produce output, one catalog at a time
 foreach my $catname (@catnames)
 {
+	my $catalog = $catalogs{$catname};
 
 	# Create one definition header with macro definitions for each catalog.
 	my $def_file = $output_path . $catname . '_d.h';
@@ -258,13 +259,21 @@ foreach my $catname (@catnames)
 
 EOM
 
+	# Emit OID macros for catalog's OID and rowtype OID, if wanted
+	print $def
+	  sprintf("#define %s %s\n", $catalog->{relation_oid_macro}, $catalog->{relation_oid})
+	  if $catalog->{relation_oid_macro} ne '';
+	print $def
+	  sprintf("#define %s %s\n", $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid})
+	  if $catalog->{rowtype_oid_macro} ne '';
+	print $def "\n";
+
 	# .bki CREATE command for this catalog
-	my $catalog = $catalogs{$catname};
 	print $bki "create $catname $catalog->{relation_oid}"
 	  . $catalog->{shared_relation}
 	  . $catalog->{bootstrap}
 	  . $catalog->{without_oids}
-	  . $catalog->{rowtype_oid} . "\n";
+	  . $catalog->{rowtype_oid_clause} . "\n";
 
 	my $first = 1;
 
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
index 9732f61..0e6285f 100755
--- a/src/include/catalog/duplicate_oids
+++ b/src/include/catalog/duplicate_oids
@@ -15,7 +15,7 @@ while (<>)
 	next if /^CATALOG\(.*BKI_BOOTSTRAP/;
 	next
 	  unless /\boid *=> *'(\d+)'/
-		  || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+)\)/
+		  || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+),/

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane

BTW, I experimented with adding blank lines between the hash items in the
.dat files, and that seemed to make a nice improvement in readability,
converting masses of rather gray text into visibly distinct stanzas.
I'm not dead set on that, but try it and see what you think.

A small example from pg_aggregate.dat:

{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
{ aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },
{ aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },
{ aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => 'internal',
  aggtransspace => '128', aggmtranstype => 'internal',
  aggmtransspace => '128' },
{ aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum',
  aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine',
  aggtranstype => '_float8', agginitval => '{0,0,0}' },

versus

{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },

{ aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },

{ aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },

{ aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => 'internal',
  aggtransspace => '128', aggmtranstype => 'internal',
  aggmtransspace => '128' },

{ aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum',
  aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine',
  aggtranstype => '_float8', agginitval => '{0,0,0}' },

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor

On 4/6/18, Tom Lane  wrote:
> I experimented with converting all frontend code to include just the
> catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
> proposed new policy.  I soon found that we'd overlooked one thing:
> some clients expect to see the relation OID macros, eg
> LargeObjectRelationId.  Attached is a patch that changes things around
> so that those appear in the _d files instead of the master files.
> This is cleaner anyway because it removes duplication of the OIDs in
> the master files, with attendant risk of error.  For example we
> have this change in pg_aggregate.h:
>
> -#define AggregateRelationId  2600
> -
> -CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
> +CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS
>
> Some of the CATALOG lines spill well past 80 characters with this,
> although many of the affected ones already were overlength, eg
>
> -#define DatabaseRelationId   1262
> -#define DatabaseRelation_Rowtype_Id  1248
> -
> -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248)
> BKI_SCHEMA_MACRO
> +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION
> BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

It seems most of the time the FooRelationId labels are predictable,
although not as pristine as the Anum_* constants. One possibility that
came to mind is to treat these like pg_type OID #defines -- have a
simple rule that can be overridden for historical reasons. In this
case the pg_database change would simply be:

-#define DatabaseRelationId 1262
-#define DatabaseRelation_Rowtype_Id  1248
-

and genbki.pl would know what to do. But for pg_am:

-#define AccessMethodRelationId 2601
-
-CATALOG(pg_am,2601)
+CATALOG(pg_am,2601) BKI_REL_LABEL(AccessMethod)

I haven't thought this through yet. I imagine it will add as well as
remove a bit of complexity, code-wise. The upside is most CATALOG
lines will remain unchanged, and those that do won't end up quite as
long. I can try a draft tomorrow to see how it looks, unless you see
an obvious downside.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor

On 4/5/18, Tom Lane  wrote:
> I've generalized the BKI_LOOKUP(pg_proc) code so that
> you can use either regproc-like or regprocedure-like notation, and then
> applied that to relevant columns.
> [...]
> bootstrap-v13-delta.patch is a diff atop your patch series for the
> in-tree files, and convert_oid2name.patch adjusts that script to
> make use of the additional conversion capability.

Looking at convert_oid2name.patch again, I see this:

+   elsif ($catname eq 'pg_am')
+   {
+   $values{aggfnoid} = 
lookup_procname($values{aggfnoid});
+   }

aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup.
Do you remember the intent here?

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane

John Naylor  writes:
> It seems most of the time the FooRelationId labels are predictable,
> although not as pristine as the Anum_* constants. One possibility that
> came to mind is to treat these like pg_type OID #defines -- have a
> simple rule that can be overridden for historical reasons.

Meh.  I'd just as soon avoid having some catalogs done one way and
some another.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane

John Naylor  writes:
> Looking at convert_oid2name.patch again, I see this:

> + elsif ($catname eq 'pg_am')
> + {
> + $values{aggfnoid} = 
> lookup_procname($values{aggfnoid});
> + }

> aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup.
> Do you remember the intent here?

Ugh, copy-and-pasteo.  I intended to have it lookup pg_am.amhandler, but
must have missed changing the field name after copying code from the
pg_aggregate stanza.  Seems to have been unnecessary anyway, since all the
entries in the column are already symbolic.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane

Just had another thought about this business: if practical, we should
remove the distinction between "descr" and "shdescr" and just use the
former name in .dat files.  genbki.pl knows which catalogs are shared,
so it ought to be able to figure out where to route the descriptions.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor

On 4/6/18, Tom Lane  wrote:
> Just had another thought about this business: if practical, we should
> remove the distinction between "descr" and "shdescr" and just use the
> former name in .dat files.  genbki.pl knows which catalogs are shared,
> so it ought to be able to figure out where to route the descriptions.

Fairly trivial (attached), and shouldn't be too hard to integrate into
the series.

-John Naylor
diff --git a/src/backend/catalog/README.data b/src/backend/catalog/README.data
index 22ad0f2..2c05fab 100644
--- a/src/backend/catalog/README.data
+++ b/src/backend/catalog/README.data
@@ -22,7 +22,7 @@ modified exerpt from pg_database.dat will demonstrate the key features:
 # that might contain non-word characters, so we must double-quote them.
 
 { oid => '1', oid_symbol => 'TemplateDbOid',
-  shdescr => 'database\'s default template',
+  descr => 'database\'s default template',
   datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING',
   datcollate => '"LC_COLLATE"', datctype => '"LC_CTYPE"', datistemplate => 't',
   datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0',
@@ -33,8 +33,8 @@ modified exerpt from pg_database.dat will demonstrate the key features:
 
 -The overall file layout is: open bracket, one or more sets of curly
 brackets containing comma-separated key-value pairs, close bracket.
--The metadata fields oid, oid_symbol, descr, and shdescr come first on
-their own line(s) within the curly brackets.
+-The metadata fields oid, oid_symbol, and descr come first on their own
+line(s) within the curly brackets.
 -All values are single-quoted.
 -Single quotes within values must be escaped.
 -If a value is a macro to be expanded by initdb.c, it must also have
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 8d47109..bbe7da8 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -400,13 +400,16 @@ EOM
 		# postgres.shdescription
 		if (defined $bki_values{descr})
 		{
-			printf $descr "%s\t%s\t0\t%s\n",
-			  $bki_values{oid}, $catname, $bki_values{descr};
-		}
-		if (defined $bki_values{shdescr})
-		{
-			printf $shdescr "%s\t%s\t%s\n",
-			  $bki_values{oid}, $catname, $bki_values{shdescr};
+			if ($catalog->{shared_relation})
+			{
+printf $shdescr "%s\t%s\t%s\n",
+  $bki_values{oid}, $catname, $bki_values{descr};
+			}
+			else
+			{
+printf $descr "%s\t%s\t0\t%s\n",
+  $bki_values{oid}, $catname, $bki_values{descr};
+			}
 		}
 
 		# Emit OID symbol
diff --git a/src/include/catalog/pg_database.dat b/src/include/catalog/pg_database.dat
index d5c2038..957ca84 100644
--- a/src/include/catalog/pg_database.dat
+++ b/src/include/catalog/pg_database.dat
@@ -16,7 +16,7 @@
 # that might contain non-word characters, so we must double-quote them.
 
 { oid => '1', oid_symbol => 'TemplateDbOid',
-  shdescr => 'default template for new databases',
+  descr => 'default template for new databases',
   datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING',
   datcollate => '"LC_COLLATE"', datctype => '"LC_CTYPE"', datistemplate => 't',
   datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0',
diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl
index c247a72..f748a45 100644
--- a/src/include/catalog/reformat_dat_file.pl
+++ b/src/include/catalog/reformat_dat_file.pl
@@ -59,7 +59,7 @@ if ($output_path ne '' && substr($output_path, -1) ne '/')
 }
 
 # Metadata of a catalog entry
-my @METADATA = ('oid', 'oid_symbol', 'descr', 'shdescr');
+my @METADATA = ('oid', 'oid_symbol', 'descr');
 
 # Read all the input files into internal data structures.
 # We pass data file names as arguments and then look for matching

Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor

On 4/6/18, Tom Lane  wrote:
> BTW, I experimented with adding blank lines between the hash items in the
> .dat files, and that seemed to make a nice improvement in readability,
> converting masses of rather gray text into visibly distinct stanzas.
> I'm not dead set on that, but try it and see what you think.

Narrow entries with natural whitespace might be okay as is. The
pg_aggregate example is better with blank lines, but another thing to
consider is that a comment that hugs a block is clear on which entries
it's referring to (pg_amop):

# btree integer_ops

# default operators int2
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '1', amopopr => '<(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '2', amopopr => '<=(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '3', amopopr => '=(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '4', amopopr => '>=(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '5', amopopr => '>(int2,int2)',
  amopmethod => 'btree' },

# crosstype operators int24
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
...
[more blocks of integer ops
...]
  amopmethod => 'btree' },

# btree oid_ops

With a blank line beween every entry, the comments would "float" more,
and the scope is not as clear. I'm okay with whatever the community
thinks, but at this point I'm inclined to leave things as they are and
focus on the other points of review for the next patchset.

While on the subject of viewing, I do have a badly outdated patch that
would create a postgres.sql file which would load into a development
schema so one could query the bootstrap data in a database without
running initdb. I could update it at a future point.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread John Naylor

On 4/5/18, Tom Lane  wrote:
> Something that bothered me a bit while writing the warning-producing code
> is that showing %bki_values isn't actually that great a way of identifying
> the trouble spot.  By this point we've expanded out defaults and possibly
> replaced some other macros, so it doesn't look that much like what was
> in the .dat file.  I think what would be ideal, both here and in some
> other places like AddDefaultValues, is to be able to finger the location
> of the bad tuple by filename and line number, but I have no idea whether
> it's practical to annotate the tuples with that while reading the .dat
> files.  Any thoughts?

We could use the $. variable to save the line number, which is what
the old code had. AddDefaultValues will report the line number on
failure, so I left out explicit line number reporting. If memory
serves, Perl is sensitive to how you format the "die" message. If I
delete a default value from the header, I get this, reporting line 16:

Failed to form full tuple for pg_opfamily
Missing values for: opfnamespace
Showing other values for context:
oid => 421, opfmethod => 403, opfowner => PGUID, opfname =>
abstime_ops,  at ../../../src/backend/catalog/Catalog.pm line 259,
<$ifd> line 16.
Makefile:23: recipe for target 'reformat-dat-files' failed
make: *** [reformat-dat-files] Error 25

I think the context is good for pg_attribute, because there is no file
to read from.
I'll think about the lookup code.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread John Naylor

For version 14, diffed against f1464c53804:

-Use majority values for proisstrict, provolatile, proparallel (patch 0006)
-Use valid C string for multi-char defaults containing a backslash (patch 0006)
-Apply Tom's patch for additional lookups, slightly modified by me
(convert_oid2name.pl, patch 0007)
-Apply Tom's patch for relation/rowtype OID macros (patch 0005)

On 4/6/18, Tom Lane  wrote:
> Just had another thought about this business: if practical, we should
> remove the distinction between "descr" and "shdescr" and just use the
> former name in .dat files.  genbki.pl knows which catalogs are shared,
> so it ought to be able to figure out where to route the descriptions.

Done (convert_header2dat.pl, patches 0001, 0003, 0009)

On 4/5/18, Tom Lane  wrote:
> I think what would be ideal, both here and in some
> other places like AddDefaultValues, is to be able to finger the location
> of the bad tuple by filename and line number, but I have no idea whether
> it's practical to annotate the tuples with that while reading the .dat
> files.  Any thoughts?

Done (patch 0007). So far only lookup_oids() uses it.

-John Naylor


v14-bootstrap-data-conversion.tar.gz
Description: GNU Zip compressed data

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread Tom Lane

John Naylor  writes:
> On 4/6/18, Tom Lane  wrote:
>> BTW, I experimented with adding blank lines between the hash items in the
>> .dat files, and that seemed to make a nice improvement in readability,
>> converting masses of rather gray text into visibly distinct stanzas.
>> I'm not dead set on that, but try it and see what you think.

> Narrow entries with natural whitespace might be okay as is. The
> pg_aggregate example is better with blank lines,

Yeah, it's somewhat of a worst-case example, because (more or less by
chance) most of the entries have last lines that are mostly full.
In other places it often happens that the last line of an entry is
much shorter than average, and that provides enough of a visual break,
as in your example from pg_amop.

> but another thing to
> consider is that a comment that hugs a block is clear on which entries
> it's referring to (pg_amop):

True, we'd need to rethink vertical spacing for comments.  I don't offhand
see why we couldn't keep comments that apply to a specific entry directly
adjacent to that entry, though.

Anyway, as I said, I'm not set on this change.  If you're unexcited by
the idea then let's drop it.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread Tom Lane

I wrote:
> John Naylor  writes:
>> On 4/6/18, Tom Lane  wrote:
>>> BTW, I experimented with adding blank lines between the hash items in the
>>> .dat files, and that seemed to make a nice improvement in readability,

> Anyway, as I said, I'm not set on this change.  If you're unexcited by
> the idea then let's drop it.

On third thought, there's actually a positive reason not to do that.
The more vertical whitespace we have, the less traction there is for
context diffs to find the right place to change, so that we'd be
increasing the risk of patch misapplication.  That was one of the
worries that we set out to avoid with this whole design.

So nevermind ... I'll get on with looking at v14.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread Tom Lane

Attached is a round of minor review fixes.  Much of this is cleaning
up existing mistakes or out-of-date comments, rather than anything
introduced by this patchset, but I noticed it while going through the
patch.

The additional EXPOSE_TO_CLIENT_CODE bits actually are necessary,
in some cases, as I had compile failures without them after changing
client include lines.  (I'll post that separately.)

I added a couple more BKI_DEFAULT markers here, but omitted the ensuing
.dat file updates, because they're just mechanical.

I don't think there's any great need to incorporate this into your patch
set.  As far as I'm concerned, v14 is ready as-is, and I'll just apply
this over the top of it.  (Note that I'll probably smash the whole thing
to one commit when the time comes.)

I have some other work pending on the documentation aspect, but that's
not quite ready for public consumption yet.

regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 8729ccd..1626999 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3566,7 +3566,7 @@ Oid PQftype(const PGresult *res,
You can query the system table pg_type to
obtain the names and properties of the various data types. The
OIDs of the built-in data types are defined
-   in the file src/include/catalog/pg_type.h
+   in the file src/include/catalog/pg_type_d.h
in the source tree.
   
  
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 17213d4..d25d98a 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -25,8 +25,10 @@ BKIFILES = postgres.bki postgres.description postgres.shdescription
 
 include $(top_srcdir)/src/backend/common.mk
 
-# Note: there are some undocumented dependencies on the ordering in which
-# the catalog header files are assembled into postgres.bki.
+# Note: the order of this list determines the order in which the catalog
+# header files are assembled into postgres.bki.  BKI_BOOTSTRAP catalogs
+# must appear first, and there are reputedly other, undocumented ordering
+# dependencies.
 CATALOG_HEADERS := \
 	pg_proc.h pg_type.h pg_attribute.h pg_class.h \
 	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
@@ -49,7 +51,8 @@ CATALOG_HEADERS := \
 GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h
 
 # In the list of headers used to assemble postgres.bki, indexing.h needs
-# be last, and toasting.h just before it.
+# be last, and toasting.h just before it.  This ensures we don't try to
+# create indexes or toast tables before their catalogs exist.
 POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
 	$(CATALOG_HEADERS) toasting.h indexing.h \
 	)
diff --git a/src/include/catalog/Makefile b/src/include/catalog/Makefile
index d84a572..1da3ea7 100644
--- a/src/include/catalog/Makefile
+++ b/src/include/catalog/Makefile
@@ -2,9 +2,6 @@
 #
 # Makefile for src/include/catalog
 #
-# 'make reformat-dat-files' is a convenience target for rewriting the
-# catalog data files in a standard format.
-#
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 #
@@ -19,10 +16,16 @@ include $(top_builddir)/src/Makefile.global
 # location of Catalog.pm
 catalogdir = $(top_srcdir)/src/backend/catalog
 
+# 'make reformat-dat-files' is a convenience target for rewriting the
+# catalog data files in our standard format.  This includes collapsing
+# out any entries that are redundant with a BKI_DEFAULT annotation.
 reformat-dat-files:
 	$(PERL) -I $(catalogdir) reformat_dat_file.pl pg_*.dat
 
+# 'make expand-dat-files' is a convenience target for expanding out all
+# default values in the catalog data files.  This should be run before
+# altering or removing any BKI_DEFAULT annotation.
 expand-dat-files:
 	$(PERL) -I $(catalogdir) reformat_dat_file.pl pg_*.dat --full-tuples
 
-.PHONY: reformat-dat-files
+.PHONY: reformat-dat-files expand-dat-files
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
index 0e6285f..8c143cf 100755
--- a/src/include/catalog/duplicate_oids
+++ b/src/include/catalog/duplicate_oids
@@ -30,7 +30,7 @@ foreach my $oid (sort { $a <=> $b } keys %oidcounts)
 {
 	next unless $oidcounts{$oid} > 1;
 	$found = 1;
-	print "***Duplicate OID: $oid\n";
+	print "$oid\n";
 }
 
 exit $found;
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index 02c38c4..b1e2cbd 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -34,7 +34,7 @@
 #define BKI_FORCE_NOT_NULL
 /* Specifies a default value for a catalog field */
 #define BKI_DEFAULT(value)
-/* Indicates how to perform name lookups for OID fields */
+/* Indicates how to perform name lookups for an OID or OID-array field */
 #define BKI_LOOKUP(catalog)
 
 /* The following are never defined; they are here only for documentation. */

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread Tom Lane

Oh, one more thing: looking again at the contents of pg_proc.dat,
I find myself annoyed at the need to specify pronargs.  That's
entirely derivable from proargtypes, and if we did so, we'd get
down to this for the first few pg_proc entries:

{ oid => '1242', descr => 'I/O',
  proname => 'boolin', prorettype => 'bool', proargtypes => 'cstring',
  prosrc => 'boolin' },
{ oid => '1243', descr => 'I/O',
  proname => 'boolout', prorettype => 'cstring', proargtypes => 'bool',
  prosrc => 'boolout' },
{ oid => '1244', descr => 'I/O',
  proname => 'byteain', prorettype => 'bytea', proargtypes => 'cstring',
  prosrc => 'byteain' },

which seems pretty close to the minimum amount of stuff you could expect
to need to write.

I recall that this and some other data compression methods were on the
table awhile back, and I expressed doubt about putting very much magic
knowledge in genbki.pl, but this case seems like a no-brainer.  genbki
can always get the right answer, and expecting people to do it just
creates another way to make a mistake.

So attached is an incremental patch to do that.  I had to adjust
AddDefaultValues's argument list to tell it the catalog name,
and once I did that, I saw that the API arrangement whereby the
caller throws error was just useless complexity, so I got rid of it.

regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 70aea89..3b3bb6b 100644
*** a/src/backend/catalog/Catalog.pm
--- b/src/backend/catalog/Catalog.pm
*** sub ParseData
*** 254,265 
  }
  
  # Expand tuples to their full representation.
! my $error = AddDefaultValues($hash_ref, $schema);
! if ($error)
! {
! 	print "Failed to form full tuple for $catname\n";
! 	die $error;
! }
  			}
  			else
  			{
--- 254,260 
  }
  
  # Expand tuples to their full representation.
! AddDefaultValues($hash_ref, $schema, $catname);
  			}
  			else
  			{
*** sub ParseData
*** 289,302 
  	return $data;
  }
  
! # Fill in default values of a record using the given schema. It's the
! # caller's responsibility to specify other values beforehand.
! # If we fail to fill all fields, return a nonempty error message.
  sub AddDefaultValues
  {
! 	my ($row, $schema) = @_;
  	my @missing_fields;
- 	my $msg;
  
  	foreach my $column (@$schema)
  	{
--- 284,295 
  	return $data;
  }
  
! # Fill in default values of a record using the given schema.
! # It's the caller's responsibility to specify other values beforehand.
  sub AddDefaultValues
  {
! 	my ($row, $schema, $catname) = @_;
  	my @missing_fields;
  
  	foreach my $column (@$schema)
  	{
*** sub AddDefaultValues
*** 311,316 
--- 304,316 
  		{
  			$row->{$attname} = $column->{default};
  		}
+ 		elsif ($catname eq 'pg_proc' && $attname eq 'pronargs' &&
+ 			   defined($row->{proargtypes}))
+ 		{
+ 			# pg_proc.pronargs can be derived from proargtypes.
+ 			my @proargtypes = split /\s+/, $row->{proargtypes};
+ 			$row->{$attname} = scalar(@proargtypes);
+ 		}
  		else
  		{
  			# Failed to find a value.
*** sub AddDefaultValues
*** 320,333 
  
  	if (@missing_fields)
  	{
! 		$msg = "Missing values for: " . join(', ', @missing_fields);
! 		$msg .= "\nShowing other values for context:\n";
  		while (my($key, $value) = each %$row)
  		{
  			$msg .= "$key => $value, ";
  		}
  	}
- 	return $msg;
  }
  
  # Rename temporary files to final names.
--- 320,334 
  
  	if (@missing_fields)
  	{
! 		my $msg = "Failed to form full tuple for $catname\n";
! 		$msg .= "Missing values for: " . join(', ', @missing_fields);
! 		$msg .= "\nOther values for row:\n";
  		while (my($key, $value) = each %$row)
  		{
  			$msg .= "$key => $value, ";
  		}
+ 		die $msg;
  	}
  }
  
  # Rename temporary files to final names.
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index d4c2ddf..3512952 100644
*** a/src/backend/catalog/genbki.pl
--- b/src/backend/catalog/genbki.pl
*** sub morph_row_for_pgattr
*** 641,651 
  		$row->{attnotnull} = 'f';
  	}
  
! 	my $error = Catalog::AddDefaultValues($row, $pgattr_schema);
! 	if ($error)
! 	{
! 		die "Failed to form full tuple for pg_attribute: ", $error;
! 	}
  }
  
  # Write an entry to postgres.bki. Adding quotes here allows us to keep
--- 641,647 
  		$row->{attnotnull} = 'f';
  	}
  
! 	Catalog::AddDefaultValues($row, $pgattr_schema, 'pg_attribute');
  }
  
  # Write an entry to postgres.bki. Adding quotes here allows us to keep
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 2c42b5c..b25546e 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*** CATALOG(pg_proc,1255,ProcedureRelationId
*** 73,78 
--- 73,79 
  	char		proparallel BKI_DEFAULT(s);
  
  	/* number of arguments */
+ 	/* Note: need not be given in pg_proc.dat; genbki.pl will compute it */
  	in

Re: WIP: a way forward on bootstrap data

2018-04-06 Thread John Naylor

On 4/6/18, Tom Lane  wrote:

> I don't think there's any great need to incorporate this into your patch
> set.  As far as I'm concerned, v14 is ready as-is, and I'll just apply
> this over the top of it.  (Note that I'll probably smash the whole thing
> to one commit when the time comes.)

Glad to hear it. A couple recent commits added #define symbols to
headers, which broke the patchset, so I've attached v15, diff'd
against 4f813c7203e. Commit 9fdb675fc added a symbol to pg_opfamily.h
where there were none before, so I went ahead and wrapped it with an
EXPOSE_TO_CLIENT_CODE macro.

All your additional patches apply still apply over it. Your SGML patch
can only apply if my doc patch is not applied, but I've included it
anyway for the sake of no surprises.

I'll check back in 24 hours to see if everything still applies.

-John Naylor

v15-bootstrap-data-conversion.tar.gz
Description: GNU Zip compressed data

Re: WIP: a way forward on bootstrap data

2018-04-07 Thread Alvaro Herrera

John Naylor wrote:
> On 4/6/18, Tom Lane  wrote:
> 
> > I don't think there's any great need to incorporate this into your patch
> > set.  As far as I'm concerned, v14 is ready as-is, and I'll just apply
> > this over the top of it.  (Note that I'll probably smash the whole thing
> > to one commit when the time comes.)
> 
> Glad to hear it. A couple recent commits added #define symbols to
> headers, which broke the patchset, so I've attached v15, diff'd
> against 4f813c7203e. Commit 9fdb675fc added a symbol to pg_opfamily.h
> where there were none before, so I went ahead and wrapped it with an
> EXPOSE_TO_CLIENT_CODE macro.

Actually, after pushing that, I was thinking maybe it's better to remove
that #define from there and put it in each of the two .c files that need
it.  I don't think it makes sense to expose this macro any further, and
before that commit it was localized to a single file.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-04-07 Thread Tom Lane

Alvaro Herrera  writes:
> John Naylor wrote:
>> Commit 9fdb675fc added a symbol to pg_opfamily.h
>> where there were none before, so I went ahead and wrapped it with an
>> EXPOSE_TO_CLIENT_CODE macro.

> Actually, after pushing that, I was thinking maybe it's better to remove
> that #define from there and put it in each of the two .c files that need
> it.  I don't think it makes sense to expose this macro any further, and
> before that commit it was localized to a single file.

We're speaking of IsBooleanOpfamily, right?  Think I'd leave it where it
is.  As soon as you have more than one place using a macro like that,
there's room for maintenance mistakes.

Now it could also be argued that indxpath.c and partprune.c don't
necessarily have the same idea of "boolean opfamily" anyway, in which case
giving them separate copies might be better.  Not sure about that.

Anyway, now that John and I have each (separately) rebased the bootstrap
patch over that, I'd appreciate it if you hold off cosmetic refactoring
till said patch goes in, which I expect to do in ~ 24 hours.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-07 Thread Alvaro Herrera

Tom Lane wrote:
> Alvaro Herrera  writes:
> > John Naylor wrote:
> >> Commit 9fdb675fc added a symbol to pg_opfamily.h
> >> where there were none before, so I went ahead and wrapped it with an
> >> EXPOSE_TO_CLIENT_CODE macro.
> 
> > Actually, after pushing that, I was thinking maybe it's better to remove
> > that #define from there and put it in each of the two .c files that need
> > it.  I don't think it makes sense to expose this macro any further, and
> > before that commit it was localized to a single file.
> 
> We're speaking of IsBooleanOpfamily, right?

Yeah, that's the one.

> Think I'd leave it where it is.  As soon as you have more than one
> place using a macro like that, there's room for maintenance mistakes.

Yeah, that's why I thought it'd be better to have it somewhere central
(the originally submitted patch just added it to partprune.c).

> Anyway, now that John and I have each (separately) rebased the bootstrap
> patch over that, I'd appreciate it if you hold off cosmetic refactoring
> till said patch goes in, which I expect to do in ~ 24 hours.

Understood.  I'm going over David Rowley's runtime pruning patch now
(doesn't touch any catalog files), which I intend to be my last
functional commit this cycle, and won't be doing any other commits till
after bootstrap rework has landed.  (As I mentioned elsewhere, I intend
to propose some restructuring of partitioning code, without any
functional changes, during next week.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-04-08 Thread John Naylor

I wrote:

> I'll check back in 24 hours to see if everything still applies.

There were a couple more catalog changes that broke patch context, so
attached is version 16.

-John Naylor


v16-bootstrap-data-conversion.tar.gz
Description: GNU Zip compressed data

Re: WIP: a way forward on bootstrap data

2018-04-08 Thread Tom Lane

John Naylor  writes:
> There were a couple more catalog changes that broke patch context, so
> attached is version 16.

Pushed with a few more adjustments (mostly, another round of copy-editing
on bki.sgml).  Now we wait to see what the buildfarm thinks, particularly
about the MSVC build ...

Congratulations, and many thanks, to John for seeing this through!
I know it's been a huge amount of work, but we've needed this for years.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-08 Thread John Naylor

On 4/9/18, Tom Lane  wrote:
> Congratulations, and many thanks, to John for seeing this through!
> I know it's been a huge amount of work, but we've needed this for years.

Many thanks for your review, advice, and additional hacking. Thanks
also to Álvaro for review and initial commits, and to all who
participated in previous discussion.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-04-08 Thread Andres Freund



On April 8, 2018 10:19:59 AM PDT, Tom Lane  wrote:
>John Naylor  writes:
>> There were a couple more catalog changes that broke patch context, so
>> attached is version 16.
>
>Pushed with a few more adjustments (mostly, another round of
>copy-editing
>on bki.sgml).  Now we wait to see what the buildfarm thinks,
>particularly
>about the MSVC build ...
>
>Congratulations, and many thanks, to John for seeing this through!
>I know it's been a huge amount of work, but we've needed this for
>years.

Seconded and thirded and fourthed (?)!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: WIP: a way forward on bootstrap data

2018-04-17 Thread John Naylor

On 4/6/18, Tom Lane  wrote:
> I experimented with converting all frontend code to include just the
> catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
> proposed new policy.  I soon found that we'd overlooked one thing:
> some clients expect to see the relation OID macros, eg
> LargeObjectRelationId.  Attached is a patch that changes things around
> so that those appear in the _d files instead of the master files.
[...]
> Some of the CATALOG lines spill well past 80 characters with this,
> although many of the affected ones already were overlength, eg
>
> -#define DatabaseRelationId   1262
> -#define DatabaseRelation_Rowtype_Id  1248
> -
> -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248)
> BKI_SCHEMA_MACRO
> +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION
> BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

[ to which I responded with an inadequate alternative ]

Thinking about this some more, a way occurred to me to shorten the
CATALOG lines while still treating all headers the same, and with very
little code (Patch 0001). What we do is automate the use of
'RelationId' and 'Relation_Rowtype_Id' so that the CATALOG macro only
needs the part pertaining to the table name, and the BKI_ROWTYPE_OID
macro can go back to just having the OID, eg:

CATALOG(pg_database,1262,Database) BKI_SHARED_RELATION
BKI_ROWTYPE_OID(1248) BKI_SCHEMA_MACRO

This is shorter, but not quite as short as before the conversion. If
we really wanted to, we could also leave off the BKI_ prefix from the
CATALOG options (Patch 0002), eg:

CATALOG(pg_database,1262,Database) SHARED_RELATION ROWTYPE_OID(1248)
SCHEMA_MACRO

CATALOG itself lacks a prefix, and its options can only go in one
place, so we'd lose some consistency but perhaps we don't lose any
clarity. (This isn't true for the attribute-level options for
nullability etc, which can appear all over the place.)

Results below - number of CATALOG lines with more than 80 characters,
and the longest line:

grep -E '^CATALOG\(' src/include/catalog/pg_*.h  | sed 's/.*://' | awk
'{ print length, $0 }' | sort -n -r

--
before conversion: 6 lines > 80 chars
105 CATALOG(pg_auth_members,1261) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
BKI_ROWTYPE_OID(2843) BKI_SCHEMA_MACRO

--
after conversion: 14 lines > 80 chars
162 CATALOG(pg_shseclabel,3592,SharedSecLabelRelationId)
BKI_SHARED_RELATION
BKI_ROWTYPE_OID(4066,SharedSecLabelRelation_Rowtype_Id)
BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO

--
with Patch 0001: 11 lines > 80 chars
118 CATALOG(pg_shseclabel,3592,SharedSecLabel) BKI_SHARED_RELATION
BKI_ROWTYPE_OID(4066) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO

--
with Patch 0002: 5 lines > 80 chars
102 CATALOG(pg_shseclabel,3592,SharedSecLabel) SHARED_RELATION
ROWTYPE_OID(4066) WITHOUT_OIDS SCHEMA_MACRO


-John Naylor
From ff1384643e47ef423ab13cfd847f12c0969029f1 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 16 Apr 2018 17:42:28 +0700
Subject: [PATCH 1/2] Shorten notation for relation and rowtype OID macros

---
 src/backend/catalog/Catalog.pm| 8 +---
 src/include/catalog/duplicate_oids| 2 +-
 src/include/catalog/genbki.h  | 4 ++--
 src/include/catalog/pg_aggregate.h| 2 +-
 src/include/catalog/pg_am.h   | 2 +-
 src/include/catalog/pg_amop.h | 2 +-
 src/include/catalog/pg_amproc.h   | 2 +-
 src/include/catalog/pg_attrdef.h  | 2 +-
 src/include/catalog/pg_attribute.h| 2 +-
 src/include/catalog/pg_auth_members.h | 2 +-
 src/include/catalog/pg_authid.h   | 2 +-
 src/include/catalog/pg_cast.h | 2 +-
 src/include/catalog/pg_class.h| 2 +-
 src/include/catalog/pg_collation.h| 2 +-
 src/include/catalog/pg_constraint.h   | 2 +-
 src/include/catalog/pg_conversion.h   | 2 +-
 src/include/catalog/pg_database.h | 2 +-
 src/include/catalog/pg_db_role_setting.h  | 2 +-
 src/include/catalog/pg_default_acl.h  | 2 +-
 src/include/catalog/pg_depend.h   | 2 +-
 src/include/catalog/pg_description.h  | 2 +-
 src/include/catalog/pg_enum.h | 2 +-
 src/include/catalog/pg_event_trigger.h| 2 +-
 src/include/catalog/pg_extension.h| 2 +-
 src/include/catalog/pg_foreign_data_wrapper.h | 2 +-
 src/include/catalog/pg_foreign_server.h   | 2 +-
 src/include/catalog/pg_foreign_table.h| 2 +-
 src/include/catalog/pg_index.h| 2 +-
 src/include/catalog/pg_inherits.h | 2 +-
 src/include/catalog/pg_init_privs.h   | 2 +-
 src/include/catalog/pg_language.h | 2 +-
 src/include/catalog/pg_largeobject.h  | 2 +-
 src/include/catalog/pg_largeobject_metadata.h | 2 +-
 src/include/catalog/pg_namespace.h| 2 +-
 src/include/catalog/pg_opclass.h  | 2 +-
 src/include/catalog/pg_operator.h | 2 +-
 src/include/catalog

Re: WIP: a way forward on bootstrap data

2018-04-17 Thread Tom Lane

John Naylor  writes:
> On 4/6/18, Tom Lane  wrote:
>> Some of the CATALOG lines spill well past 80 characters with this,
>> although many of the affected ones already were overlength, eg ...

> Thinking about this some more, a way occurred to me to shorten the
> CATALOG lines while still treating all headers the same, and with very
> little code (Patch 0001). What we do is automate the use of
> 'RelationId' and 'Relation_Rowtype_Id' so that the CATALOG macro only
> needs the part pertaining to the table name, and the BKI_ROWTYPE_OID
> macro can go back to just having the OID, eg:

Hm ... I don't like this too much, because it means that grepping for
those macros will no longer turn up the source of their definition.
Yeah, if you already know how Relation_Rowtype_Id macros are created,
you might not be confused, but I think it'd be problematic for
newcomers.  Essentially we'd be shortening these lines by obfuscating,
which doesn't seem like a good tradeoff.

It might be all right to drop the BKI_ prefixes as per your other
suggestion, but I'm worried about possible symbol conflicts.  It's
probably not really worth changing that by itself.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-25 Thread Robert Haas

On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger  wrote:
> There still seems to be a lot of boilerplate in the .dat files
> that could be eliminated.  Tom mentioned upthread that he did
> not want too much magic in genbki.pl or Catalog.pm, but I think
> I can propose putting some magic in the header files themselves.
>
> Take, for example, some of the fields in pg_type.dat.  I'll elide
> the ones I'm not talking about with ...:
>
>
> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>  typreceive => 'Xrecv', typsend => 'Xsend', ... },

-1 for trying to automate this.  It varies between fooin and foo_in,
and it'll be annoying to have to remember which one happens
automatically and which one needs an override.

> If we changed pg_proc.h:
>
> /* procedure source text */
> -   textprosrc BKI_FORCE_NOT_NULL;
> +   textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
>
> we could remove the prosrc field from many of the records, which would
> do a better job of calling attention to the remaining records where the
> C function name differs from the SQL function name.

That one I kinda like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP: a way forward on bootstrap data

2018-04-25 Thread Mark Dilger


> On Apr 25, 2018, at 1:00 PM, Robert Haas  wrote:
> 
> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger  wrote:
>> There still seems to be a lot of boilerplate in the .dat files
>> that could be eliminated.  Tom mentioned upthread that he did
>> not want too much magic in genbki.pl or Catalog.pm, but I think
>> I can propose putting some magic in the header files themselves.
>> 
>> Take, for example, some of the fields in pg_type.dat.  I'll elide
>> the ones I'm not talking about with ...:
>> 
>> 
>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>> typreceive => 'Xrecv', typsend => 'Xsend', ... },
> 
> -1 for trying to automate this.  It varies between fooin and foo_in,
> and it'll be annoying to have to remember which one happens
> automatically and which one needs an override.

That may be a good argument.  I was on the fence about it, because I
like the idea that the project might do some cleanup and standardize
the names of all in/out/send/recv functions so that no overrides would
be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
be named in a standard way.)

Would that be an API break and hence a non-starter?

> 
>> If we changed pg_proc.h:
>> 
>>/* procedure source text */
>> -   textprosrc BKI_FORCE_NOT_NULL;
>> +   textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
>> 
>> we could remove the prosrc field from many of the records, which would
>> do a better job of calling attention to the remaining records where the
>> C function name differs from the SQL function name.
> 
> That one I kinda like.

Yeah, that one is simpler, and it is the one that makes the majority of
the difference in bringing down the size of the .dat files.


mark

Re: WIP: a way forward on bootstrap data

2018-04-25 Thread Tom Lane

Robert Haas  writes:
> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger  wrote:
>> There still seems to be a lot of boilerplate in the .dat files
>> that could be eliminated. ...

>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>> typreceive => 'Xrecv', typsend => 'Xsend', ... },

> -1 for trying to automate this.  It varies between fooin and foo_in,
> and it'll be annoying to have to remember which one happens
> automatically and which one needs an override.

Yeah, that was my first reaction to that example as well.  If it
weren't so nearly fifty-fifty then we might have more traction there;
but it's pretty close, and actually the foo_in cases outnumber fooin
if I counted correctly.  (Array entries should be ignored for this
purpose; maybe we'll autogenerate them someday.)

>> +   textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;

> That one I kinda like.

Agreed, this seems more compelling.  However, I think we need more than
one compelling example to justify the additional infrastructure.  There
aren't that many places where there's obvious internal redundancy in
single catalog rows, so I'm not sure that we're going to find a lot of win
here.  (The prosrc-from-proname case, in isolation, could be handled about
as well by adding a hardwired rule like we have for pronargs.)

I don't especially like the idea of trying to compute, for instance,
typalign from typlen.  That's mostly going to encourage people to overlook
it, which isn't a good thing, because there are plenty of places where
genbki.pl couldn't be expected to get it right.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-25 Thread Tom Lane

Mark Dilger  writes:
>> On Apr 25, 2018, at 1:00 PM, Robert Haas  wrote:
>> -1 for trying to automate this.  It varies between fooin and foo_in,
>> and it'll be annoying to have to remember which one happens
>> automatically and which one needs an override.

> That may be a good argument.  I was on the fence about it, because I
> like the idea that the project might do some cleanup and standardize
> the names of all in/out/send/recv functions so that no overrides would
> be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
> be named in a standard way.)

> Would that be an API break and hence a non-starter?

Yeah, I'm afraid so.  I'm pretty sure I recall cases where people
invoked I/O functions by name, and I definitely recall cases where
operator-implementation functions were invoked by name.  Those might
not be very bright things to do, but people do 'em.

We'd also be risking creating problems for individual apps by conflicting
with user-defined functions that we didn't use to conflict with.  We take
that chance every time we add functionality, of course, but usually we can
soothe complaints by pointing out that they got some new functionality in
return.  I don't think "we made I/O function names more uniform" is going
to be a very satisfactory excuse for breakage.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-04-25 Thread Mark Dilger

> On Apr 25, 2018, at 1:31 PM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger  wrote:
>>> There still seems to be a lot of boilerplate in the .dat files
>>> that could be eliminated. ...
> 
>>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>>> typreceive => 'Xrecv', typsend => 'Xsend', ... },
> 
>> -1 for trying to automate this.  It varies between fooin and foo_in,
>> and it'll be annoying to have to remember which one happens
>> automatically and which one needs an override.
> 
> Yeah, that was my first reaction to that example as well.  If it
> weren't so nearly fifty-fifty then we might have more traction there;
> but it's pretty close, and actually the foo_in cases outnumber fooin
> if I counted correctly.  (Array entries should be ignored for this
> purpose; maybe we'll autogenerate them someday.)

Part of the problem right now is that nothing really encourages new
functions to be named foo_in vs. fooin, so the nearly 50/50 split will
continue when new code is contributed.  If the system forced you to
specify the name when you did it in a nonstandard way, and not otherwise,
that would have the effect of documenting which way is now considered
standard.

> 
>>> +   textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
> 
>> That one I kinda like.
> 
> Agreed, this seems more compelling.  However, I think we need more than
> one compelling example to justify the additional infrastructure.

I'll look for more.

>  There
> aren't that many places where there's obvious internal redundancy in
> single catalog rows, so I'm not sure that we're going to find a lot of win
> here.  (The prosrc-from-proname case, in isolation, could be handled about
> as well by adding a hardwired rule like we have for pronargs.)

Right, but doing that in genbki.pl or Catalog.pm is an obfuscation.  Doing
it in pg_proc.h makes is much more reasonable, I think.  Note that my
patch does not hardcode the logic in the perl code.  It teaches the perl
code how to do variable substitution, but then allows the actual rules
for each row to be written in the header file itself.

> 
> I don't especially like the idea of trying to compute, for instance,
> typalign from typlen.  That's mostly going to encourage people to overlook
> it, which isn't a good thing, because there are plenty of places where
> genbki.pl couldn't be expected to get it right.

Well, it wouldn't be in genbki.pl, it would be in pg_type.h, but I take
your broader point that you don't want this calculated.

Re: WIP: a way forward on bootstrap data

2018-04-25 Thread Mark Dilger


>> 
 +   textprosrc BKI_DEFAULT("${proname}") 
 BKI_FORCE_NOT_NULL;
>> 
>>> That one I kinda like.
>> 
>> Agreed, this seems more compelling.  However, I think we need more than
>> one compelling example to justify the additional infrastructure.
> 
> I'll look for more.

There are two more that seem reasonable optimizations in pg_cast.h:

diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 7f4a25b2da..98794df7f8 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -37,13 +37,13 @@ CATALOG(pg_cast,2605,CastRelationId)
Oid casttarget BKI_LOOKUP(pg_type);
 
/* cast function; 0 = binary coercible */
-   Oid castfunc BKI_LOOKUP(pg_proc);
+   Oid castfunc 
BKI_DEFAULT("${casttarget}(${castsource})") BKI_LOOKUP(pg_proc);
 
/* contexts in which cast can be used */
charcastcontext;
 
/* cast method */
-   charcastmethod;
+   charcastmethod BKI_DEFAULT("/${castfunc} == 0 ? 'b' : 
'f'/e");
 } FormData_pg_cast;
 
 /* 

Which would convert numerous lines like:

{ castsource => 'money', casttarget => 'numeric', castfunc => 'numeric(money)',
  castcontext => 'a', castmethod => 'f' },

To shorter lines like:

{ castsource => 'money', casttarget => 'numeric', castcontext => 'a' },

which is great, because all you really are trying to tell the postgres system is
that when you cast from numeric to money it has to be an explicit assignment and
not an implicit cast.  The extra stuff about castfunc and castmethod just gets 
in
the way of understanding what is being done.



There are another two in pg_opclass.h:

diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index b980327fc0..9f528f97c0 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -52,7 +52,7 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId)
Oid opcmethod BKI_LOOKUP(pg_am);
 
/* name of this opclass */
-   NameDataopcname;
+   NameDataopcname BKI_DEFAULT("${opcintype}_ops");
 
/* namespace of this opclass */
Oid opcnamespace BKI_DEFAULT(PGNSP);
@@ -61,7 +61,7 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId)
Oid opcowner BKI_DEFAULT(PGUID);
 
/* containing operator family */
-   Oid opcfamily BKI_LOOKUP(pg_opfamily);
+   Oid opcfamily 
BKI_DEFAULT("${opcmethod}/${opcintype}_ops") BKI_LOOKUP(pg_opfamily);
 
/* type of data indexed by opclass */
Oid opcintype BKI_LOOKUP(pg_type);

Which would convert numerous lines like the following line from pg_opclass.dat:

{ opcmethod => 'btree', opcname => 'bytea_ops', opcfamily => 'btree/bytea_ops',
  opcintype => 'bytea' },

to much easier to read lines like:

{ opcmethod => 'btree', opcintype => 'bytea' },

which is also great, because you're really just declaring that type bytea has a
btree opclass and letting the system do the rest of the work.  Having to 
manually
specify opfamily and the opname just clutters the row and makes it less 
intuitive.


There are a bunch more of varying quality, depending on which automations
you like or don't like.


mark

Re: WIP: a way forward on bootstrap data

2018-04-26 Thread Ashutosh Bapat

On Thu, Apr 26, 2018 at 2:11 AM, Mark Dilger  wrote:
>
>> On Apr 25, 2018, at 1:31 PM, Tom Lane  wrote:
>>
>> Robert Haas  writes:
>>> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger  
>>> wrote:
 There still seems to be a lot of boilerplate in the .dat files
 that could be eliminated. ...
>>
 {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
 typreceive => 'Xrecv', typsend => 'Xsend', ... },
>>
>>> -1 for trying to automate this.  It varies between fooin and foo_in,
>>> and it'll be annoying to have to remember which one happens
>>> automatically and which one needs an override.
>>
>> Yeah, that was my first reaction to that example as well.  If it
>> weren't so nearly fifty-fifty then we might have more traction there;
>> but it's pretty close, and actually the foo_in cases outnumber fooin
>> if I counted correctly.  (Array entries should be ignored for this
>> purpose; maybe we'll autogenerate them someday.)
>
> Part of the problem right now is that nothing really encourages new
> functions to be named foo_in vs. fooin, so the nearly 50/50 split will
> continue when new code is contributed.  If the system forced you to
> specify the name when you did it in a nonstandard way, and not otherwise,
> that would have the effect of documenting which way is now considered
> standard.
>

FWIW, I would like some standard naming convention one way or the
other for in/out function names. Looking up pg_type.h for in/out
functions when you know the built-in type name isn't great. But that
itself may not be enough reason to change it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: WIP: a way forward on bootstrap data

2018-04-26 Thread David Fetter

On Wed, Apr 25, 2018 at 04:39:42PM -0400, Tom Lane wrote:
> Mark Dilger  writes:
> >> On Apr 25, 2018, at 1:00 PM, Robert Haas  wrote:
> >> -1 for trying to automate this.  It varies between fooin and foo_in,
> >> and it'll be annoying to have to remember which one happens
> >> automatically and which one needs an override.
> 
> > That may be a good argument.  I was on the fence about it, because I
> > like the idea that the project might do some cleanup and standardize
> > the names of all in/out/send/recv functions so that no overrides would
> > be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
> > be named in a standard way.)
> 
> > Would that be an API break and hence a non-starter?
> 
> Yeah, I'm afraid so.  I'm pretty sure I recall cases where people
> invoked I/O functions by name, and I definitely recall cases where
> operator-implementation functions were invoked by name.  Those might
> not be very bright things to do, but people do 'em.
> 
> We'd also be risking creating problems for individual apps by conflicting
> with user-defined functions that we didn't use to conflict with.  We take
> that chance every time we add functionality, of course, but usually we can
> soothe complaints by pointing out that they got some new functionality in
> return.  I don't think "we made I/O function names more uniform" is going
> to be a very satisfactory excuse for breakage.

What do you rate the chances that someone created a foo_in when
there's already a built-in fooin?  If it's low enough, we could add
all the foo_ins as aliases to fooins and then mandate that new ones be
called foo_in for future foos.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: WIP: a way forward on bootstrap data

2018-04-28 Thread John Naylor

On 4/26/18, Tom Lane  wrote:
> (The prosrc-from-proname case, in isolation, could be handled about
> as well by adding a hardwired rule like we have for pronargs.)

If we think we might go in the direction of more special-case
behavior, the attached patch more fully documents what we already do,
and does some minor refactoring to make future additions more
straightforward. I think this would even be good to do for v11.

> Robert Haas  writes:
>> -1 for trying to automate this.  It varies between fooin and foo_in,
>> and it'll be annoying to have to remember which one happens
>> automatically and which one needs an override.
>
> Yeah, that was my first reaction to that example as well.  If it
> weren't so nearly fifty-fifty then we might have more traction there;
> but it's pretty close, and actually the foo_in cases outnumber fooin
> if I counted correctly.  (Array entries should be ignored for this
> purpose; maybe we'll autogenerate them someday.)

Hmm, that wouldn't be too hard. Add a new metadata field called
'array_type_oid', then if it finds such an OID, teach genbki.pl to
construct a tuple for the corresponding array type by consulting
something like

chartypcategoryBKI_ARRAY(A);
chartypstorage  BKI_ARRAY(x);
...etc

in the header file, plus copying typalign from the element type. I'll
whip this up sometime and add it to the next commitfest.

On 4/26/18, Mark Dilger  wrote:
> /* cast method */
> -   charcastmethod;
> +   charcastmethod BKI_DEFAULT("/${castfunc} == 0 ? 'b' : 
> 'f'/e");
>  } FormData_pg_cast;

I don't have a strong opinion about your simple substitution
mechanism, but I find the above a bit harder to reason about. It also
has the added disadvantage that there is whitespace in the BKI
annotation, so it can no longer be parsed with the current setup. That
could be overcome with additional complexity, of course, but then
you're back in the position of advocating that for a single use case.

-John Naylor
From 0bec30078080730a8f1adf0e54060d60d6fc3015 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 28 Apr 2018 15:54:14 +0700
Subject: [PATCH 1/1] Clarify special-case bootstrap values.

Take special-case computation of out of the loop over all columns.
This makes the code a bit more clear. It doesn't make much of a
difference now, but if many columns are computed, it becomes
difficult to read.

Also add more comments.
---
 src/backend/catalog/Catalog.pm   | 22 ++
 src/include/catalog/pg_proc.dat  |  3 +++
 src/include/catalog/reformat_dat_file.pl | 13 ++---
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 67d1719..a6f555c 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -292,6 +292,20 @@ sub AddDefaultValues
 	my ($row, $schema, $catname) = @_;
 	my @missing_fields;
 
+	# Compute special-case field values.
+	# Note: If you add new cases here, you must also teach
+	# strip_default_values() in include/catalog/reformat_dat_file.pl
+	# to delete them.
+	if ($catname eq 'pg_proc')
+	{
+		# pg_proc.pronargs can be derived from proargtypes.
+		if (defined $row->{proargtypes})
+		{
+			my @proargtypes = split /\s+/, $row->{proargtypes};
+			$row->{pronargs} = scalar(@proargtypes);
+		}
+	}
+
 	foreach my $column (@$schema)
 	{
 		my $attname = $column->{name};
@@ -305,14 +319,6 @@ sub AddDefaultValues
 		{
 			$row->{$attname} = $column->{default};
 		}
-		elsif ($catname eq 'pg_proc'
-			&& $attname eq 'pronargs'
-			&& defined($row->{proargtypes}))
-		{
-			# pg_proc.pronargs can be derived from proargtypes.
-			my @proargtypes = split /\s+/, $row->{proargtypes};
-			$row->{$attname} = scalar(@proargtypes);
-		}
 		else
 		{
 			# Failed to find a value.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f643f56..be37fb0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -19,6 +19,9 @@
 # duplicate the operator's comment.)  initdb will supply suitable default
 # comments for functions referenced by pg_operator.
 
+# Note: pronargs is computed when this file is read, so does not need to be
+# specified here.  See AddDefaultValues() in Catalog.pm.
+
 # Try to follow the style of existing functions' comments.
 # Some recommended conventions:
 
diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl
index 8ebbec6..c807d3a 100644
--- a/src/include/catalog/reformat_dat_file.pl
+++ b/src/include/catalog/reformat_dat_file.pl
@@ -194,14 +194,13 @@ sub strip_default_values
 		{
 			delete $row->{$attname};
 		}
+	}
 
-		# Also delete pg_proc.pronargs, since that can be recomputed.
-		if (   $catname eq 'pg_proc'
-			&& $attname eq 'pronargs'
-			&& defined($row->{proargtypes}))
-		{
-			delete $row->{$attname};
-		}
+	# Delete computed values. See AddDefaultValues() in Catal

Re: WIP: a way forward on bootstrap data

2018-04-28 Thread Tom Lane

John Naylor  writes:
> On 4/26/18, Tom Lane  wrote:
>> (The prosrc-from-proname case, in isolation, could be handled about
>> as well by adding a hardwired rule like we have for pronargs.)

> If we think we might go in the direction of more special-case
> behavior, the attached patch more fully documents what we already do,
> and does some minor refactoring to make future additions more
> straightforward. I think this would even be good to do for v11.

Agreed, pushed.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-05-06 Thread Mark Dilger

Hackers,

Have you already considered and rejected the idea of having
genbki.pl/Catalog.pm define constants that can be used in 
the catalog .dat files?  I'm mostly curious if people think
the resulting .dat files are better or worse using constants
of this sort.  For example:

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 7497d9cd9f..58ce24adf0 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -250,6 +250,21 @@ sub ParseData

if ($lcnt == $rcnt)
{
+   # pg_cast constants for castcontext
+   use constant IMPLICIT => 'i';
+   use constant ASSIGNMENT => 'a';
+   use constant EXPLICIT => 'e';
+
+   # pg_cast constants for castmethod
+   use constant FUNCTION => 'f';
+   use constant BINARY => 'b';
+   use constant INOUT => 'i';
+
+   # pg_proc constants for provolatile
+   use constant IMMUTABLE => 'i';
+   use constant STABLE => 's';
+   use constant VOLATILE => 'v';
+
eval '$hash_ref = ' . $_;
if (!ref $hash_ref)
{
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index cf007528fd..a4ceceb652 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -19,79 +19,79 @@
 # int2->int4->int8->numeric->float4->float8, while casts in the
 # reverse direction are assignment-only.
 { castsource => 'int8', casttarget => 'int2', castfunc => 'int2(int8)',
-  castcontext => 'a', castmethod => 'f' },
+  castcontext => ASSIGNMENT, castmethod => FUNCTION },
 { castsource => 'int8', casttarget => 'int4', castfunc => 'int4(int8)',
-  castcontext => 'a', castmethod => 'f' },
+  castcontext => ASSIGNMENT, castmethod => FUNCTION },
 { castsource => 'int8', casttarget => 'float4', castfunc => 'float4(int8)',

Re: WIP: a way forward on bootstrap data

2018-05-06 Thread Tom Lane

Mark Dilger  writes:
> Have you already considered and rejected the idea of having
> genbki.pl/Catalog.pm define constants that can be used in 
> the catalog .dat files?  I'm mostly curious if people think
> the resulting .dat files are better or worse using constants
> of this sort.  For example:

Hm, I don't think it's been debated in exactly these terms; but I find
myself a bit skeptical of the proposal as-is.  At least going by your
examples, it'd make the .dat files a good bit more verbose, which doesn't
seem like what we want.  Also it creates a bigger impedance mismatch
between what you see in the .dat files and what you see in the actual
catalogs, inviting confusion.

Also, with the particular implementation technique you suggest here,
there'd be a single namespace for the constants across all catalogs,
which creates a lot of possibilities for conflicts and mistakes.
We could fix that by instituting some sort of unique-ifying naming
convention, say CAST_IMPLICIT not just IMPLICIT, but that makes the
verbosity problem worse.

I think if we wanted to have something along this line, my preferred
approach would be to continue to write the entries as string literals:

castcontext => 'ASSIGNMENT', castmethod => 'FUNCTION' },

and make it the responsibility of genbki.pl to convert to the values
recognized by the backend.  That way the conversion could happen on
a per-column basis, eliminating the conflict issue.  But I'm not
really convinced that this'd be an improvement over where we are now.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-05-06 Thread John Naylor

On 5/7/18, Mark Dilger  wrote:
> Hackers,
>
> Have you already considered and rejected the idea of having
> genbki.pl/Catalog.pm define constants that can be used in
> the catalog .dat files?  I'm mostly curious if people think
> the resulting .dat files are better or worse using constants
> of this sort.  For example:
...
> +   # pg_cast constants for castcontext
> +   use constant IMPLICIT => 'i';
> +   use constant ASSIGNMENT => 'a';
> +   use constant EXPLICIT => 'e';

The comment refers to pg_cast, but these constants apply globally.
It's also not the right place from a maintainability perspective, and
if it was, now these values have different macros defined in two
places. This is not good.

> -  castcontext => 'a', castmethod => 'f' },
> +  castcontext => ASSIGNMENT, castmethod => FUNCTION },

For one, this breaks convention that the values are always
single-quoted. If you had a use case for something like this, I would
instead use the existing lookup infrastructure and teach genbki.pl to
parse the enums (or #defines as the case might be) in the relevant
header file. You'd need some improvement in readability to justify
that additional code, though. I don't think this example quite passes
(it's pretty obvious locally what the letters refer to), but others
may feel differently.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-05-06 Thread Mark Dilger


> On May 6, 2018, at 12:08 PM, John Naylor  wrote:
> 
> On 5/7/18, Mark Dilger  wrote:
>> Hackers,
>> 
>> Have you already considered and rejected the idea of having
>> genbki.pl/Catalog.pm define constants that can be used in
>> the catalog .dat files?  I'm mostly curious if people think
>> the resulting .dat files are better or worse using constants
>> of this sort.  For example:
> ...
>> +   # pg_cast constants for castcontext
>> +   use constant IMPLICIT => 'i';
>> +   use constant ASSIGNMENT => 'a';
>> +   use constant EXPLICIT => 'e';
> 
> The comment refers to pg_cast, but these constants apply globally.
> It's also not the right place from a maintainability perspective, and
> if it was, now these values have different macros defined in two
> places. This is not good.
> 
>> -  castcontext => 'a', castmethod => 'f' },
>> +  castcontext => ASSIGNMENT, castmethod => FUNCTION },
> 
> For one, this breaks convention that the values are always
> single-quoted. If you had a use case for something like this, I would
> instead use the existing lookup infrastructure and teach genbki.pl to
> parse the enums (or #defines as the case might be) in the relevant
> header file. You'd need some improvement in readability to justify
> that additional code, though. I don't think this example quite passes
> (it's pretty obvious locally what the letters refer to), but others
> may feel differently.


John, Tom, thanks for the feedback.  I share your concerns about my
straw-man proposal. But

In the catalogs, 'f' usually means 'false', not 'function'.  A person
reading pg_cast.dat could see:

  castmethod => 'f'

and think that meant a binary conversion, since castmethod is false.
That's almost exactly wrong.  Hence my desire to write

  castmethod => FUNCTION

I don't have any better proposal, though.

mark

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Alvaro Herrera

Pushed 0001 with some changes of my own.  I'm afraid I created a few
conflicts for the later patches in your series; please rebase.

I don't think we introduced anything that would fail on old Perls, but
let's see what buildfarm has to say.

Others: Now is the time to raise concerns related to the proposed file
formats and tooling, so please do have a look when you have a moment.
At this stage, the proposed data format seems a good choice to me.

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Tom Lane

Alvaro Herrera  writes:
> Others: Now is the time to raise concerns related to the proposed file
> formats and tooling, so please do have a look when you have a moment.
> At this stage, the proposed data format seems a good choice to me.

It's not very clear to me what the proposed data format actually is,
and I don't really want to read several hundred KB worth of patches
in order to reverse-engineer that information.  Nor do I see
anything in the patch list that obviously looks like it updates
doc/src/sgml/bki.sgml to explain things.

So could we have an explanation of what it is we're agreeing to?

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread David Fetter

On Fri, Jan 12, 2018 at 11:38:54AM -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Others: Now is the time to raise concerns related to the proposed
> > file formats and tooling, so please do have a look when you have a
> > moment.  At this stage, the proposed data format seems a good
> > choice to me.
> 
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information.  Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.
> 
> So could we have an explanation of what it is we're agreeing to?

That would be awesome.  A walk-through example or two would also help.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Alvaro Herrera

Tom Lane wrote:
> Alvaro Herrera  writes:
> > Others: Now is the time to raise concerns related to the proposed file
> > formats and tooling, so please do have a look when you have a moment.
> > At this stage, the proposed data format seems a good choice to me.
> 
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information.  Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.
> 
> So could we have an explanation of what it is we're agreeing to?

Here's a small sample pg_proc entry:

 { oid => '2147', descr => 'number of input rows for which the input expression 
is not null',
  n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', 
s => 'aggregate_dummy' },

An pg_amop entry:
{ opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => 
'<(int2,int2)', am => 'btree' },

Notes:
1. this is Perl data; it is read with 'eval' without any external modules.
2. the pg_proc entry has been compressed to two lines, to avoid
   content-free lines that would easily confuse git merge, but keep line
   length reasonable.
3. references to objects in other catalogs are by name, such as "int8"
   or "btree/integer_ops" rather than OID.
4. for each attribute, an abbreviation can be declared.  In the
   pg_proc sample we have "n" which stands for proname, because we have
   this line:
   +   NameDataproname BKI_ABBREV(n);

I think John has gone overboard with some of these choices, but we can
argue the specific choices once we decide that abbreviation is a good
idea.  (Prior discussion seems to suggest we already agreed on that.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Tom Lane

Alvaro Herrera  writes:
> Tom Lane wrote:
>> So could we have an explanation of what it is we're agreeing to?

> Here's a small sample pg_proc entry:

>  { oid => '2147', descr => 'number of input rows for which the input 
> expression is not null',
>   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 
> 'any', s => 'aggregate_dummy' },

> An pg_amop entry:
> { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => 
> '<(int2,int2)', am => 'btree' },

> Notes:
> 1. this is Perl data; it is read with 'eval' without any external modules.

Check.  Where is it coming from --- I suppose we aren't going to try to
store this in the existing .h files?  What provisions will there be for
comments?

> 2. the pg_proc entry has been compressed to two lines, to avoid
>content-free lines that would easily confuse git merge, but keep line
>length reasonable.

Seems like we would almost need a per-catalog convention on how to lay out
the entries, or else we're going to end up (over time) with lots of cowboy
coding leading to entries that look randomly different from the ones
around them.

> 3. references to objects in other catalogs are by name, such as "int8"
>or "btree/integer_ops" rather than OID.

+1

> 4. for each attribute, an abbreviation can be declared.  In the
>pg_proc sample we have "n" which stands for proname, because we have
>this line:
>+   NameDataproname BKI_ABBREV(n);

I think single-letter abbreviations here are a pretty bad space vs
readability tradeoff, particularly for wider catalogs where it risks
ambiguity.  The pg_amop sample you show looks noticeably more legible than
the other one.  Still, this is something we can debate on a case-by-case
basis, it's not a defect in the mechanism.

One other question is how we'll verify the conversion.  Is there an
expectation that the .bki file immediately after the conversion will
be identical to immediately before?

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread John Naylor

Tom, everyone,
It's getting late in my timezone, but I wanted to give a few quick
answers. I'll follow up tomorrow. Thanks Alvaro for committing my
refactoring of pg_attribute data creation. I think your modifications
are sensible and I'll rebase soon.

On 1/13/18, Tom Lane  wrote:
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information.  Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.

Alvaro gave a good overview, so I'll just point out a few things.

-Patches 0002 through 0007 represent a complete one-to-one migration
of data entries. I didn't see much in bki.sgml specific to the current
format, so my documentation changes are confined largely to the
README, in patch 0005.
-Patches 0008 and 0009 implement techniques to make the data lines
shorter. My choices are certainly debatable. There is a brief addition
to the README in patch 0008. The abbreviation technique was only used
in three catalogs to demonstrate.
-Patches 0010 and 0011 implement human-readable OID references.
-Patches 0012 and 0013 are cosmetic, but invasive.

> Seems like we would almost need a per-catalog convention on how to lay out
> the entries, or else we're going to end up (over time) with lots of cowboy
> coding leading to entries that look randomly different from the ones
> around them.

If I understand your concern correctly, the convention is enforced by
a script (rewrite_dat.pl). At the very least this would be done at the
same time as pg_indent and perltidy. To be sure, because of default
values many entries will look randomly different from the ones around
them regardless. I have a draft patch to load the source data into
tables for viewing, but it's difficult to rebase, so I thought I'd
offer that enhancement later.

> One other question is how we'll verify the conversion.  Is there an
> expectation that the .bki file immediately after the conversion will
> be identical to immediately before?

Not identical. First, as part of the base migration, I stripped almost
all double quotes from the data entries since the new Perl hash values
are already single-quoted. (The exception is macros expanded by
initdb.c) I made genbki.pl add quotes on output to match what
bootscanner.l expects. Where a simple rule made it possible, it also
matches the original .bki. The new .bki will only diff where the
current data has superfluous quotes. (ie. "0", "sql"). Second, if the
optional cosmetic patch 0013 is applied, the individual index and
toast commands will be in a different order.

> Check.  Where is it coming from --- I suppose we aren't going to try to
> store this in the existing .h files?  What provisions will there be for
> comments?

Yes, they're in ".dat" files. Perl comments (#) on their own line are
supported. I migrated all existing comments from the header files as
part of the conversion. This is scripted, so I can rebase over new
catalog entries that get committed.

> I think single-letter abbreviations here are a pretty bad space vs
> readability tradeoff, particularly for wider catalogs where it risks
> ambiguity.

Ironically, I got that one from you [1] ;-), but if you have a
different opinion upon seeing concrete, explicit examples, I think
that's to be expected.

--
Now is probably a good time to disclose concerns of my own:
1. MSVC dependency tracking is certainly broken until such time as I
can shave that yak and test.
2. Keeping the oid symbols with the data entries required some
Makefile trickery to make them visible to .c files outside the backend
(patch 0007). It builds fine, but the dependency tracking might have
bugs.

--
[1] https://www.postgresql.org/message-id/15697.1479161432%40sss.pgh.pa.us

Thanks,
John Naylor

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Peter Eisentraut

On 1/12/18 12:24, Alvaro Herrera wrote:
> Here's a small sample pg_proc entry:
> 
>  { oid => '2147', descr => 'number of input rows for which the input 
> expression is not null',
>   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 
> 'any', s => 'aggregate_dummy' },
> 
> An pg_amop entry:
> { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => 
> '<(int2,int2)', am => 'btree' },
> 
> Notes:
> 1. this is Perl data; it is read with 'eval' without any external modules.
> 2. the pg_proc entry has been compressed to two lines, to avoid
>content-free lines that would easily confuse git merge, but keep line
>length reasonable.

I don't think I like this.  I know pg_proc.h is a pain to manage, but at
least right now it's approachable programmatically.  I recently proposed
to patch to replace the columns proisagg and proiswindow with a combined
column prokind.  I could easily write a small Perl script to make that
change in pg_proc.h, because the format is easy to parse and has one
line per entry.  With this new format, that approach would no longer
work, and I don't know what would replace it.

> 3. references to objects in other catalogs are by name, such as "int8"
>or "btree/integer_ops" rather than OID.

I think we could already do this by making more use of things like
regtype and regproc.  That should be an easy change to make.

> 4. for each attribute, an abbreviation can be declared.  In the
>pg_proc sample we have "n" which stands for proname, because we have
>this line:
>+   NameDataproname BKI_ABBREV(n);

I'm afraid a key value system would invite writing the attributes in
random order and create a mess over time.

But if we want to do it, I think we could also add it to the current BKI
format.  The same goes for defining default values for some columns.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread David Fetter

On Fri, Jan 12, 2018 at 04:22:26PM -0500, Peter Eisentraut wrote:
> On 1/12/18 12:24, Alvaro Herrera wrote:
> > Here's a small sample pg_proc entry:
> > 
> >  { oid => '2147', descr => 'number of input rows for which the input 
> > expression is not null',
> >   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 
> > 'any', s => 'aggregate_dummy' },
> > 
> > An pg_amop entry:
> > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper 
> > => '<(int2,int2)', am => 'btree' },
> > 
> > Notes:
> > 1. this is Perl data; it is read with 'eval' without any external modules.
> > 2. the pg_proc entry has been compressed to two lines, to avoid
> >content-free lines that would easily confuse git merge, but keep line
> >length reasonable.
> 
> I don't think I like this.  I know pg_proc.h is a pain to manage,
> but at least right now it's approachable programmatically.  I
> recently proposed to patch to replace the columns proisagg and
> proiswindow with a combined column prokind.  I could easily write a
> small Perl script to make that change in pg_proc.h, because the
> format is easy to parse and has one line per entry.  With this new
> format, that approach would no longer work, and I don't know what
> would replace it.

How about ingesting with Perl, manipulating there, and spitting back
out as Perl data structures?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Alvaro Herrera

Peter Eisentraut wrote:
> On 1/12/18 12:24, Alvaro Herrera wrote:
> > Here's a small sample pg_proc entry:
> > 
> >  { oid => '2147', descr => 'number of input rows for which the input 
> > expression is not null',
> >   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 
> > 'any', s => 'aggregate_dummy' },
> > 
> > An pg_amop entry:
> > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper 
> > => '<(int2,int2)', am => 'btree' },
> > 
> > Notes:
> > 1. this is Perl data; it is read with 'eval' without any external modules.
> > 2. the pg_proc entry has been compressed to two lines, to avoid
> >content-free lines that would easily confuse git merge, but keep line
> >length reasonable.
> 
> I don't think I like this.  I know pg_proc.h is a pain to manage, but at
> least right now it's approachable programmatically.  I recently proposed
> to patch to replace the columns proisagg and proiswindow with a combined
> column prokind.  I could easily write a small Perl script to make that
> change in pg_proc.h, because the format is easy to parse and has one
> line per entry.  With this new format, that approach would no longer
> work, and I don't know what would replace it.

The idea in my mind is that you'd write a Perl program to do such
changes, yeah.  If the code we supply contains enough helpers and a few
samples, it should be reasonably simple for people that don't do much
Perl.

The patch series does contain a few helper programs to write the data
files.  I haven't looked in detail what can they do and what they cannot.

> > 3. references to objects in other catalogs are by name, such as "int8"
> >or "btree/integer_ops" rather than OID.
> 
> I think we could already do this by making more use of things like
> regtype and regproc.  That should be an easy change to make.

Well, that assumes we *like* the current format, which I think is not a
given ... more the opposite.

> > 4. for each attribute, an abbreviation can be declared.  In the
> >pg_proc sample we have "n" which stands for proname, because we have
> >this line:
> >+   NameDataproname BKI_ABBREV(n);
> 
> I'm afraid a key value system would invite writing the attributes in
> random order and create a mess over time.


Yeah, I share this concern.  But you could fix it if the Perl tooling to
write these files had a hardcoded list to work with.  Maybe we could put
it in a declaration of sorts at the start of each data file.

> But if we want to do it, I think we could also add it to the current BKI
> format.  The same goes for defining default values for some columns.

As above -- do we really like our current format so much that we're
satisfied with minor tweaks?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Andres Freund

On 2018-01-12 18:36:14 -0300, Alvaro Herrera wrote:
> As above -- do we really like our current format so much that we're
> satisfied with minor tweaks?

A definite no from here.  I think especially pg_proc desperately needs
something key=value like to be understandable, and that very clearly
seems to be something we can't do in the current format.

- Andres

Re: WIP: a way forward on bootstrap data

2018-01-12 Thread Tom Lane

Alvaro Herrera  writes:
> Peter Eisentraut wrote:
>> I don't think I like this.  I know pg_proc.h is a pain to manage, but at
>> least right now it's approachable programmatically.  I recently proposed
>> to patch to replace the columns proisagg and proiswindow with a combined
>> column prokind.  I could easily write a small Perl script to make that
>> change in pg_proc.h, because the format is easy to parse and has one
>> line per entry.  With this new format, that approach would no longer
>> work, and I don't know what would replace it.

> The idea in my mind is that you'd write a Perl program to do such
> changes, yeah.  If the code we supply contains enough helpers and a few
> samples, it should be reasonably simple for people that don't do much
> Perl.

It would be good to see a sample --- for a concrete example, how about
creating a Perl script to do the conversion Peter mentions?

>>> 3. references to objects in other catalogs are by name, such as "int8"
>>> or "btree/integer_ops" rather than OID.

>> I think we could already do this by making more use of things like
>> regtype and regproc.  That should be an easy change to make.

> Well, that assumes we *like* the current format, which I think is not a
> given ... more the opposite.

Note that we *can't* easily improve that given the current tooling,
mainly because the bootstrap-time capabilities of regproc_in et al are
so limited.  We don't even have regxxx types for many of the other
cross-reference columns like opclass references, and I don't think
I want to build them because they'd also have bootstrapping issues.

According to my understanding, part of what's going on here is that
we're going to teach genbki.pl to parse these object references and
convert them to hard-coded OIDs in the emitted BKI file.  That seems
good to me, but one thing we're going to need is a spec for how
genbki.pl knows what to do.

>> I'm afraid a key value system would invite writing the attributes in
>> random order and create a mess over time.

> Yeah, I share this concern.  But you could fix it if the Perl tooling to
> write these files had a hardcoded list to work with.  Maybe we could put
> it in a declaration of sorts at the start of each data file.

This is more or less the same concern I stated upthread.  But the
impression I'm getting is that we expect these files to often be written
out from a Perl script, so it's mostly a question of how we teach the
Perl scripts to emit stylistically consistent data.  Then we can use the
Perl scripts as a kind of pgindent for this data.

>> But if we want to do it, I think we could also add it to the current BKI
>> format.  The same goes for defining default values for some columns.

> As above -- do we really like our current format so much that we're
> satisfied with minor tweaks?

I'm sure not.  This will be a big change, without a doubt, but I think
we'll end up in a better place.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-01-13 Thread John Naylor

On 1/13/18, Peter Eisentraut  wrote:

> I'm afraid a key value system would invite writing the attributes in
> random order and create a mess over time.

A developer can certainly write them in random order, and it will
still work. However, in patch 0002 I have a script to enforce a
standard appearance. Of course, for it to work, you have to run it. I
describe it, if rather tersely, in the README changes in patch 0008.
Since several people have raised this concern, I will go into a bit
more depth here. Perhaps I should reuse some of this language for the
README to improve it.

src/include/catalog/rewrite_dat.pl knows where to find the schema of
each catalog, namely the pg_*.h header, accessed via ParseHeader() in
Catalog.pm. It writes key/value pairs in the order found in the
schema:

{ key_1 => 'value_1', key_2 => 'value_2', ..., key_n => 'value_n' }

The script also has an array of four hard-coded metadata fields: oid,
oid_symbol, descr, and shdescr. If any of these are present, they will
go on their own line first, in the order given:

{ oid => , oid_symbol => 'FOO_OID', descr => 'comment on foo',
  key_1 => 'value_1', key_2 => 'value_2', ..., key_n => 'value_n' }

> I don't think I like this.  I know pg_proc.h is a pain to manage, but at
> least right now it's approachable programmatically.  I recently proposed
> to patch to replace the columns proisagg and proiswindow with a combined
> column prokind.  I could easily write a small Perl script to make that
> change in pg_proc.h, because the format is easy to parse and has one
> line per entry.  With this new format, that approach would no longer
> work, and I don't know what would replace it.

I've attached four diffs/patches to walk through how you would replace
the columns proisagg and proiswindow with a combined column prokind.

Patch 01: Add new prokind column to pg_proc.h, with a default of 'n'.
In many cases, this is all you would have to do, as far as
bootstrapping is concerned.

Diff 02: This is a one-off script diffed against rewrite_dat.pl. In
rewrite_dat.pl, I have a section with this comment, and this is where
I put the one-off code:

# Note: This is also a convenient place to do one-off
# bulk-editing.

(I haven't documented this with explicit examples, so I'll have to remedy that)

You would run it like this:

cd src/include/catalog
perl -I ../../backend/catalog/  rewrite_dat_with_prokind.pl  pg_proc.dat

While reading pg_proc.dat, the default value for prokind is added
automatically. We inspect proisagg and proiswindow, and change prokind
accordingly. pg_proc.dat now has all three columns, prokind, proisagg,
and proiswindow.

Patch 03: Remove old columns from pg_proc.h

Now we run the standard rewrite:

perl -I ../../backend/catalog/ rewrite_dat.pl pg_proc.dat

Any values not found in the schema will simply not be written to
pg_proc.dat, so the old columns are now gone.

The result is found in patch 04.
--

Note: You could theoretically also load the source data into tables,
do the updates with SQL, and dump back out again. I made some progress
with this method, but it's not complete. I think the load and dump
steps add too much complexity for most use cases, but it's a
possibility.

-John Naylor
--- rewrite_dat.pl  2018-01-13 13:32:42.528046389 +0700
+++ rewrite_dat_with_prokind.pl 2018-01-13 16:07:49.124401213 +0700
@@ -132,6 +132,19 @@
# these operations in the order given.
# Note: This is also a convenient place to do one-off
# bulk-editing.
+
+   # One-off to migrate to prokind
+   # Default has already been filled in by now, so now add 
other
+   # values as appropriate
+   if ($values{proisagg} eq 't')
+   {
+   $values{prokind} = 'a';
+   }
+   elsif ($values{proiswindow} eq 't')
+   {
+   $values{prokind} = 'w';
+   }
+
if (!$expand_tuples)
{
strip_default_values(\%values, $schema, 
$catname);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 7b77eea..0f2990f 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -60,6 +60,9 @@ CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
 	/* is it a window function? */
 	bool		proiswindow BKI_DEFAULT(f);

+	/* kind of function: n = normal, a = agg, w = window */
+	char		prokind BKI_DEFAULT(n);
+
 	/* security definer */
 	bool		prosecdef BKI_DEFAULT(f);

diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index be8f822..b76dd72 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -57,12 +57,6 @@ CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO

Re: WIP: a way forward on bootstrap data

2018-01-13 Thread John Naylor

On 1/13/18, Tom Lane  wrote:

> According to my understanding, part of what's going on here is that
> we're going to teach genbki.pl to parse these object references and
> convert them to hard-coded OIDs in the emitted BKI file.  That seems
> good to me, but one thing we're going to need is a spec for how
> genbki.pl knows what to do.

I don't know if it qualifies as a spec, but here's my implementation:

Use dummy type aliases in the header files: regam, regoper, regopf, and regtype
These are #defined away in genbki.h:

+/* 
+ * Some columns of type Oid have human-readable entries that are
+ * resolved when creating postgres.bki.
+ * 
+ */
+#define regam Oid
+#define regoper Oid
+#define regopf Oid
+#define regtype Oid

Likewise, in genbki.pl (and I just noticed a typo, s/names/types/):

+# We use OID aliases to indicate when to do OID lookups, so column names
+# have to be turned back into 'oid' before writing the CREATE command.
+my %RENAME_REGOID = (
+   regam => 'oid',
+   regoper => 'oid',
+   regopf => 'oid',
+   regtype => 'oid');
+

When genbki.pl sees one of these type aliases, it consults the
appropriate lookup table, exactly how we do now for regproc. One
possibly dubious design point is that I declined to teach the
pg_attribute logic about this, so doing lookups in tables with schema
macros has to be done explicitly. There is only one case of this right
now, and I noted the tradeoff:

+   # prorettype
+   # Note: We could handle this automatically by 
using the
+   # 'regtype' alias, but then we would have to 
teach
+   # morph_row_for_pgattr() to change the 
attribute type back to
+   # oid. Since we have to treat pg_proc 
differently anyway,
+   # just do the type lookup manually here.
+   my $rettypeoid = $regtypeoids{ 
$bki_values{prorettype}};
+   $bki_values{prorettype} = $rettypeoid
+ if defined($rettypeoid);

This is all in patch 0011.

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-01-13 Thread Tom Lane

John Naylor  writes:
> On 1/12/18, Alvaro Herrera  wrote:
>> Pushed 0001 with some changes of my own.  I'm afraid I created a few
>> conflicts for the later patches in your series; please rebase.

> Attached, rebased over 255f14183ac.

I decided that I wasn't going to get answers about the things I cared
about without looking through the patchset, so I've now done so,
in a once-over-lightly fashion.  Here's a quick summary of what it
does, for those who may be equally confused, and then some comments
(not really a review).

The patch removes DATA and (SH)DESCR lines from all the catalog/pg_*.h
files, as well as the #defines for OID-value macros, and puts that
information into pg_*.dat files corresponding to the .h files, in a form
that is easily readable and writable by Perl scripts.  Comments associated
with this info are also transferred to the .dat files, and the scripts
that can rewrite the .dat files are able to preserve the comments.

genbki.pl is able to generate postgres.bki and other initdb input files
directly from the .dat files.  It also creates a single header file
src/include/catalog/oid_symbols.h that contains all of the OID-value
macros that were formerly in the pg_*.h files.

The patch gets rid of the need to write hard-wired OIDs when referencing
operators, opfamilies, etc in the .dat files; now you can write their
names instead.  genbki.pl will look up the names and substitute numeric
OIDs in the emitted postgres.bki file.  There are also provisions to
shorten the .dat files through the use of abbreviated field names,
default values for fields, and some other less-general techniques.

--

OK, now some comments:

I'm not sure about the decision to move all the OID macros into
one file; that seems like namespace pollution.  It's especially
odd that you did that but did not consolidate fmgroids.h with
the macros for symbols from other catalogs.  Now it's true that
we need all those symbols to be distinct, since it needs to be
possible for .c files to include any combination of pg_*.h headers,
but I don't think it's an especially good idea that you have to
include all of them or none.  Even if we're willing to put up with
that namespace pollution for backend code, there are clients that
currently include pg_*.h headers to get some of those macros, and
they're likely to be less happy about it.

The design I'd kind of imagined was one generated file of #define's
per pg_*.h file, not just one giant one.

It would be really nice, also, if the attribute number macros
(Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated.
Manually renumbering those is one of the bigger pains in the rear
when adding catalog columns.  It was less of a pain than adjusting
the DATA lines of course, so I never figured it was worth doing
something about in isolation --- but with this infrastructure in
place, that's manual work we shouldn't have to do anymore.

Another thing that I'd sort of hoped might happen from this patchset
is to cure the problem of keeping some catalog headers safe for
client-side inclusion, because some clients want the OID value macros
and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
currently have to #include those headers or else hard-code the values.
We've worked around that to date with ad-hoc solutions like splitting
function declarations out to pg_*_fn.h files, but I never liked that
much.  With the OID value macros being moved out to separate generated
file(s), there's now a possibility that we could fix this once and for all
by making client-side code include those file(s) not pg_type.h et al
themselves.  But we'd need a way to put the column-value macros into
those files too; maybe that's too messy to make it practical.

The .dat files need to have header comments that follow project
conventions, in particular they need to contain copyright statements.
Likewise for generated files.

I've got zero faith that the .h files will hold still long enough
for these patches to be reviewed and applied.  The ones that touch
significant amounts of data need to be explained as "run this script
on the current data", rather than presented as static diffs.

I'm not really thrilled by the single-purpose "magic" behaviors added
in 0007, such as computing prosrc from proname.  I think those will
add more confusion than they're worth.

In 0010, you relabel the types of some OID columns so that genbki.pl
will know which lookup to apply to them.  That's not such a problem for
the relabelings that are just macros and genbki.pl converts back to
type OID in the .bki file.  But you also did things like s/Oid/regtype/,
and that IS a problem because it will affect what client code sees in
those catalog columns.  We've discussed changing those columns to
regfoo types in the past, and decided not to, because of the likelihood
of breaking client queries.  I do not think this patch gets to change
that policy.  So the way to identify the lookup rule needs to be
independent of whether the c

Re: WIP: a way forward on bootstrap data

2018-01-14 Thread Greg Stark

I'm 1000% on board with replacing oid constants with symbolic names
that get substituted programmatically.

However I wonder why we're bothering inventing a new syntax that
doesn't actually do much more than present static tabular data. If
things like magic proname->prosrc behaviour are not valuable then
we're not getting much out of this perl-friendly syntax that a simpler
more standard format wouldn't get us.

So just as a straw man proposal What if we just replaced the data
file with a csv file that could be maintained in a spreadsheet. It
could easily be parsed by perl and we could even have perl scripts
that load the records into memory and modify them. You could even
imagine writing a postgres script that loaded the csv file into a
temporary table, did complex SQL updates or other DML, then wrote it
back out to a csv file.

Re: WIP: a way forward on bootstrap data

2018-01-14 Thread Tom Lane

Greg Stark  writes:
> I'm 1000% on board with replacing oid constants with symbolic names
> that get substituted programmatically.

Yeah, that's almost an independent feature --- we could do that without
any of this other stuff, if we wanted.

> However I wonder why we're bothering inventing a new syntax that
> doesn't actually do much more than present static tabular data. If
> things like magic proname->prosrc behaviour are not valuable then
> we're not getting much out of this perl-friendly syntax that a simpler
> more standard format wouldn't get us.

TBH, the thing that was really drawing my ire about that was that John was
inventing random special rules and documenting them *noplace* except for
the guts of some perl code.  If I have to read perl code to find out what
the catalog data means, I'm going to be bitching loudly.  That could be
done better --- one obvious idea is to add a comment to the relevant .h
file, next to the field whose value will be implicitly calculated.

> So just as a straw man proposal What if we just replaced the data
> file with a csv file that could be maintained in a spreadsheet.

Bleah --- that's no better than what we have today, just different.
And "maintained in a spreadsheet" doesn't sound attractive to me;
you'd almost certainly lose comments, for instance.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-01-14 Thread Tom Lane

I wrote:
> Another thing that I'd sort of hoped might happen from this patchset
> is to cure the problem of keeping some catalog headers safe for
> client-side inclusion, because some clients want the OID value macros
> and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
> currently have to #include those headers or else hard-code the values.
> We've worked around that to date with ad-hoc solutions like splitting
> function declarations out to pg_*_fn.h files, but I never liked that
> much.  With the OID value macros being moved out to separate generated
> file(s), there's now a possibility that we could fix this once and for all
> by making client-side code include those file(s) not pg_type.h et al
> themselves.  But we'd need a way to put the column-value macros into
> those files too; maybe that's too messy to make it practical.

I had a thought about how to do that.  It's clearly desirable that that
sort of material remain in the manually-maintained pg_*.h files, because
that's basically where you look to find out C-level details of what's
in a particular catalog.  However, that doesn't mean that that's where
the compiler has to find it.  Imagine that we write such sections of the
catalog .h files like

#ifdef EXPOSE_TO_CLIENT_CODE

/*
 * ... comment here ...
 */
#define PROVOLATILE_IMMUTABLE   'i' /* never changes for given input */
#define PROVOLATILE_STABLE  's' /* does not change within a scan */
#define PROVOLATILE_VOLATILE'v' /* can change even within a scan */

#endif /* EXPOSE_TO_CLIENT_CODE */

Like CATALOG_VARLEN, the symbol EXPOSE_TO_CLIENT_CODE is never actually
defined to the compiler.  What it does is to instruct genbki.pl to copy
the material up to the matching #endif into the generated file for this
catalog.  So, for each catalog header pg_foo.h, there would be a
generated file, say pg_foo_d.h, containing:

* Natts_ and Anum_ macros for pg_foo

* Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h

* Any OID-value macros for entries in that catalog

pg_foo.h would contain a #include for pg_foo_d.h, so that backend-side
code would obtain all these values the same as it did before.  But the
new policy for client code would be to include pg_foo_d.h *not* pg_foo.h,
and so we are freed of any worry about whether pg_foo.h has to be clean
for clients to include.  We could re-merge the various pg_foo_fn.h files
back into the main files, if we wanted.

The contents of EXPOSE_TO_CLIENT_CODE sections wouldn't necessarily
have to be just macros --- they could be anything that's safe and
useful for client code.  But that's probably the main usage.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2018-01-14 Thread John Naylor

On 1/14/18, Tom Lane  wrote:
> I'm not sure about the decision to move all the OID macros into
> one file; that seems like namespace pollution.

> The design I'd kind of imagined was one generated file of #define's
> per pg_*.h file, not just one giant one.

First, thanks for the comments. I will start incorporating them in a
few days to give others the chance to offer theirs.

I'm inclined to agree about namespace pollution. One stumbling block
is the makefile changes to allow OID symbols to be visible to
non-backend code. Assuming I took the correct approach for the single
file case, adapting that to multiple files would require some
rethinking.

> It's especially
> odd that you did that but did not consolidate fmgroids.h with
> the macros for symbols from other catalogs.

Point taken.

> It would be really nice, also, if the attribute number macros
> (Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated.
> Manually renumbering those is one of the bigger pains in the rear
> when adding catalog columns.  It was less of a pain than adjusting
> the DATA lines of course, so I never figured it was worth doing
> something about in isolation --- but with this infrastructure in
> place, that's manual work we shouldn't have to do anymore.

Searching the archives for discussion of Anum_* constants [1], I
prefer your one-time suggestion to use enums instead. I'd do that, and
then have Catalog.pm throw an error if Natts_* doesn't match the
number of columns. That's outside the scope of this patch, however.

> Another thing that I'd sort of hoped might happen from this patchset
> is to cure the problem of keeping some catalog headers safe for
> client-side inclusion, because some clients want the OID value macros
> and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
> currently have to #include those headers or else hard-code the values.
> We've worked around that to date with ad-hoc solutions like splitting
> function declarations out to pg_*_fn.h files, but I never liked that
> much.  With the OID value macros being moved out to separate generated
> file(s), there's now a possibility that we could fix this once and for all
> by making client-side code include those file(s) not pg_type.h et al
> themselves.  But we'd need a way to put the column-value macros into
> those files too; maybe that's too messy to make it practical.

To make sure I understand you correctly:
Currently we have

pg_type.h  oid symbols, column symbols
pg_type_fn.h function decls

And assuming we go with one generated oid symbol file per header, my
patch would end up with something like

pg_type.h  column symbols (#includes pg_type_sym.h)
pg_type_fn.h function decls
pg_type_sym.h oid symbols (generated)

And you're saying you'd prefer

pg_type.h  function decls (#includes pg_type_sym.h)
pg_type_sym.h oid symbols, column symbols (generated)

I agree that it'd be messy to drive the generation of the column
symbols. I'll think about it. What about

pg_type.h  function decls (#includes pg_type_sym.h)
pg_type_sym.h column symbols (static, #includes pg_type_oids.h)
pg_type_oids.h oid symbols (generated)

It's complicated, but arguably no more so than finding someplace more
distant to stick the column symbols and writing code to copy them.
It'd be about than 20 *_sym.h headers and 10 *_oids.h headers.

> The .dat files need to have header comments that follow project
> conventions, in particular they need to contain copyright statements.
> Likewise for generated files.

Okay.

> I've got zero faith that the .h files will hold still long enough
> for these patches to be reviewed and applied.  The ones that touch
> significant amounts of data need to be explained as "run this script
> on the current data", rather than presented as static diffs.

I've already rebased over a catalog change and it was not much fun, so
I'd be happy to do it this way.

> I'm not really thrilled by the single-purpose "magic" behaviors added
> in 0007, such as computing prosrc from proname.  I think those will
> add more confusion than they're worth.

Okay. I still think generating pg_type OID symbols is worth doing, but
I no longer think this is a good place to do it.

> In 0010, you relabel the types of some OID columns so that genbki.pl
> will know which lookup to apply to them.  That's not such a problem for
> the relabelings that are just macros and genbki.pl converts back to
> type OID in the .bki file.  But you also did things like s/Oid/regtype/,
> and that IS a problem because it will affect what client code sees in
> those catalog columns.  We've discussed changing those columns to
> regfoo types in the past, and decided not to, because of the likelihood
> of breaking client queries.  I do not think this patch gets to change
> that policy.  So the way to identify the lookup rule needs to be
> independent of whether the column is declared as Oid or an Oid alias type.
> Perhaps an explicit marker telli

Re: WIP: a way forward on bootstrap data

2018-01-15 Thread Alvaro Herrera

Tom Lane wrote:

> I had a thought about how to do that.  It's clearly desirable that that
> sort of material remain in the manually-maintained pg_*.h files, because
> that's basically where you look to find out C-level details of what's
> in a particular catalog.  However, that doesn't mean that that's where
> the compiler has to find it.
>
> [ elided explanation of pg_foo_d.h and pg_foo.h ]

Sounds good to me.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2018-01-15 Thread John Naylor

On 1/14/18, Tom Lane  wrote:
>  So, for each catalog header pg_foo.h, there would be a
> generated file, say pg_foo_d.h, containing:
>
> * Natts_ and Anum_ macros for pg_foo
>
> * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h
>
> * Any OID-value macros for entries in that catalog

I'm on board in principle, but I have some questions:

How do we have the makefiles gracefully handle 62 generated headers
which need to be visible outside the backend? Can I generalize the
approach I took for the single OIDs file I had, or is that not even
the right way to go? (In short, I used a new backend make target that
was invoked in src/common/Makefile - the details are in patch v6-0006)

If we move fmgr oid generation here as you suggested earlier, I
imagine we don't want to create a lot of #include churn. My idea is to
turn src/include/utils/fmgroids.h into a static file that just
#includes catalog/pg_proc_d.h. Thoughts?

And I'm curious, what is "_d" intended to convey?

(While I'm thinking outloud, I'm beginning to think that these headers
lie outside the scope of genbki.pl, and belong in a separate script.)

-John Naylor

Re: WIP: a way forward on bootstrap data

2018-01-15 Thread Tom Lane

John Naylor  writes:
> On 1/14/18, Tom Lane  wrote:
>> So, for each catalog header pg_foo.h, there would be a
>> generated file, say pg_foo_d.h, containing:
>> * Natts_ and Anum_ macros for pg_foo
>> * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h
>> * Any OID-value macros for entries in that catalog

> I'm on board in principle, but I have some questions:

> How do we have the makefiles gracefully handle 62 generated headers
> which need to be visible outside the backend?

There are other people around here who are better make wizards than I, but
I'm sure this is soluble.  A substitution like $(CATALOG_HEADERS:_d.h=.h)
might get you started.  (It looks like CATALOG_HEADERS would have to be
separated out of POSTGRES_BKI_SRCS, but that's easy.)

> If we move fmgr oid generation here as you suggested earlier, I
> imagine we don't want to create a lot of #include churn. My idea is to
> turn src/include/utils/fmgroids.h into a static file that just
> #includes catalog/pg_proc_d.h. Thoughts?

Yeah ... or vice versa.  I don't know if touching the way fmgroids.h is
built is worthwhile.  Certainly, if we'd built all this to begin with
we'd have unified pg_proc.h's OID macro handling with the other catalogs,
but we didn't and that might not be worth changing.  I'm not strongly
convinced either way.

> And I'm curious, what is "_d" intended to convey?

I was thinking "#define" or "data".  You could make as good a case for
"_g" for "generated", or probably some other choices.  I don't have a
strong preference; but I didn't especially like your original suggestion
of "_sym", because that seemed like it would invite confusion with
possible actual names for catalogs.  A one-letter suffix seems less
likely to conflict with anything anybody would think was a good choice
of catalog name.

> (While I'm thinking outloud, I'm beginning to think that these headers
> lie outside the scope of genbki.pl, and belong in a separate script.)

Maybe, but the conditions for regenerating these files would be exactly
the same as for the .bki file, no?  So we might as well just have one
script do both, rather than writing duplicative rules in the Makefiles
and the MSVC scripts.

regards, tom lane

Re: WIP: a way forward on bootstrap data

2017-12-13 Thread Peter Eisentraut

On 12/13/17 04:06, John Naylor wrote:
> There doesn't seem to be any interest in bootstrap data at the moment,
> but rather than give up just yet, I've added a couple features to make
> a data migration more compelling:

I took a brief look at your patches, and there appear to be a number of
good cleanups in there at least.  But could you please send patches in
git format-patch format with commit messages, so we don't have to guess
what each patch does?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2017-12-19 Thread David Fetter

On Thu, Dec 14, 2017 at 05:59:12PM +0700, John Naylor wrote:
> On 12/13/17, Peter Eisentraut  wrote:
> > On 12/13/17 04:06, John Naylor wrote:
> >> There doesn't seem to be any interest in bootstrap data at the moment,
> >> but rather than give up just yet, I've added a couple features to make
> >> a data migration more compelling:
> >
> > I took a brief look at your patches, and there appear to be a number of
> > good cleanups in there at least.  But could you please send patches in
> > git format-patch format with commit messages, so we don't have to guess
> > what each patch does?
> 
> Thanks for taking a look and for pointing me to git format-patch.
> That's much nicer than trying to keep emails straight. I've attached a
> new patchset.

Thanks for your hard work on this.  It'll really make developing this
part of the code a lot more pleasant.

Unfortunately, it no longer applies to master.  Can we get a rebase?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: WIP: a way forward on bootstrap data

2017-12-21 Thread Alvaro Herrera

Pushed 0001 and 0002.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2017-12-21 Thread Alvaro Herrera

Hmm, patch 0008 removes data lines from the .h but leaves the dependent
OID define lines around:

#define BTREE_AM_OID 403

This is not good, because then the define depends on data that appears
in a distant file.  Another consideration is that the current system has
the property that these OIDs are discoverable by a hacker by navigating
to the containing .h file; and a missing symbol is easily fixable if
they need to hardcode the OID for which there isn't a symbol yet.

Maybe a generated .h file containing defines for OIDs from all catalogs
is okay.  Of course, not all symbols are to be listed -- we should have
a special marker in the data lines for those that are.  Maybe something
like this

{ oid => '403', descr => 'b-tree index access method',
  amname => 'btree', amhandler => 'bthandler', amtype => 'i',
  cpp_symbol => 'BTREE_AM_OID' },

(where 'cpp_symbol' would be skipped by genbki explicitly).

Any better ideas?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WIP: a way forward on bootstrap data

2017-12-21 Thread Robert Haas

On Thu, Dec 21, 2017 at 5:32 PM, Alvaro Herrera  wrote:
> Hmm, patch 0008 removes data lines from the .h but leaves the dependent
> OID define lines around:

Just a question here -- do we actually have consensus on doing the
stuff that these patches do?  Because I'm not sure we do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP: a way forward on bootstrap data

2017-12-21 Thread John Naylor

On 12/22/17, Alvaro Herrera  wrote:
> Hmm, patch 0008 removes data lines from the .h but leaves the dependent
> OID define lines around:
>
> #define BTREE_AM_OID 403
>
> This is not good, because then the define depends on data that appears
> in a distant file.

I see what you mean.

> Another consideration is that the current system has
> the property that these OIDs are discoverable by a hacker by navigating
> to the containing .h file; and a missing symbol is easily fixable if
> they need to hardcode the OID for which there isn't a symbol yet.

I'm not sure I follow you here.

> Maybe a generated .h file containing defines for OIDs from all catalogs
> is okay.  Of course, not all symbols are to be listed -- we should have
> a special marker in the data lines for those that are.  Maybe something
> like this
>
> { oid => '403', descr => 'b-tree index access method',
>   amname => 'btree', amhandler => 'bthandler', amtype => 'i',
>   cpp_symbol => 'BTREE_AM_OID' },
>
> (where 'cpp_symbol' would be skipped by genbki explicitly).

The last part makes sense and would be a fairly mechanical change. I'm
not sure of the best way to include those generated symbols back in
the code again, though. I think a single file might not be desirable
because of namespace pollution. The alternative would be to have, say,
pg_am.h include pg_am_sym.h. More complex but doable. Also, no need to
skip non-data values explicitly. The code knows where to find the
schema. :-)

Thanks for pushing 1 and 2, BTW.

-John Naylor

Re: WIP: a way forward on bootstrap data

2017-12-22 Thread Alvaro Herrera

Robert Haas wrote:
> On Thu, Dec 21, 2017 at 5:32 PM, Alvaro Herrera  
> wrote:
> > Hmm, patch 0008 removes data lines from the .h but leaves the dependent
> > OID define lines around:
> 
> Just a question here -- do we actually have consensus on doing the
> stuff that these patches do?  Because I'm not sure we do.

My reading of the old threads (linked provided by John in his initial
email in this thread) is that we have a consensus that we want to get
rid of the old data representation, because it causes endless amount of
pain.  The proposed format seems to satisfy the constraints that we all
discussed, namely

1. be easier to modify than the current format,
2. in particular, allow for default values on certain columns
3. not cause git merge problems because of too similar lines in every
record
4. not require onerous Perl modules

The one thing we seem to lack is a tool to edit the data files, as you
suggested[1].  Stephen Frost mentioned[2] that we could do this by
allowing the .data files be loaded in a database table, have the changes
made via SQL, then have a way to create an updated .data file.  Tom
said[3] he didn't like that particular choice.

So we already have Catalog.pm that (after these patches) knows how to
load .data files; we could use that as a basis to enable easy oneliners
to do whatever editing is needed.

Also, the CPP symbols remaining in the pg_*.h that I commented yesterday
was already mentioned[4] before.

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoa4%3D5oz7wSMcLNLh8h6cXzPoxxNJKckkdSQA%2BzpUG0%2B0A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/20150304150712.GV29780%40tamriel.snowman.net
[3] https://www.postgresql.org/message-id/24766.1478821303%40sss.pgh.pa.us
[4] https://www.postgresql.org/message-id/15697.1479161...@sss.pgh.pa.us

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

1 2 3 >

1 - 100 of 200 matches

Mail list logo