Re: WIP: a way forward on bootstrap data
John Naylor writes: > Version 8, rebased against 76b6aa41f41d. I took a preliminary look through this, without yet attempting to execute the script against HEAD. I have a few thoughts: * I'm inclined not to commit the conversion scripts to the repo. I doubt there are third parties out there with a need for them, and if they do need them they can get 'em out of this thread in the mailing list archives. (If anyone has a different opinion about that, speak up!) * I notice you have a few "preliminary cleanup" changes here and there in the series, such as fixing the inconsistent spelling of Anum_pg_init_privs_initprivs. These could be applied before we start the main conversion process, and I'm inclined to do that since we could get 'em out of the way early. Ideally, the main conversion would only touch the header files and related scripts/makefiles. * I'm a little disturbed by the fact that 0002 has to "re-doublequote values that are macros expanded by initdb.c". I see that there are only a small number of affected places, so maybe it's not really worth working harder, but I worry that something might get missed. Is there any way to include this consideration in the automated conversion, or at least to verify that we found all the places to quote? Or, seeing that 0004 seems to be introducing some quoting-related hacks to genbki.pl anyway, maybe we could take care of the issue there? * In 0003, I'd recommend leaving the re-indentation to happen in the next perltidy run (assuming perltidy would fix that, which I think is true but I might be wrong). It's just creating more review work to do it here. In any case, the patch summary line is pretty misleading since it's *not* just reindenting, but also refactoring genbki.pl. (BTW, if that refactoring would work on the script as it is, maybe that's another thing we could do early? The more we can do before "flag day", the better IMO.) * In 0006, I'm not very pleased with the introduction of "Makefile.headers". I'd keep those macros where they are in catalog/Makefile. backend/Makefile doesn't need to know about that, especially since it's doing an unconditional invocation of catalog/Makefile anyway. It could just do something like submake-schemapg: $(MAKE) -C catalog generated-headers and leave it to catalog/Makefile to know what needs to happen for both schemapg.h and the other generated files. Overall, though, this is looking pretty promising. regards, tom lane
Re: WIP: a way forward on bootstrap data
Thanks for taking a look. On 3/3/18, Tom Lane wrote: > John Naylor writes: >> Version 8, rebased against 76b6aa41f41d. > > I took a preliminary look through this, without yet attempting to execute > the script against HEAD. I have a few thoughts: > > * I'm inclined not to commit the conversion scripts to the repo. I doubt > there are third parties out there with a need for them, and if they do > need them they can get 'em out of this thread in the mailing list > archives. (If anyone has a different opinion about that, speak up!) If no one feels strongly otherwise, I'll just attach the conversion script along with the other patches for next version. To be clear, the rewrite script is intended be committed, both to enforce formatting and as a springboard for bulk editing. Now, whether that belongs in /src/include/catalog or /src/tools is debatable. > * I'm a little disturbed by the fact that 0002 has to "re-doublequote > values that are macros expanded by initdb.c". I see that there are only > a small number of affected places, so maybe it's not really worth working > harder, but I worry that something might get missed. Is there any way to > include this consideration in the automated conversion, or at least to > verify that we found all the places to quote? Or, seeing that 0004 seems > to be introducing some quoting-related hacks to genbki.pl anyway, maybe > we could take care of the issue there? The quoting hacks are really to keep the postgres.bki diff as small as possible (attached). The simplest and most air-tight way to address your concern would be to double-quote everything when writing the bki file. That could be done last as a follow-up. > * I notice you have a few "preliminary cleanup" changes here and there > in the series, such as fixing the inconsistent spelling of > Anum_pg_init_privs_initprivs. These could be applied before we start > the main conversion process, and I'm inclined to do that since we could > get 'em out of the way early. Ideally, the main conversion would only > touch the header files and related scripts/makefiles. ... > * In 0003, I'd recommend leaving the re-indentation to happen in the next > perltidy run (assuming perltidy would fix that, which I think is true but > I might be wrong). It's just creating more review work to do it here. > In any case, the patch summary line is pretty misleading since it's > *not* just reindenting, but also refactoring genbki.pl. (BTW, if that > refactoring would work on the script as it is, maybe that's another > thing we could do early? The more we can do before "flag day", the > better IMO.) I tested perltidy 20090616 and it handles it fine. I'll submit a preliminary patch soon to get some of those items out of the way. > * In 0006, I'm not very pleased with the introduction of > "Makefile.headers". I'd keep those macros where they are in > catalog/Makefile. backend/Makefile doesn't need to know about that, > especially since it's doing an unconditional invocation of > catalog/Makefile anyway. It could just do something like > > submake-schemapg: > $(MAKE) -C catalog generated-headers > > and leave it to catalog/Makefile to know what needs to happen for > both schemapg.h and the other generated files. I wasn't happy with it either, but I couldn't get it to build otherwise. The sticking point was the symlinks in $(builddir)/src/include/catalog. $(MAKE) -C catalog doesn't handle that. The makefile in /src/common relies on the backend makefile to know what to invoke for a given header. IIRC, relpath.c includes pg_tablespace.h, which now requires pg_tablespace_d.h to be built. Perhaps /src/common/Makefile could invoke the catalog makefile directly, and the pg_*_d.h headers could be written to $(builddir)/src/include/catalog directly? I'll hack on it some more. > Overall, though, this is looking pretty promising. > > regards, tom lane > Glad to hear it. -John Naylor --- postgres.bki.orig 2018-02-25 15:58:50.082261557 +0700 +++ postgres.bki.noquotes 2018-02-25 16:12:39.967498748 +0700 @@ -5651,9 +5651,9 @@ lanacl = aclitem[] ) open pg_language -insert OID = 12 ( "internal" 10 f f 0 0 2246 _null_ ) -insert OID = 13 ( "c" 10 f f 0 0 2247 _null_ ) -insert OID = 14 ( "sql" 10 f t 0 0 2248 _null_ ) +insert OID = 12 ( internal 10 f f 0 0 2246 _null_ ) +insert OID = 13 ( c 10 f f 0 0 2247 _null_ ) +insert OID = 14 ( sql 10 f t 0 0 2248 _null_ ) close pg_language create pg_largeobject_metadata 2995 ( @@ -5753,8 +5753,8 @@ insert ( 2798 n 0 2796 - 2796 - - - - - f f r r 2799 27 0 0 0 _null_ _null_ ) insert ( 3527 n 0 3524 - 3524 - - - - - f f r r 3518 3500 0 0 0 _null_ _null_ ) insert ( 3565 n 0 3563 - 3563 - - - - - f f r r 1203 869 0 0 0 _null_ _null_ ) -insert ( 2147 n 0 2804 - 463 - - 2804 3547 - f f r r 0 20 0 20 0 "0" "0" ) -insert ( 2803 n 0 1219 - 463 - - 1219 3546 - f f r r 0 20 0 20 0 "0" "0" ) +insert ( 2147 n 0 2804 - 463 - - 2804 3547 - f f r r 0 20 0 20 0 0 0 ) +in
Re: WIP: a way forward on bootstrap data
I wrote: > I'll submit a > preliminary patch soon to get some of those items out of the way. I've attached a patch that takes care of these cleanups so they don't clutter the patch set. -John Naylor From d97a8b2e5fa4977e656d1aca7ee7bf1289ecbd40 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Sat, 3 Mar 2018 16:56:10 +0700 Subject: [PATCH] Some non-functional cleanups in preparation for the upcoming bootstrap data conversion: Arrange pg_tablespace.h OID symbols so they are immediately after the relevant DATA() line. Separate out the pg_attribute logic of genbki.pl into its own function and skip checking if the data is defined. This both narrows and shortens the data writing loop of the script. Correct spelling of some macros. --- src/backend/catalog/aclchk.c | 6 +- src/backend/catalog/genbki.pl | 272 ++ src/include/catalog/pg_init_privs.h | 2 +- src/include/catalog/pg_tablespace.h | 3 +- src/interfaces/ecpg/ecpglib/execute.c | 2 +- src/interfaces/ecpg/ecpglib/pg_type.h | 2 +- 6 files changed, 149 insertions(+), 138 deletions(-) diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 3f2c629..0648539 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -5969,8 +5969,8 @@ recordExtensionInitPrivWorker(Oid objoid, Oid classoid, int objsubid, Acl *new_a MemSet(nulls, false, sizeof(nulls)); MemSet(replace, false, sizeof(replace)); - values[Anum_pg_init_privs_privs - 1] = PointerGetDatum(new_acl); - replace[Anum_pg_init_privs_privs - 1] = true; + values[Anum_pg_init_privs_initprivs - 1] = PointerGetDatum(new_acl); + replace[Anum_pg_init_privs_initprivs - 1] = true; oldtuple = heap_modify_tuple(oldtuple, RelationGetDescr(relation), values, nulls, replace); @@ -6007,7 +6007,7 @@ recordExtensionInitPrivWorker(Oid objoid, Oid classoid, int objsubid, Acl *new_a values[Anum_pg_init_privs_privtype - 1] = CharGetDatum(INITPRIVS_EXTENSION); - values[Anum_pg_init_privs_privs - 1] = PointerGetDatum(new_acl); + values[Anum_pg_init_privs_initprivs - 1] = PointerGetDatum(new_acl); tuple = heap_form_tuple(RelationGetDescr(relation), values, nulls); diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index ed90a02..8d740c3 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -156,154 +156,85 @@ foreach my $catname (@{ $catalogs->{names} }) print $bki "open $catname\n"; } - if (defined $catalog->{data}) + # For pg_attribute.h, we generate data entries ourselves. + # NB: pg_type.h must come before pg_attribute.h in the input list + # of catalog names, since we use info from pg_type.h here. + if ($catname eq 'pg_attribute') { + gen_pg_attribute($schema); + } - # Ordinary catalog with DATA line(s) - foreach my $row (@{ $catalog->{data} }) - { - - # Split line into tokens without interpreting their meaning. - my %bki_values; - @bki_values{@attnames} = - Catalog::SplitDataLine($row->{bki_values}); - - # Perform required substitutions on fields - foreach my $column (@$schema) - { -my $attname = $column->{name}; -my $atttype = $column->{type}; - -# Substitute constant values we acquired above. -# (It's intentional that this can apply to parts of a field). -$bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g; -$bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g; - -# Replace regproc columns' values with OIDs. -# If we don't have a unique value to substitute, -# just do nothing (regprocin will complain). -if ($atttype eq 'regproc') -{ - my $procoid = $regprocoids{ $bki_values{$attname} }; - $bki_values{$attname} = $procoid - if defined($procoid) && $procoid ne 'MULTIPLE'; -} - } + # Ordinary catalog with DATA line(s) + foreach my $row (@{ $catalog->{data} }) + { - # Save pg_proc oids for use in later regproc substitutions. - # This relies on the order we process the files in! - if ($catname eq 'pg_proc') - { -if (defined($regprocoids{ $bki_values{proname} })) -{ - $regprocoids{ $bki_values{proname} } = 'MULTIPLE'; -} -else -{ - $regprocoids{ $bki_values{proname} } = $row->{oid}; -} - } + # Split line into tokens without interpreting their meaning. + my %bki_values; + @bki_values{@attnames} = + Catalog::SplitDataLine($row->{bki_values}); - # Save pg_type info for pg_attribute processing below - if ($catname eq 'pg_type') + # Perform required substitutions on fields + foreach my $column (@$schema) + { + my $attname = $column->{name}; + my $atttype = $column->{type}; + + # Substitute constant values we acquired above. + # (It's intentional that this can apply to parts of a field). + $bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g; + $bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g; + + # Replace
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 3/3/18, Tom Lane wrote: >> * I'm a little disturbed by the fact that 0002 has to "re-doublequote >> values that are macros expanded by initdb.c". I see that there are only >> a small number of affected places, so maybe it's not really worth working >> harder, but I worry that something might get missed. Is there any way to >> include this consideration in the automated conversion, or at least to >> verify that we found all the places to quote? Or, seeing that 0004 seems >> to be introducing some quoting-related hacks to genbki.pl anyway, maybe >> we could take care of the issue there? > The quoting hacks are really to keep the postgres.bki diff as small as > possible (attached). The simplest and most air-tight way to address > your concern would be to double-quote everything when writing the bki > file. That could be done last as a follow-up. Oh, if you're cross-checking by diff'ing the produced .bki file, then that's sufficient to address my concern here. No need to do more. regards, tom lane
Re: WIP: a way forward on bootstrap data
John Naylor writes: > I've attached a patch that takes care of these cleanups so they don't > clutter the patch set. Pushed. I made a couple of cosmetic changes in genbki.pl. regards, tom lane
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 3/3/18, Tom Lane wrote: >> * In 0006, I'm not very pleased with the introduction of >> "Makefile.headers". > I wasn't happy with it either, but I couldn't get it to build > otherwise. The sticking point was the symlinks in > $(builddir)/src/include/catalog. $(MAKE) -C catalog doesn't handle > that. The makefile in /src/common relies on the backend makefile to > know what to invoke for a given header. IIRC, relpath.c includes > pg_tablespace.h, which now requires pg_tablespace_d.h to be built. I'm not following. AFAICS, what you put in src/common/Makefile was just +.PHONY: generated-headers + +generated-headers: + $(MAKE) -C ../backend generated-headers which doesn't appear to care whether backend/Makefile knows anything about specific generated headers or not. I think all we need to do is consider that the *_d.h files ought to be built as another consequence of invoking the generated-headers target. BTW, there's already a submake-generated-headers target in Makefile.global, which you should use in preference to rolling your own. regards, tom lane
Re: WIP: a way forward on bootstrap data
John Naylor writes: > It didn't take that long to rebase the remaining parts of the > patchset, so despite what I said above I went ahead and put them in > version 10 (attached), this time via scripted bulk editing rather than > as large patches. Starting to look into this version now, but a small suggestion while it's still fresh in mind: it might be easier, in future rounds, to put all the files in a tarball and attach 'em as one big attachment. At least with my mail setup, it's way easier to save off a tarball and "tar xf" it than it is to individually save a dozen attachments. I suspect that way might be easier on your end, too. There's some value in posting a patchset as separate attachments when it's possible to just apply the patches in series; Munro's patch tester knows what to do with that, but not with a tarball AFAIK. But in this case, there's little hope that the patch tester would get it right anyhow. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 3/15/18, Tom Lane wrote: > John Naylor writes: >> It didn't take that long to rebase the remaining parts of the >> patchset, so despite what I said above I went ahead and put them in >> version 10 (attached), this time via scripted bulk editing rather than >> as large patches. > > Starting to look into this version now, but a small suggestion while > it's still fresh in mind: it might be easier, in future rounds, to > put all the files in a tarball and attach 'em as one big attachment. Sure thing. I've done so here for version 11, which is just a rebase over the removal of pg_class.relhaspkey. -John Naylor v11-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: WIP: a way forward on bootstrap data
John Naylor writes: > [ v11-bootstrap-data-conversion.tar.gz ] I've done a round of review work on this, focusing on the Makefile infrastructure. I found a bunch of problems with parallel builds and VPATH builds, which are addressed in the attached incremental patch. The parallel-build issues are a bit of a mess really: it's surprising we've not had problems there earlier. For instance, catalog/Makefile has postgres.description: postgres.bki ; postgres.shdescription: postgres.bki ; schemapg.h: postgres.bki ; However, genbki.pl doesn't make any particular guarantee that postgres.bki will be written sooner than its other output files, which means that in principle make might think it needs to rebuild these other files on every subsequent check. That was somewhat harmless given the empty update rule, but it's not really the right thing. Your patch extended this to make all the generated headers dependent on postgres.bki, and those are definitely written before postgres.bki, meaning make *will* think they are out of date. Worse, it'll also think the symlinks to them are out of date. So I was running into problems with different parallel make sub-tasks removing and recreating the symlinks. VPATH builds didn't work well either, because out-of-date-ness ties into whether make will accept a file in the source dir as a valid replacement target. I resolved this mess by setting up a couple of stamp files, which is a technique we also use elsewhere. bki-stamp is a single file representing the action of having run genbki.pl, and header-stamp likewise represents the action of having made the symlinks to the generated headers. By depending on those rather than individual files, we avoid questions of exactly what the timestamps on the individual output files are. In the attached, I've also done some more work on the README file and cleaned up a few other little things. I've not really looked at the MSVC build code at all. Personally, I'm willing to just commit this (when the time comes) and let the buildfarm see if the MSVC code works ... but if anyone else wants to check that part beforehand, please do. I also have not spent much time yet looking at the end-product .h and .dat files. I did note a bit of distressing inconsistency in the formatting of the catalog struct declarations, some of which predates this patch but it seems like you've introduced more. I think what we ought to standardize on is a format similar to this in pg_opclass.h: CATALOG(pg_opclass,2616) { /* index access method opclass is for */ Oid opcmethod BKI_LOOKUP(pg_am); /* name of this opclass */ NameDataopcname; /* namespace of this opclass */ Oid opcnamespaceBKI_DEFAULT(PGNSP); /* opclass owner */ Oid opcownerBKI_DEFAULT(PGUID); The former convention used in some places, of field descriptions in same-line comments, clearly won't work anymore if we're sticking BKI_DEFAULT annotations there. I also don't like the format used in, eg, pg_aggregate.h of putting field descriptions in a separate comment block before the struct proper. Bitter experience has shown that there are a lot of people on this project who won't update comments that are more than about two lines away from the code they change; so the style in pg_aggregate.h is just inviting maintenance oversights. I've got mixed feelings about the whitespace lines between fields. They seem like they are mostly bulking up the code and we could do without 'em. On the other hand, pgindent will insist on putting one before any multi-line field comment, and so that would create inconsistent formatting if we don't use 'em elsewhere. Thoughts? Speaking of pgindent, those prettily aligned BKI annotations are a waste of effort, because when pgindent gets done with the code it will look like regproc aggfnoid; charaggkind BKI_DEFAULT(n); int16 aggnumdirectargs BKI_DEFAULT(0); regproc aggtransfn BKI_LOOKUP(pg_proc); regproc aggfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); regproc aggcombinefn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); regproc aggserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); regproc aggdeserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); I'm not sure if there's anything much to be done about this. BKI_DEFAULT isn't so bad, but additional annotations get unreadable fast. Maybe BKI_LOOKUP was a bad idea after all, and we should just invent more Oid-equivalent typedef names. The attached is just one incremental patch on top of your v11 series. I couldn't think of an easy way to migrate the changes back into the most relevant diffs of your series, so I didn'
Re: WIP: a way forward on bootstrap data
On 3/21/18, Tom Lane wrote: > John Naylor writes: >> [ v11-bootstrap-data-conversion.tar.gz ] > > I've done a round of review work on this, focusing on the Makefile > infrastructure. I found a bunch of problems with parallel builds and > VPATH builds, which are addressed in the attached incremental patch. > [explanation of Make issues and stamp files] > In the attached, I've also done some more work on the README file > and cleaned up a few other little things. Thanks for pulling my attempt at makefile hackery across the finish line. It sounds like there are no more obvious structural issues remaining (fingers crossed). Your other improvements make sense. > I did note a bit of distressing inconsistency in the formatting of > the catalog struct declarations, some of which predates this patch but it > seems like you've introduced more. I think what we ought to standardize > on is a format similar to this in pg_opclass.h: > > CATALOG(pg_opclass,2616) > { > /* index access method opclass is for */ > Oid opcmethod BKI_LOOKUP(pg_am); > [snip] That is the most sensible format. Did you mean all 62 catalog headers for future-proofing, or just the ones with annotations now? > The former convention used in some places, of field descriptions in > same-line comments, clearly won't work anymore if we're sticking > BKI_DEFAULT annotations there. Yeah. > I also don't like the format used in, eg, > pg_aggregate.h of putting field descriptions in a separate comment block > before the struct proper. Bitter experience has shown that there are a > lot of people on this project who won't update comments that are more than > about two lines away from the code they change; so the style in > pg_aggregate.h is just inviting maintenance oversights. Okay. > I've got mixed feelings about the whitespace lines between fields. They > seem like they are mostly bulking up the code and we could do without 'em. > On the other hand, pgindent will insist on putting one before any > multi-line field comment, and so that would create inconsistent formatting > if we don't use 'em elsewhere. Thoughts? I'll do it both ways for one header and post the results for people to look at. > Speaking of pgindent, those prettily aligned BKI annotations are a waste > of effort, because when pgindent gets done with the code it will look > like > [snip] > regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); > regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); > regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); > I'm not sure if there's anything much to be done about this. BKI_DEFAULT > isn't so bad, but additional annotations get unreadable fast. Maybe > BKI_LOOKUP was a bad idea after all, and we should just invent more > Oid-equivalent typedef names. Well, until version 7, I used "fake" type aliases that only genbki.pl could see. The C compiler and postgres.bki could only see the actual Oid/oid type. Perhaps it was a mistake to model their appearance after regproc (regtype, etc), because that was misleading. Maybe something more obviously transient like 'lookup_typeoid? I'm leaning towards this idea. Another possibility is to teach the pgindent pre_/post_indent functions to preserve annotation formatting, but I'd rather not add yet another regex to that script. Plus, over the next 10+ years, I could see people adding several more BKI_* macros, leading to readability issues regardless of formatting, so maybe we should nip this one in the bud. > The attached is just one incremental patch on top of your v11 series. > I couldn't think of an easy way to migrate the changes back into the > most relevant diffs of your series, so I didn't try. I've done that quite a few times while developing this patch series, so I'm used to it. I'll incorporate your changes soon and also rebase over the new pg_class column that landed recently. I'll have a new version by this weekend, assuming we conclude the formatting discussion, so if you or others have any more comments by then, I'll include them. -John Naylor
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 3/21/18, Tom Lane wrote: >> regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); >> regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); >> regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); >> I'm not sure if there's anything much to be done about this. BKI_DEFAULT >> isn't so bad, but additional annotations get unreadable fast. Maybe >> BKI_LOOKUP was a bad idea after all, and we should just invent more >> Oid-equivalent typedef names. > Well, until version 7, I used "fake" type aliases that only genbki.pl > could see. The C compiler and postgres.bki could only see the actual > Oid/oid type. Perhaps it was a mistake to model their appearance after > regproc (regtype, etc), because that was misleading. Maybe something > more obviously transient like 'lookup_typeoid? I'm leaning towards > this idea. Looking at this again, I think a big chunk of the readability problem here is just from the fact that we have long, similar-looking lines tightly packed. If it were reformatted to have comment lines and whitespace between, it might not look nearly as bad. > Another possibility is to teach the pgindent pre_/post_indent > functions to preserve annotation formatting, but I'd rather not add > yet another regex to that script. Plus, over the next 10+ years, I > could see people adding several more BKI_* macros, leading to > readability issues regardless of formatting, so maybe we should nip > this one in the bud. Well, whether or not we invent BKI_LOOKUP, the need for other kinds of annotations isn't likely to be lessened. I wondered whether we could somehow convert the format into multiple lines, say regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); but some quick experimentation was discouraging: either Emacs' C syntax mode, or pgindent, or both would make a hash of it. It wasn't great from the vertical-space-consumption standpoint either. Anyway, for the moment I'd stick with BKI_LOOKUP rather than undoing that work. I think it's a more transparent way of saying what we want than the magic-OID-typedefs approach. The formatting issue is just a mild annoyance, and it's not really BKI_LOOKUP's fault anyway. While I'm thinking of it --- I noticed one or two places where you had "BKI_DEFAULT(\0)". That coding scares me a bit --- gcc seems to tolerate it, but other C compilers might feel that \0 is not a valid preprocessing token, or it might confuse some editors' syntax highlight rules. I'd rather write cases like this as "BKI_DEFAULT('\0')". IOW, the argument should be a valid C identifier, number, or string literal. regards, tom lane
Re: WIP: a way forward on bootstrap data
I wrote: > On 3/21/18, Tom Lane wrote: >> I've got mixed feelings about the whitespace lines between fields. They >> seem like they are mostly bulking up the code and we could do without >> 'em. >> On the other hand, pgindent will insist on putting one before any >> multi-line field comment, and so that would create inconsistent >> formatting >> if we don't use 'em elsewhere. Thoughts? > > I'll do it both ways for one header and post the results for people to look > at. I've attached an earlier version of pg_proc.h with both formats as I understand them. I turned a couple comments into multi-line comments to demonstrate. I think without spaces it's just as hard to read as with multiple annotations. I'd vote for spaces, but then again I'm not the one who has to read these things very often. -John Naylor /*- * * pg_proc.h * definition of the system "procedure" relation (pg_proc) * along with the relation's initial contents. * * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California * * src/include/catalog/pg_proc.h * * NOTES * The Catalog.pm module reads this file and derives schema * information. * *- */ #ifndef PG_PROC_H #define PG_PROC_H #include "catalog/genbki.h" /* * pg_proc definition. cpp turns this into * typedef struct FormData_pg_proc * */ #define ProcedureRelationId 1255 #define ProcedureRelation_Rowtype_Id 81 CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO { /* procedure name */ NameData proname; /* OID of namespace containing this proc */ Oid pronamespace BKI_DEFAULT(PGNSP); /* procedure owner */ Oid proowner BKI_DEFAULT(PGUID); /* OID of pg_language entry */ Oid prolang BKI_DEFAULT(12); /* estimated execution cost */ float4 procost BKI_DEFAULT(1); /* estimated # of * rows out (if proretset) */ float4 prorows BKI_DEFAULT(0); /* element type of variadic array, or 0 */ Oid provariadic BKI_DEFAULT(0); /* transforms calls to it during planning */ regproc protransform BKI_DEFAULT(0); /* is it an aggregate? */ bool proisagg BKI_DEFAULT(f); /* is it a window function? */ bool proiswindow BKI_DEFAULT(f); /* security definer */ bool prosecdef BKI_DEFAULT(f); /* is it a leak-proof function? */ bool proleakproof BKI_DEFAULT(f); /* strict with respect to NULLs? */ bool proisstrict BKI_DEFAULT(f); /* returns a set? */ bool proretset BKI_DEFAULT(f); /* see PROVOLATILE_ categories below */ char provolatile BKI_DEFAULT(v); /* see PROPARALLEL_ categories below */ char proparallel BKI_DEFAULT(u); /* number of arguments */ int16 pronargs; /* number of arguments with defaults */ int16 pronargdefaults BKI_DEFAULT(0); /* OID of result type */ Oid prorettype; /* * variable-length fields start here, but we allow direct access to * proargtypes */ /* parameter types (excludes OUT params) */ oidvector proargtypes; #ifdef CATALOG_VARLEN /* all param types (NULL if IN only) */ Oid proallargtypes[1] BKI_DEFAULT(_null_); /* parameter modes (NULL if IN only) */ char proargmodes[1] BKI_DEFAULT(_null_); /* parameter names (NULL if no names) */ text proargnames[1] BKI_DEFAULT(_null_); /* list of expression trees * for argument defaults (NULL if none) */ pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* types for which to apply transforms */ Oid protrftypes[1] BKI_DEFAULT(_null_); /* procedure source text */ text prosrc BKI_FORCE_NOT_NULL; /* secondary procedure info (can be NULL) */ text probin BKI_DEFAULT(_null_); /* procedure-local GUC settings */ text proconfig[1] BKI_DEFAULT(_null_); /* access permissions */ aclitem proacl[1] BKI_DEFAULT(_null_); #endif } FormData_pg_proc; /* * Form_pg_proc corresponds to a pointer to a tuple with * the format of pg_proc relation. * */ typedef FormData_pg_proc *Form_pg_proc; /* * compiler constants for pg_proc * */ #define Natts_pg_proc 29 #define Anum_pg_proc_proname 1 #define Anum_pg_proc_pronamespace 2 #define Anum_pg_proc_proowner 3 #define Anum_pg_proc_prolang 4 #define Anum_pg_proc_procost 5 #define Anum_pg_proc_prorows 6 #define Anum_pg_proc_provariadic 7 #define Anum_pg_proc_protransform 8 #define Anum_pg_proc_proisagg 9 #define Anum_pg_proc_proiswindow 10 #define Anum_pg_proc_prosecdef 11 #define Anum_pg_proc_proleakproof 12 #define Anum_pg_proc_proisstrict 13 #define Anum_pg_proc_proretset 14 #define Anum_pg_proc_provolatile 15 #define Anum_pg_proc_proparallel 16 #define Anum_pg_proc_pronargs 17 #define Anum_pg_proc_pronargdefaults 18 #define Anum_pg_proc_prorettype 19 #define Anum_pg_proc_proargtypes 20 #define Anum_pg_proc_proallargty
Re: WIP: a way forward on bootstrap data
On 3/22/18, Tom Lane wrote: > Looking at this again, I think a big chunk of the readability problem here > is just from the fact that we have long, similar-looking lines tightly > packed. If it were reformatted to have comment lines and whitespace > between, it might not look nearly as bad. > ... > Anyway, for the moment I'd stick with BKI_LOOKUP rather than undoing > that work. I think it's a more transparent way of saying what we > want than the magic-OID-typedefs approach. The formatting issue is > just a mild annoyance, and it's not really BKI_LOOKUP's fault anyway. Okay, I'll do it with comments and whitespace. > While I'm thinking of it --- I noticed one or two places where you > had "BKI_DEFAULT(\0)". That coding scares me a bit --- gcc seems to > tolerate it, but other C compilers might feel that \0 is not a valid > preprocessing token, or it might confuse some editors' syntax highlight > rules. I'd rather write cases like this as "BKI_DEFAULT('\0')". IOW, > the argument should be a valid C identifier, number, or string literal. Hmm, I only see this octal in pg_type.h chartypdelim BKI_DEFAULT(\054); Which I hope is fine. Were you thinking of this comment in pg_attribute.h? We use the double-quoted empty string for postgres.bki and change it to '\0' for schemapg.h. /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */ charattidentity BKI_DEFAULT(""); -John Naylor
Re: WIP: a way forward on bootstrap data
John Naylor wrote: > I've attached an earlier version of pg_proc.h with both formats as I > understand them. I turned a couple comments into multi-line comments > to demonstrate. I think without spaces it's just as hard to read as > with multiple annotations. I'd vote for spaces, but then again I'm not > the one who has to read these things very often. how about letting the line go long, with the comment at the right of each definition, with one blank line between struct members, as in the sample below? You normally don't care that these lines are too long since you seldom edit them -- one mostly adds or remove entire lines instead, so there's not as much need for side-by-side diffs as with regular code. (One issue with this proposal is how to convince pgindent to leave the long lines alone.) To me, an important property of these structs is fitting as much as possible (vertically) in a screenful; the other proposed modes end up with too many lines. CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO { NameDataproname;/* procedure name */ Oid pronamespace BKI_DEFAULT(PGNSP); /* OID of namespace containing this proc */ Oid proowner BKI_DEFAULT(PGUID); /* procedure owner */ Oid prolang BKI_DEFAULT(12); /* OID of pg_language entry */ float4 procost BKI_DEFAULT(1); /* estimated execution cost */ float4 prorows BKI_DEFAULT(0); /* estimated # of rows out (if proretset) */ Oid provariadic BKI_DEFAULT(0); /* element type of variadic array, or 0 */ regproc protransform BKI_DEFAULT(0); /* transforms calls to it during planning */ boolproisagg BKI_DEFAULT(f); /* is it an aggregate? */ boolproiswindow BKI_DEFAULT(f); /* is it a window function? */ boolprosecdef BKI_DEFAULT(f); /* security definer */ boolproleakproof BKI_DEFAULT(f); /* is it a leak-proof function? */ boolproisstrict BKI_DEFAULT(f); /* strict with respect to NULLs? */ boolproretset BKI_DEFAULT(f); /* returns a set? */ charprovolatile BKI_DEFAULT(v); /* see PROVOLATILE_ categories below */ charproparallel BKI_DEFAULT(u); /* see PROPARALLEL_ categories below */ int16 pronargs; /* number of arguments */ int16 pronargdefaults BKI_DEFAULT(0); /* number of arguments with defaults */ Oid prorettype; /* OID of result type */ /* * variable-length fields start here, but we allow direct access to * proargtypes */ oidvectorproargtypes; /* parameter types (excludes OUT params) */ #ifdef CATALOG_VARLEN Oid proallargtypes[1] BKI_DEFAULT(_null_); /* all param types (NULL if IN only) */ charproargmodes[1] BKI_DEFAULT(_null_); /* parameter modes (NULL if IN only) */ textproargnames[1] BKI_DEFAULT(_null_); /* parameter names (NULL if no names) */ pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* list of expression trees for argument defaults (NULL if none) */ Oid protrftypes[1] BKI_DEFAULT(_null_); /* types for which to apply transforms */ textprosrc BKI_FORCE_NOT_NULL; /* procedure source text */ textprobin BKI_DEFAULT(_null_); /* secondary procedure info (can be NULL) */ textproconfig[1] BKI_DEFAULT(_null_); /* procedure-local GUC settings */ aclitem proacl[1] BKI_DEFAULT(_null_); /* access permissions */ #endif } FormData_pg_proc; -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 3/22/18, Tom Lane wrote: >> While I'm thinking of it --- I noticed one or two places where you >> had "BKI_DEFAULT(\0)". > Hmm, I only see this octal in pg_type.h > chartypdelim BKI_DEFAULT(\054); Sorry, I was going by memory rather than looking at the code. > Which I hope is fine. I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054'). > Were you thinking of this comment in > pg_attribute.h? We use the double-quoted empty string for postgres.bki > and change it to '\0' for schemapg.h. > /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */ > char attidentity BKI_DEFAULT(""); That definitely seems like a hack --- why not BKI_DEFAULT('\0') ? regards, tom lane
Re: WIP: a way forward on bootstrap data
John Naylor writes: >> On 3/21/18, Tom Lane wrote: >>> I've got mixed feelings about the whitespace lines between fields. > I've attached an earlier version of pg_proc.h with both formats as I > understand them. I turned a couple comments into multi-line comments > to demonstrate. I think without spaces it's just as hard to read as > with multiple annotations. I'd vote for spaces, but then again I'm not > the one who has to read these things very often. Yeah, after looking at this example I agree --- it's too tight without the blank lines. regards, tom lane
Re: WIP: a way forward on bootstrap data
Alvaro Herrera writes: > how about letting the line go long, with the comment at the right of > each definition, with one blank line between struct members, as in the > sample below? > NameDataproname;/* procedure name */ > Oid pronamespace BKI_DEFAULT(PGNSP); /* OID of namespace > containing this proc */ > Oid proowner BKI_DEFAULT(PGUID); /* procedure owner */ I don't think this is going to work: pgindent is going to wrap most of these comments, ending up with something that's ugly *and* consumes just as much vertical space as if we'd given the comments their own lines. The problem is that in the headers where we were using same-line comments, the comments were written to fit in the space available without this extra annotation. (For my money, having spent lots of time shaving a character or two off such comments to make 'em fit, I'd much prefer the luxury of having a whole line to write in.) We could go with some scheme that preserves the old formatting of the struct definition proper and puts the added info somewhere else, ie Oid pronamespace; /* OID of namespace containing this proc */ Oid prolang; /* OID of pg_language entry */ then after the struct: BKI_DEFAULT(pronamespace, PGNSP); BKI_DEFAULT(prolang, 12); but on the whole I don't think that's an improvement. I'd rather keep the info about a field together. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 3/22/18, Alvaro Herrera wrote: > how about letting the line go long, with the comment at the right of > each definition, with one blank line between struct members, as in the > sample below? You normally don't care that these lines are too long > since you seldom edit them -- one mostly adds or remove entire lines > instead, so there's not as much need for side-by-side diffs as with > regular code. (One issue with this proposal is how to convince pgindent > to leave the long lines alone.) Yeah, it seems when perltidy or pgindent mangle things badly, it's to try and shoehorn a long line into a smaller number of characters. If memory serves, I've come across things like this: pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* list of expression trees for argument defaults (NULL if none) */ And thought "only a machine could be so precisely awkward" -John Naylor
Re: WIP: a way forward on bootstrap data
On 3/22/18, Tom Lane wrote: > I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054'). Okay, I'll do that. >> Were you thinking of this comment in >> pg_attribute.h? We use the double-quoted empty string for postgres.bki >> and change it to '\0' for schemapg.h. > >> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */ >> char attidentity BKI_DEFAULT(""); > > That definitely seems like a hack --- why not BKI_DEFAULT('\0') ? Hmm, yes, the way I had it, the comment is a mystery. I'll switch it around. -John Naylor
Re: WIP: a way forward on bootstrap data
On 3/21/18, Tom Lane wrote: > The attached is just one incremental patch on top of your v11 series. > I couldn't think of an easy way to migrate the changes back into the > most relevant diffs of your series, so I didn't try. I've applied your changes to the v12 patch series (attached), but I hope you'll allow two nit-picky adjustments: -s/pg_XXX.h/pg_xxx.h/ in the README. There seems to be greater precedent for the lower-case spelling if the rest of the word is lower case. -I shortened the data example in the README so it would comfortably fit on two lines. Spreading it out over three lines doesn't match what's in the data files. It's valid syntax, but real data is formatted to at most two lines (See rewrite_dat.pl. Hmm, maybe I should make that more explicit elsewhere in the README) > I also have not spent much time yet looking at the end-product .h and .dat > files. I did note a bit of distressing inconsistency in the formatting of > the catalog struct declarations, some of which predates this patch but it > seems like you've introduced more. I think what we ought to standardize > on is a format similar to this in pg_opclass.h: > > CATALOG(pg_opclass,2616) > { > /* index access method opclass is for */ > Oid opcmethod BKI_LOOKUP(pg_am); > Done, with blank lines interspersed. I put most changes of this sort in with the other cleanups in patch 0004. I neglected to do this separately for couple of tiny tables that have lookups, but no default values. I don't think it impacts the readability of patch 0007 much. On 3/22/18, Tom Lane wrote: > I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054'). [snip] >> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */ >> char attidentity BKI_DEFAULT(""); > > That definitely seems like a hack --- why not BKI_DEFAULT('\0') ? Done (patch 0006). Other changes: -A README note about OID macros (patch 0007). -A couple minor cosmetic rearrangements and comment/commit message edits. Open items: -Test MSVC. -Arrange for rewrite_dat.pl to run when perltidy does. -I was a bit cavalier about when to use =/:= in the Makefiles. Not sure if there's a preferred project style for when the choice doesn't really matter. -Maybe document examples of how to do bulk-editing of data files? -John Naylor v12-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: WIP: a way forward on bootstrap data
John Naylor writes: > -I shortened the data example in the README so it would comfortably > fit on two lines. Spreading it out over three lines doesn't match > what's in the data files. It's valid syntax, but real data is > formatted to at most two lines (See rewrite_dat.pl. Hmm, maybe I > should make that more explicit elsewhere in the README) Well, as I said, I hadn't really reviewed the .dat files, but if that's what you're doing I'm going to request a change. Project style is to fit in 80 columns as much as possible. I do not see a reason to exempt the .dat files from that, especially not since it would presumably be a trivial change in rewrite_dat.pl to insert extra newlines between fields when needed. (Obviously, if a field value is so wide it runs past 80 columns on its own, it's not rewrite_dat.pl's charter to fix that.) > Open items: > -Test MSVC. Again, while I'd be happy if someone did that manually, I'm prepared to let the buildfarm do it. > -Arrange for rewrite_dat.pl to run when perltidy does. What I was thinking we should have is a convenience target in include/Makefile to do this, say "make reformat-dat-files". I'm not that excited about bundling it into pgindent runs. > -Maybe document examples of how to do bulk-editing of data files? +1. In the end, that's the reason we're doing all this work, so showing people how to benefit seems like a good thing. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 3/25/18, Tom Lane wrote: > Well, as I said, I hadn't really reviewed the .dat files, but if that's > what you're doing I'm going to request a change. Project style is to > fit in 80 columns as much as possible. I do not see a reason to exempt > the .dat files from that, especially not since it would presumably be a > trivial change in rewrite_dat.pl to insert extra newlines between fields > when needed. (Obviously, if a field value is so wide it runs past 80 > columns on its own, it's not rewrite_dat.pl's charter to fix that.) This feature is working now. I've attached a 100-line sample of all the catalogs' files for viewing. Note, this is pretty raw output, without the clean-up step from patch 0002. In the most of the original DATA() lines, there was no spacing between entries except in some cases to separate groups (often with a comment to describe the group). My clean-up patch tried to make that more consistent. For this sample, it would add blank lines before the comments in pg_amop, and remove blank lines from the first few entries in pg_type. If you wanted to opine on that before I rework that patch, I'd be grateful. Also, these data entries have default values removed, but they don't have human-readable OID macros. (I'll have to adjust that script to the 80-column limit as well). >> -Arrange for rewrite_dat.pl to run when perltidy does. > > What I was thinking we should have is a convenience target in > include/Makefile to do this, say "make reformat-dat-files". > I'm not that excited about bundling it into pgindent runs. I've attached a draft patch for this. If it's okay, I'll incorporate it into the series. I think reformat_dat_files.pl also works as a better script name. >> -Maybe document examples of how to do bulk-editing of data files? > > +1. In the end, that's the reason we're doing all this work, so showing > people how to benefit seems like a good thing. It seems like with that, it'd be good to split off the data-format section of the README into a new file, maybe README.data, which will contain code snippets and some example scenarios. I'll include the example pg_proc.prokind merger among those. -John Naylor
Re: WIP: a way forward on bootstrap data
With the attachments this time. -John Naylor #-- # # pg_aggregate.dat #Initial contents of the pg_aggregate system relation. # # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group # Portions Copyright (c) 1994, Regents of the University of California # # src/include/catalog/pg_aggregate.dat # #-- [ # avg { aggfnoid => '2100', aggtransfn => 'int8_avg_accum', aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine', aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv', aggmfinalfn => 'numeric_poly_avg', aggtranstype => '2281', aggtransspace => '48', aggmtranstype => '2281', aggmtransspace => '48' }, { aggfnoid => '2101', aggtransfn => 'int4_avg_accum', aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv', aggmfinalfn => 'int8_avg', aggtranstype => '1016', aggmtranstype => '1016', agginitval => '{0,0}', aggminitval => '{0,0}' }, { aggfnoid => '2102', aggtransfn => 'int2_avg_accum', aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv', aggmfinalfn => 'int8_avg', aggtranstype => '1016', aggmtranstype => '1016', agginitval => '{0,0}', aggminitval => '{0,0}' }, { aggfnoid => '2103', aggtransfn => 'numeric_avg_accum', aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine', aggserialfn => 'numeric_avg_serialize', aggdeserialfn => 'numeric_avg_deserialize', aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv', aggmfinalfn => 'numeric_avg', aggtranstype => '2281', aggtransspace => '128', aggmtranstype => '2281', aggmtransspace => '128' }, { aggfnoid => '2104', aggtransfn => 'float4_accum', aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine', aggtranstype => '1022', agginitval => '{0,0,0}' }, { aggfnoid => '2105', aggtransfn => 'float8_accum', aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine', aggtranstype => '1022', agginitval => '{0,0,0}' }, { aggfnoid => '2106', aggtransfn => 'interval_accum', aggfinalfn => 'interval_avg', aggcombinefn => 'interval_combine', aggmtransfn => 'interval_accum', aggminvtransfn => 'interval_accum_inv', aggmfinalfn => 'interval_avg', aggtranstype => '1187', aggmtranstype => '1187', agginitval => '{0 second,0 second}', aggminitval => '{0 second,0 second}' }, # sum { aggfnoid => '2107', aggtransfn => 'int8_avg_accum', aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine', aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv', aggmfinalfn => 'numeric_poly_sum', aggtranstype => '2281', aggtransspace => '48', aggmtranstype => '2281', aggmtransspace => '48' }, { aggfnoid => '2108', aggtransfn => 'int4_sum', aggcombinefn => 'int8pl', aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv', aggmfinalfn => 'int2int4_sum', aggtranstype => '20', aggmtranstype => '1016', aggminitval => '{0,0}' }, { aggfnoid => '2109', aggtransfn => 'int2_sum', aggcombinefn => 'int8pl', aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv', aggmfinalfn => 'int2int4_sum', aggtranstype => '20', aggmtranstype => '1016', aggminitval => '{0,0}' }, { aggfnoid => '2110', aggtransfn => 'float4pl', aggcombinefn => 'float4pl', aggtranstype => '700' }, { aggfnoid => '2111', aggtransfn => 'float8pl', aggcombinefn => 'float8pl', aggtranstype => '701' }, { aggfnoid => '2112', aggtransfn => 'cash_pl', aggcombinefn => 'cash_pl', aggmtransfn => 'cash_pl', aggminvtransfn => 'cash_mi', aggtranstype => '790', aggmtranstype => '790' }, { aggfnoid => '2113', aggtransfn => 'interval_pl', aggcombinefn => 'interval_pl', aggmtransfn => 'interval_pl', aggminvtransfn => 'interval_mi', aggtranstype => '1186', aggmtranstype => '1186' }, { aggfnoid => '2114', aggtransfn => 'numeric_avg_accum', aggfinalfn => 'numeric_sum', aggcombinefn => 'numeric_avg_combine', aggserialfn => 'numeric_avg_serialize', aggdeserialfn => 'numeric_avg_deserialize', aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv', aggmfinalfn => 'numeric_sum', aggtranstype => '2281', aggtransspace => '128', aggmtranstype => '2281', aggmtransspace => '128' }, # max { aggfnoid => '2115', aggtransfn => 'int8larger', aggcombinefn => 'int8larger', aggsortop => '413', aggtranstype => '20' }, { aggfnoid => '2116', aggtransfn => 'int4larger', aggcombinefn => 'int4larger', aggsortop => '521', aggtranstype => '23' }, { aggfnoid => '2117', aggtransfn => 'int2larger', aggcombinefn => 'int2larger', aggsortop => '520', aggtranstype => '21' }, { agg
Re: WIP: a way forward on bootstrap data
John Naylor writes: > With the attachments this time. Layout of .dat files seems generally reasonable, but I don't understand the proposed make rule: +reformat-dat-files: + $(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog catalog/pg_*.dat This rule has no prerequisite, so what's $< supposed to be? Also, I think the rule probably ought to be located in src/include/catalog/Makefile, because that's typically where you'd be cd'd to when messing with the .dat files, I'd think. (Hm, I see no such makefile, but maybe it's time for one. A convenience rule located one level up doesn't seem very convenient.) > My clean-up patch tried to make that more consistent. For this sample, > it would add blank lines before the comments in pg_amop, and remove > blank lines from the first few entries in pg_type. If you wanted to > opine on that before I rework that patch, I'd be grateful. No particular objection to either. >>> -Maybe document examples of how to do bulk-editing of data files? >> +1. In the end, that's the reason we're doing all this work, so showing >> people how to benefit seems like a good thing. > It seems like with that, it'd be good to split off the data-format > section of the README into a new file, maybe README.data, which will > contain code snippets and some example scenarios. I'll include the > example pg_proc.prokind merger among those. It would be more work, but maybe we should move this into the main SGML docs. It seems rather silly to have SGML documentation for the .BKI file format, which now will be an internal matter that hardly any developers need worry about, but not for the .DAT file format. But I understand if that seems a bridge too far for today --- certainly a README file is way better than nothing. regards, tom lane
Re: WIP: a way forward on bootstrap data
> On Mar 26, 2018, at 10:44 PM, Tom Lane wrote > Layout of .dat files seems generally reasonable, but I don't understand > the proposed make rule: > > +reformat-dat-files: > +$(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog > catalog/pg_*.dat > > This rule has no prerequisite, so what's $< supposed to be? Also, I think > the rule probably ought to be located in src/include/catalog/Makefile, > because that's typically where you'd be cd'd to when messing with the > .dat files, I'd think. (Hm, I see no such makefile, but maybe it's time > for one. A convenience rule located one level up doesn't seem very > convenient.) > Oops, copy-pasto. And I’ll see about a new Makefile. >> It seems like with that, it'd be good to split off the data-format >> section of the README into a new file, maybe README.data, which will >> contain code snippets and some example scenarios. I'll include the >> example pg_proc.prokind merger among those. > > It would be more work, but maybe we should move this into the main > SGML docs. It seems rather silly to have SGML documentation for the > .BKI file format, which now will be an internal matter that hardly > any developers need worry about, but not for the .DAT file format. > But I understand if that seems a bridge too far for today --- certainly > a README file is way better than nothing. Makes sense on all points. I’m not optimistic about creating a new sgml doc on time, but I’ll keep it in mind. -John Naylor
Re: WIP: a way forward on bootstrap data
Tom Lane wrote: >> -Maybe document examples of how to do bulk-editing of data files? > > +1. In the end, that's the reason we're doing all this work, so showing > people how to benefit seems like a good thing. I'll hold off on posting a new patchset until I add this to the documentation, but I wanted to report on a couple of other things: While adjusting to the 80-column limit, I encountered a separation of concerns violation between Catalog.pm and reformat_dat_files.pl that I hadn't noticed before. Fixing that made things easier to read, with fewer lines of code. Speaking of bulk editing, that would be done via adopting reformat_dat_files.pl to the task at hand. I did this myself for two of the conversion helper scripts. However, enough bitrot has now occurred that to make the relationship murky. Since I had to adopt them to the 80-column limit as well, I shaved all the irrelevant differences away, and now they're just a small diff away from the reformat script. I also added block comments to help developers find where they need to edit the script. Since reformat_dat_files.pl has been substantially altered, I'll attach it here, along with the diffs to the the helper scripts. I wrote: > I’ll see about a new Makefile. I've attached a draft of this. I thought about adding a call to duplicate_oids here, but this won't run unless you've run configure first, and if you've done that, you've likely built already, running duplicate_oids in the process. I think I'll consolidate all documentation patches into one, at the end of the series for maximum flexibility. I liked the idea of spreading the doc changes over the patches, but there is not a huge amount of time left. -John Naylor --- /home/john/pgdev/postgresql/src/include/catalog/reformat_dat_files.pl 2018-03-27 18:04:54.698464144 +0700 +++ remove_pg_type_oid_symbols.pl 2018-03-27 18:13:42.270611897 +0700 @@ -1,18 +1,12 @@ #!/usr/bin/perl -w #-- # -# reformat_dat_files.pl -#Perl script that reads in a catalog data file and writes out -#a functionally equivalent file in a standard format. -# -#Metadata entries (if any) come first, with normal attributes -#starting on the following line, in the same order they would be in -#the actual table. +# remove_pg_type_oid_symbols.pl # # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group # Portions Copyright (c) 1994, Regents of the University of California # -# /src/include/catalog/reformat_dat_files.pl +# /src/include/catalog/remove_pg_type_oid_symbols.pl # #-- @@ -85,22 +79,6 @@ $catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 1); } - -# At this point, we have read all the data. If you are modifying this -# script for bulk editing, this is a good place to build lookup tables, -# if you need to. In the following example, the "next if !ref $row" -# check below is a hack to filter out non-hash objects. This is because -# we build the lookup tables from data that we read using the -# "preserve_formatting" parameter. -# -##Index access method lookup. -#my %amnames; -#foreach my $row (@{ $catalog_data{pg_am} }) -#{ -# next if !ref $row; -# $amnames{$row->{oid}} = $row->{amname}; -#} - # Write the data. foreach my $catname (@catnames) @@ -131,10 +109,15 @@ my %values = %$data; - # At this point we have the full tuple in memory as a hash - # and can do any operations we want. As written, it only - # removes default values, but this script can be adopted to - # do one-off bulk-editing. + # Remove pg_type OID symbols if they can match the rule + # we use to generate them. + if ($catname eq 'pg_type' and exists $values{oid_symbol}) + { + my $symbol = form_pg_type_symbol($values{typname}); + delete $values{oid_symbol} + if defined $symbol + and $values{oid_symbol} eq $symbol; + } if (!$full_tuples) @@ -181,6 +164,26 @@ } } + +# Determine canonical pg_type OID #define symbol from the type name. +sub form_pg_type_symbol +{ + my $typename = shift; + + # Skip for rowtypes of bootstrap tables. + return +
Re: WIP: a way forward on bootstrap data
Attached is v13, rebased against b0c90c85fc. Patch 0001: -The data files are formatted to at most 80 columns wide. -Rename rewrite_dat.pl to reformat_dat_file.pl. -Refactor Catalog.pm and reformat_dat_file.pl to have better separation of concerns. -Add src/include/catalog/Makefile with convenience targets for rewriting data files. Patch 0002: -Some adjustments to the post-conversion cleanup of data files. Patch 0005: -I made a stub version of Solution.pm to simulate testing the MSVC build. This found one bug, and also allowed me to bring in some of the more pedantic dependencies I added to utils/Makefile. Patch 0009: -New patch that puts all doc changes in one patch, for flexibility. -Split the parts of catalog/README having to do with data into a new README.data file. Add recipes for how to edit data, with code examples. On 3/26/18, Tom Lane wrote: > It would be more work, but maybe we should move this into the main > SGML docs. It seems rather silly to have SGML documentation for the > .BKI file format, which now will be an internal matter that hardly > any developers need worry about, but not for the .DAT file format. > But I understand if that seems a bridge too far for today --- certainly > a README file is way better than nothing. I had an idea to make it less silly without doing as much work: Get rid of the SGML docs for the BKI format, and turn them into bootstrap/README. Thoughts? And in the department of second thoughts, it occurred to me that the only reason that the .dat files are in include/catalog is because that's where the DATA() statements were. Since they are separate now, one could make the case that they actually belong in backend/catalog. One trivial advantage here is that there is already an existing Makefile in which to put convenience targets for formatting. On the other hand, it kind of makes sense to have the files describing the schema (.h) and the contents (.dat) in the same directory. I'm inclined to leave things as they are for that reason. -John Naylor v13-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: WIP: a way forward on bootstrap data
John Naylor writes: > And in the department of second thoughts, it occurred to me that the > only reason that the .dat files are in include/catalog is because > that's where the DATA() statements were. Since they are separate now, > one could make the case that they actually belong in backend/catalog. > One trivial advantage here is that there is already an existing > Makefile in which to put convenience targets for formatting. On the > other hand, it kind of makes sense to have the files describing the > schema (.h) and the contents (.dat) in the same directory. I'm > inclined to leave things as they are for that reason. Yeah. The fact that, eg, both the .h and .dat files are inputs to duplicate_oids and unused_oids makes me think it's better to keep them together. I'd actually been thinking of something that's about the reverse: instead of building the derived .h files in backend/catalog and then symlinking them into include/catalog, it'd be saner to build them in include/catalog to begin with. However, that would mean that the Perl scripts need to produce output in two different places, so maybe it'd end up more complicated not less so. In any case, that seems like something to leave for another day. regards, tom lane
Re: WIP: a way forward on bootstrap data
I'm starting to look through v13 seriously, and one thing struck me that could use some general discussion: what is our policy going to be for choosing the default values for catalog columns? In particular, I noticed that you have for pg_proc boolproisstrict BKI_DEFAULT(f); charprovolatile BKI_DEFAULT(v); charproparallel BKI_DEFAULT(u); which do not comport at all with the most common values in those columns. As of HEAD, I see postgres=# select proisstrict, count(*) from pg_proc group by 1; proisstrict | count -+--- f | 312 t | 2640 (2 rows) postgres=# select provolatile, count(*) from pg_proc group by 1; provolatile | count -+--- i | 2080 s | 570 v | 302 (3 rows) postgres=# select proparallel, count(*) from pg_proc group by 1; proparallel | count -+--- r | 139 s | 2722 u |91 (3 rows) (Since this is from the final initdb state, this overstates the number of .bki entries for pg_proc a bit, but not by much.) I think there's no question that the default for proisstrict ought to be "true" --- not only is that by far the more common choice, but it's actually the safer choice. A C function that needs to be marked strict and isn't will at best do the wrong thing, and quite likely will crash, if passed a NULL value. The defaults for provolatile and proparallel maybe require more thought though. What you've chosen corresponds to the default assumptions of CREATE FUNCTION, which are what we need for user-defined functions that we don't know anything about; but I'm not sure that makes them the best defaults for built-in functions. I'm inclined to go with the majority values here, in part because that will make the outliers stand out when looking at pg_proc.dat. I don't think it's great that we'll have 2800+ entries explicitly marked with proparallel 'i' or 's', but the less-than- 100 with proparallel 'u' will be so only implicitly because the rewrite script will strip out any field entries that match the default. That's really the worst of all worlds: it'd be better to have no default in this column at all, I think, than to behave like that. In short, I'm tempted to say that when there's a clear majority of entries that would use a particular default, that's the default we should use, whether or not it's "surprising" or "unsafe" according to the semantics. It's clearly not "surprising" for a C function to be marked proparallel 's'; the other cases are more so. I'm not seeing any other BKI_DEFAULT choices that I'm inclined to question, so maybe it's a mistake to try to derive any general policy choices from such a small number of cases. But anyway I'm inclined to change these cases. Comments anyone? regards, tom lane
Re: WIP: a way forward on bootstrap data
Hi, On 2018-04-04 18:29:31 -0400, Tom Lane wrote: > I'm starting to look through v13 seriously, and one thing struck > me that could use some general discussion: what is our policy > going to be for choosing the default values for catalog columns? > > [...] > > In short, I'm tempted to say that when there's a clear majority of > entries that would use a particular default, that's the default we > should use, whether or not it's "surprising" or "unsafe" according > to the semantics. It's clearly not "surprising" for a C function > to be marked proparallel 's'; the other cases are more so. > > [...] > > I'm not seeing any other BKI_DEFAULT choices that I'm inclined to > question, so maybe it's a mistake to try to derive any general > policy choices from such a small number of cases. But anyway > I'm inclined to change these cases. > > Comments anyone? I think choosing SQL defaults is defensible, but so is choosing the most common value as default to make uncommon stand out more, and so is choosing the safest values. In short, I don't think it matters terribly much, we just should try to be reasonably consistent about. Greetings, Andres Freund
Re: WIP: a way forward on bootstrap data
Here are the results of an evening's desultory hacking on v13. I was dissatisfied with the fact that we still had several function-referencing columns that had numeric instead of symbolic contents, for instance pg_aggregate.aggfnoid. Of course, the main reason is that those are declared regproc but reference functions with overloaded names, which regproc can't handle. Now that the lookups are being done in genbki.pl there's no reason why we have to live with that limitation. In the attached, I've generalized the BKI_LOOKUP(pg_proc) code so that you can use either regproc-like or regprocedure-like notation, and then applied that to relevant columns. I did not like the hard-wired handling of proargtypes and proallargtypes in genbki.pl; it hardly seems impossible that we'll want similar conversions for other array-of-OID columns in future. After a bit of thought, it seemed like we could allow oidvectorproargtypes BKI_LOOKUP(pg_type); Oid proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type); and just teach genbki.pl that if a lookup rule is attached to an oidvector or Oid[] column, it means to apply the rule to each array element individually. I also changed genbki.pl so that it'd warn about entries that aren't recognized by the lookup rules. This seems like a good idea for catching errors, such as (ahem) applying BKI_LOOKUP to a column that isn't even an OID. bootstrap-v13-delta.patch is a diff atop your patch series for the in-tree files, and convert_oid2name.patch adjusts that script to make use of the additional conversion capability. regards, tom lane diff --git a/src/backend/catalog/README.data b/src/backend/catalog/README.data index b7c680c..22ad0f2 100644 *** a/src/backend/catalog/README.data --- b/src/backend/catalog/README.data *** teach Catalog::ParseData() how to expand *** 62,71 representation. - To aid readability, some values that are references to other catalog ! entries are represented by macros rather than numeric OIDs. This is ! the case for index access methods, opclasses, operators, opfamilies, ! and types. This is done for functions as well, but only if the proname ! is unique. Bootstrap Data Conventions == --- 62,103 representation. - To aid readability, some values that are references to other catalog ! entries are represented by names rather than numeric OIDs. Currently ! this is the case for access methods, functions, operators, opclasses, ! opfamilies, and types. The rules are as follows: ! ! * Use of names rather than numbers is enabled for a particular catalog ! column by attaching BKI_LOOKUP(lookuprule) to the column's definition, ! where "lookuprule" is pg_am, pg_proc, pg_operator, pg_opclass, ! pg_opfamily, or pg_type. ! ! * In a name-lookup column, all entries must use the name format except ! when writing "0" for InvalidOid. (If the column is declared regproc, ! you can optionally write "-" instead of "0".) genbki.pl will warn ! about unrecognized names. ! ! * Access methods are just represented by their names, as are types. ! Type names must match the referenced pg_type entry's typname; you ! do not get to use any aliases such as "integer" for "int4". ! ! * A function can be represented by its proname, if that is unique among ! the pg_proc.dat entries (this works like regproc input). Otherwise, ! write it as "proname(argtypename,argtypename,...)", like regprocedure. ! The argument type names must be spelled exactly as they are in the ! pg_proc.dat entry's proargtypes field. Do not insert any spaces. ! ! * Operators are represented by "oprname(lefttype,righttype)", writing the ! type names exactly as they appear in the pg_operator.dat entry's oprleft ! and oprright fields. (Write 0 for the omitted operand of a unary ! operator.) ! ! * The names of opclasses and opfamilies are only unique within an access ! method, so they are represented by "access_method_name/object_name". ! ! In none of these cases is there any provision for schema-qualification; ! all objects created during bootstrap are expected to be in the pg_catalog ! schema. ! Bootstrap Data Conventions == *** You can also use the duplicate_oids scri *** 105,111 build if a duplicate is found.) - The OID counter starts at 1 at bootstrap. If a catalog row is ! in a table that requires OIDs, but no OID was preassigned by an "OID =" clause, then it will receive an OID of 1 or above. --- 137,143 build if a duplicate is found.) - The OID counter starts at 1 at bootstrap. If a catalog row is ! in a table that requires OIDs, but no OID was preassigned by an "oid =>" clause, then it will receive an OID of 1 or above. diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 27494d9..f6be50a 100644 *** a/src/backend/catalog/genbki.pl --- b/src/backend/catalog/genbk
Re: WIP: a way forward on bootstrap data
On 4/5/18, Tom Lane wrote: > Here are the results of an evening's desultory hacking on v13. > > [numeric function oids with overloaded name] Thank you for the detailed review and for improving the function references (not to mention the type references I somehow left on the table). I was also not quite satisfied with just the regproc columns. > I did not like the hard-wired handling of proargtypes and proallargtypes > in genbki.pl; it hardly seems impossible that we'll want similar > conversions for other array-of-OID columns in future. After a bit of > thought, it seemed like we could allow > > oidvectorproargtypes BKI_LOOKUP(pg_type); > > Oid proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type); > > and just teach genbki.pl that if a lookup rule is attached to > an oidvector or Oid[] column, it means to apply the rule to > each array element individually. I think that's a good idea. I went an extra step and extracted the common logic into a function (attached draft patch to be applied on top of yours). It treats all lookups as operating on arrays. The common case is that we pass a single-element array. That may seem awkward, but I think it's clear. The code is slimmer, and the lines now fit within 80 characters. > I also changed genbki.pl so that it'd warn about entries that aren't > recognized by the lookup rules. This seems like a good idea for > catching errors, such as (ahem) applying BKI_LOOKUP to a column > that isn't even an OID. Yikes, I must have fat-fingered that during the comment reformatting. Unrelated, I noticed my quoting of defaults that contain back-slashes was half-baked, so I'll include that fix in the next patchset. I'll put out a new one in a couple days, to give a chance for further review and discussion of the defaults. I didn't feel the need to respond to the other messages, but yours and Andres' points are well taken. -John Naylor diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index f6be50a..8d47109 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -343,34 +343,21 @@ EOM # # If the column type is oidvector or oid[], we have to replace # each element of the array as per the lookup rule. - # - # If we don't have a unique value to substitute, warn and - # leave the entry unchanged. if ($column->{lookup}) { my $lookup = $lookup_kind{ $column->{lookup} }; +my @lookupnames; +my @lookupoids; die "unrecognized BKI_LOOKUP type " . $column->{lookup} if !defined($lookup); if ($atttype eq 'oidvector') { - my @lookupnames = split /\s+/, $bki_values{$attname}; - my @lookupoids; - foreach my $lookupname (@lookupnames) - { - my $lookupoid = $lookup->{ $lookupname }; - if (defined($lookupoid) && $lookupoid ne 'MULTIPLE') - { - $lookupname = $lookupoid; - } - else - { - warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values)) -if $lookupname ne '-' && $lookupname ne '0'; - } - push @lookupoids, $lookupname; - } + @lookupnames = split /\s+/, $bki_values{$attname}; + @lookupoids = lookup_oids($lookup, $catname, +\%bki_values, @lookupnames); + $bki_values{$attname} = join(' ', @lookupoids); } elsif ($atttype eq 'oid[]') @@ -378,39 +365,21 @@ EOM if ($bki_values{$attname} ne '_null_') { $bki_values{$attname} =~ s/[{}]//g; - my @lookupnames = split /,/, $bki_values{$attname}; - my @lookupoids; - foreach my $lookupname (@lookupnames) - { - my $lookupoid = $lookup->{ $lookupname }; - if (defined($lookupoid) && $lookupoid ne 'MULTIPLE') - { -$lookupname = $lookupoid; - } - else - { -warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values)) - if $lookupname ne '-' && $lookupname ne '0'; - } - push @lookupoids, $lookupname; - } + @lookupnames = split /,/, $bki_values{$attname}; + @lookupoids = lookup_oids($lookup, $catname, + \%bki_values, @lookupnames); + $bki_values{$attname} = sprintf "{%s}", join(',', @lookupoids); } } else { - my $lookupname = $bki_values{$attname}; - my $lookupoid = $lookup->{ $lookupname }; - if (defined($lookupoid) && $lookupoid ne 'MULTIPLE') - { - $bki_values{$attname} = $lookupoid; - } - else - { - warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values)) - if $lookupname ne '-' && $lookupname ne '0'; - } + $lookupnames[0] = $bki_values{$attname}; + @lookupoids = lookup_oids($lookup, $catname, +\%bki_values, @lookupnames); + + $bki_values{$attname} = $lookupoids[0]; } } } @@ -759,6 +728,32 @@ sub morph_row_for_schemapg } } +# Perfo
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 4/5/18, Tom Lane wrote: >> I did not like the hard-wired handling of proargtypes and proallargtypes >> in genbki.pl; it hardly seems impossible that we'll want similar >> conversions for other array-of-OID columns in future. After a bit of >> thought, it seemed like we could allow >> oidvectorproargtypes BKI_LOOKUP(pg_type); >> Oid proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type); >> and just teach genbki.pl that if a lookup rule is attached to >> an oidvector or Oid[] column, it means to apply the rule to >> each array element individually. > I think that's a good idea. I went an extra step and extracted the > common logic into a function (attached draft patch to be applied on > top of yours). It treats all lookups as operating on arrays. The > common case is that we pass a single-element array. That may seem > awkward, but I think it's clear. The code is slimmer, and the lines > now fit within 80 characters. Sounds good. I too was bothered by the duplication of code, but I'm not a good enough Perl programmer to have thought of that solution. Something that bothered me a bit while writing the warning-producing code is that showing %bki_values isn't actually that great a way of identifying the trouble spot. By this point we've expanded out defaults and possibly replaced some other macros, so it doesn't look that much like what was in the .dat file. I think what would be ideal, both here and in some other places like AddDefaultValues, is to be able to finger the location of the bad tuple by filename and line number, but I have no idea whether it's practical to annotate the tuples with that while reading the .dat files. Any thoughts? (Obviously, better error messages could be a future improvement; it's not something we have to get done before the conversion.) > Unrelated, I noticed my quoting of defaults that contain back-slashes > was half-baked, so I'll include that fix in the next patchset. I'll > put out a new one in a couple days, to give a chance for further > review and discussion of the defaults. I didn't feel the need to > respond to the other messages, but yours and Andres' points are well > taken. We're getting down to the wire here --- I think the plan is to close the CF on Saturday or Sunday, and then push the bootstrap changes right after that. So please turn around whatever you're planning to do ASAP. I'm buckling down to a final review today and tomorrow. regards, tom lane
Re: WIP: a way forward on bootstrap data
I experimented with converting all frontend code to include just the catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the proposed new policy. I soon found that we'd overlooked one thing: some clients expect to see the relation OID macros, eg LargeObjectRelationId. Attached is a patch that changes things around so that those appear in the _d files instead of the master files. This is cleaner anyway because it removes duplication of the OIDs in the master files, with attendant risk of error. For example we have this change in pg_aggregate.h: -#define AggregateRelationId 2600 - -CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS +CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS Some of the CATALOG lines spill well past 80 characters with this, although many of the affected ones already were overlength, eg -#define DatabaseRelationId 1262 -#define DatabaseRelation_Rowtype_Id 1248 - -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) BKI_SCHEMA_MACRO +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO I thought about improving that by removing the restriction that these BKI annotations appear on the same line as the CATALOG macro, so that we could break the above into several lines. I think the original key reason for the restriction was to avoid accidentally taking some bit of a DATA line as a BKI annotation. With the DATA lines gone from these files, that's no longer a significant hazard (although passing references to BKI keywords in comments might still be hazards for the Perl scripts). However, if we try to format things like CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO { fields... } I'm afraid that neither pgindent nor a lot of common editors would indent that very nicely. So at least for the moment I'm inclined to just keep it all on one line ... we know how that behaves, anyway. regards, tom lane diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm index 519247e..fb3d62a 100644 --- a/src/backend/catalog/Catalog.pm +++ b/src/backend/catalog/Catalog.pm @@ -104,18 +104,29 @@ sub ParseHeader { push @{ $catalog{indexing} }, "build indices\n"; } - elsif (/^CATALOG\(([^,]*),(\d+)\)/) + elsif (/^CATALOG\(([^,]*),(\d+),(\w+)\)/) { $catalog{catname} = $1; $catalog{relation_oid} = $2; +$catalog{relation_oid_macro} = $3; $catalog{bootstrap} = /BKI_BOOTSTRAP/ ? ' bootstrap' : ''; $catalog{shared_relation} = /BKI_SHARED_RELATION/ ? ' shared_relation' : ''; $catalog{without_oids} = /BKI_WITHOUT_OIDS/ ? ' without_oids' : ''; -$catalog{rowtype_oid} = - /BKI_ROWTYPE_OID\((\d+)\)/ ? " rowtype_oid $1" : ''; +if (/BKI_ROWTYPE_OID\((\d+),(\w+)\)/) +{ + $catalog{rowtype_oid} = $1; + $catalog{rowtype_oid_clause} = " rowtype_oid $1"; + $catalog{rowtype_oid_macro} = $2; +} +else +{ + $catalog{rowtype_oid} = ''; + $catalog{rowtype_oid_clause} = ''; + $catalog{rowtype_oid_macro} = ''; +} $catalog{schema_macro} = /BKI_SCHEMA_MACRO/ ? 1 : 0; $declaring_attributes = 1; } diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index f6be50a..fe8c3ca 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -228,6 +228,7 @@ my @tables_needing_macros; # produce output, one catalog at a time foreach my $catname (@catnames) { + my $catalog = $catalogs{$catname}; # Create one definition header with macro definitions for each catalog. my $def_file = $output_path . $catname . '_d.h'; @@ -258,13 +259,21 @@ foreach my $catname (@catnames) EOM + # Emit OID macros for catalog's OID and rowtype OID, if wanted + print $def + sprintf("#define %s %s\n", $catalog->{relation_oid_macro}, $catalog->{relation_oid}) + if $catalog->{relation_oid_macro} ne ''; + print $def + sprintf("#define %s %s\n", $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid}) + if $catalog->{rowtype_oid_macro} ne ''; + print $def "\n"; + # .bki CREATE command for this catalog - my $catalog = $catalogs{$catname}; print $bki "create $catname $catalog->{relation_oid}" . $catalog->{shared_relation} . $catalog->{bootstrap} . $catalog->{without_oids} - . $catalog->{rowtype_oid} . "\n"; + . $catalog->{rowtype_oid_clause} . "\n"; my $first = 1; diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids index 9732f61..0e6285f 100755 --- a/src/include/catalog/duplicate_oids +++ b/src/include/catalog/duplicate_oids @@ -15,7 +15,7 @@ while (<>) next if /^CATALOG\(.*BKI_BOOTSTRAP/; next unless /\boid *=> *'(\d+)'/ - || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+)\)/ + || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+),/
Re: WIP: a way forward on bootstrap data
BTW, I experimented with adding blank lines between the hash items in the .dat files, and that seemed to make a nice improvement in readability, converting masses of rather gray text into visibly distinct stanzas. I'm not dead set on that, but try it and see what you think. A small example from pg_aggregate.dat: { aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum', aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine', aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv', aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal', aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' }, { aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum', aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv', aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8', agginitval => '{0,0}', aggminitval => '{0,0}' }, { aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum', aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv', aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8', agginitval => '{0,0}', aggminitval => '{0,0}' }, { aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum', aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine', aggserialfn => 'numeric_avg_serialize', aggdeserialfn => 'numeric_avg_deserialize', aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv', aggmfinalfn => 'numeric_avg', aggtranstype => 'internal', aggtransspace => '128', aggmtranstype => 'internal', aggmtransspace => '128' }, { aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum', aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine', aggtranstype => '_float8', agginitval => '{0,0,0}' }, versus { aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum', aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine', aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv', aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal', aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' }, { aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum', aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv', aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8', agginitval => '{0,0}', aggminitval => '{0,0}' }, { aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum', aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine', aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv', aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8', agginitval => '{0,0}', aggminitval => '{0,0}' }, { aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum', aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine', aggserialfn => 'numeric_avg_serialize', aggdeserialfn => 'numeric_avg_deserialize', aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv', aggmfinalfn => 'numeric_avg', aggtranstype => 'internal', aggtransspace => '128', aggmtranstype => 'internal', aggmtransspace => '128' }, { aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum', aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine', aggtranstype => '_float8', agginitval => '{0,0,0}' }, regards, tom lane
Re: WIP: a way forward on bootstrap data
On 4/6/18, Tom Lane wrote: > I experimented with converting all frontend code to include just the > catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the > proposed new policy. I soon found that we'd overlooked one thing: > some clients expect to see the relation OID macros, eg > LargeObjectRelationId. Attached is a patch that changes things around > so that those appear in the _d files instead of the master files. > This is cleaner anyway because it removes duplication of the OIDs in > the master files, with attendant risk of error. For example we > have this change in pg_aggregate.h: > > -#define AggregateRelationId 2600 > - > -CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS > +CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS > > Some of the CATALOG lines spill well past 80 characters with this, > although many of the affected ones already were overlength, eg > > -#define DatabaseRelationId 1262 > -#define DatabaseRelation_Rowtype_Id 1248 > - > -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) > BKI_SCHEMA_MACRO > +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION > BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO It seems most of the time the FooRelationId labels are predictable, although not as pristine as the Anum_* constants. One possibility that came to mind is to treat these like pg_type OID #defines -- have a simple rule that can be overridden for historical reasons. In this case the pg_database change would simply be: -#define DatabaseRelationId 1262 -#define DatabaseRelation_Rowtype_Id 1248 - and genbki.pl would know what to do. But for pg_am: -#define AccessMethodRelationId 2601 - -CATALOG(pg_am,2601) +CATALOG(pg_am,2601) BKI_REL_LABEL(AccessMethod) I haven't thought this through yet. I imagine it will add as well as remove a bit of complexity, code-wise. The upside is most CATALOG lines will remain unchanged, and those that do won't end up quite as long. I can try a draft tomorrow to see how it looks, unless you see an obvious downside. -John Naylor
Re: WIP: a way forward on bootstrap data
On 4/5/18, Tom Lane wrote: > I've generalized the BKI_LOOKUP(pg_proc) code so that > you can use either regproc-like or regprocedure-like notation, and then > applied that to relevant columns. > [...] > bootstrap-v13-delta.patch is a diff atop your patch series for the > in-tree files, and convert_oid2name.patch adjusts that script to > make use of the additional conversion capability. Looking at convert_oid2name.patch again, I see this: + elsif ($catname eq 'pg_am') + { + $values{aggfnoid} = lookup_procname($values{aggfnoid}); + } aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup. Do you remember the intent here? -John Naylor
Re: WIP: a way forward on bootstrap data
John Naylor writes: > It seems most of the time the FooRelationId labels are predictable, > although not as pristine as the Anum_* constants. One possibility that > came to mind is to treat these like pg_type OID #defines -- have a > simple rule that can be overridden for historical reasons. Meh. I'd just as soon avoid having some catalogs done one way and some another. regards, tom lane
Re: WIP: a way forward on bootstrap data
John Naylor writes: > Looking at convert_oid2name.patch again, I see this: > + elsif ($catname eq 'pg_am') > + { > + $values{aggfnoid} = > lookup_procname($values{aggfnoid}); > + } > aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup. > Do you remember the intent here? Ugh, copy-and-pasteo. I intended to have it lookup pg_am.amhandler, but must have missed changing the field name after copying code from the pg_aggregate stanza. Seems to have been unnecessary anyway, since all the entries in the column are already symbolic. regards, tom lane
Re: WIP: a way forward on bootstrap data
Just had another thought about this business: if practical, we should remove the distinction between "descr" and "shdescr" and just use the former name in .dat files. genbki.pl knows which catalogs are shared, so it ought to be able to figure out where to route the descriptions. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 4/6/18, Tom Lane wrote: > Just had another thought about this business: if practical, we should > remove the distinction between "descr" and "shdescr" and just use the > former name in .dat files. genbki.pl knows which catalogs are shared, > so it ought to be able to figure out where to route the descriptions. Fairly trivial (attached), and shouldn't be too hard to integrate into the series. -John Naylor diff --git a/src/backend/catalog/README.data b/src/backend/catalog/README.data index 22ad0f2..2c05fab 100644 --- a/src/backend/catalog/README.data +++ b/src/backend/catalog/README.data @@ -22,7 +22,7 @@ modified exerpt from pg_database.dat will demonstrate the key features: # that might contain non-word characters, so we must double-quote them. { oid => '1', oid_symbol => 'TemplateDbOid', - shdescr => 'database\'s default template', + descr => 'database\'s default template', datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING', datcollate => '"LC_COLLATE"', datctype => '"LC_CTYPE"', datistemplate => 't', datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0', @@ -33,8 +33,8 @@ modified exerpt from pg_database.dat will demonstrate the key features: -The overall file layout is: open bracket, one or more sets of curly brackets containing comma-separated key-value pairs, close bracket. --The metadata fields oid, oid_symbol, descr, and shdescr come first on -their own line(s) within the curly brackets. +-The metadata fields oid, oid_symbol, and descr come first on their own +line(s) within the curly brackets. -All values are single-quoted. -Single quotes within values must be escaped. -If a value is a macro to be expanded by initdb.c, it must also have diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 8d47109..bbe7da8 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -400,13 +400,16 @@ EOM # postgres.shdescription if (defined $bki_values{descr}) { - printf $descr "%s\t%s\t0\t%s\n", - $bki_values{oid}, $catname, $bki_values{descr}; - } - if (defined $bki_values{shdescr}) - { - printf $shdescr "%s\t%s\t%s\n", - $bki_values{oid}, $catname, $bki_values{shdescr}; + if ($catalog->{shared_relation}) + { +printf $shdescr "%s\t%s\t%s\n", + $bki_values{oid}, $catname, $bki_values{descr}; + } + else + { +printf $descr "%s\t%s\t0\t%s\n", + $bki_values{oid}, $catname, $bki_values{descr}; + } } # Emit OID symbol diff --git a/src/include/catalog/pg_database.dat b/src/include/catalog/pg_database.dat index d5c2038..957ca84 100644 --- a/src/include/catalog/pg_database.dat +++ b/src/include/catalog/pg_database.dat @@ -16,7 +16,7 @@ # that might contain non-word characters, so we must double-quote them. { oid => '1', oid_symbol => 'TemplateDbOid', - shdescr => 'default template for new databases', + descr => 'default template for new databases', datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING', datcollate => '"LC_COLLATE"', datctype => '"LC_CTYPE"', datistemplate => 't', datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0', diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl index c247a72..f748a45 100644 --- a/src/include/catalog/reformat_dat_file.pl +++ b/src/include/catalog/reformat_dat_file.pl @@ -59,7 +59,7 @@ if ($output_path ne '' && substr($output_path, -1) ne '/') } # Metadata of a catalog entry -my @METADATA = ('oid', 'oid_symbol', 'descr', 'shdescr'); +my @METADATA = ('oid', 'oid_symbol', 'descr'); # Read all the input files into internal data structures. # We pass data file names as arguments and then look for matching
Re: WIP: a way forward on bootstrap data
On 4/6/18, Tom Lane wrote: > BTW, I experimented with adding blank lines between the hash items in the > .dat files, and that seemed to make a nice improvement in readability, > converting masses of rather gray text into visibly distinct stanzas. > I'm not dead set on that, but try it and see what you think. Narrow entries with natural whitespace might be okay as is. The pg_aggregate example is better with blank lines, but another thing to consider is that a comment that hugs a block is clear on which entries it's referring to (pg_amop): # btree integer_ops # default operators int2 { amopfamily => 'btree/integer_ops', amoplefttype => 'int2', amoprighttype => 'int2', amopstrategy => '1', amopopr => '<(int2,int2)', amopmethod => 'btree' }, { amopfamily => 'btree/integer_ops', amoplefttype => 'int2', amoprighttype => 'int2', amopstrategy => '2', amopopr => '<=(int2,int2)', amopmethod => 'btree' }, { amopfamily => 'btree/integer_ops', amoplefttype => 'int2', amoprighttype => 'int2', amopstrategy => '3', amopopr => '=(int2,int2)', amopmethod => 'btree' }, { amopfamily => 'btree/integer_ops', amoplefttype => 'int2', amoprighttype => 'int2', amopstrategy => '4', amopopr => '>=(int2,int2)', amopmethod => 'btree' }, { amopfamily => 'btree/integer_ops', amoplefttype => 'int2', amoprighttype => 'int2', amopstrategy => '5', amopopr => '>(int2,int2)', amopmethod => 'btree' }, # crosstype operators int24 { amopfamily => 'btree/integer_ops', amoplefttype => 'int2', ... [more blocks of integer ops ...] amopmethod => 'btree' }, # btree oid_ops With a blank line beween every entry, the comments would "float" more, and the scope is not as clear. I'm okay with whatever the community thinks, but at this point I'm inclined to leave things as they are and focus on the other points of review for the next patchset. While on the subject of viewing, I do have a badly outdated patch that would create a postgres.sql file which would load into a development schema so one could query the bootstrap data in a database without running initdb. I could update it at a future point. -John Naylor
Re: WIP: a way forward on bootstrap data
On 4/5/18, Tom Lane wrote: > Something that bothered me a bit while writing the warning-producing code > is that showing %bki_values isn't actually that great a way of identifying > the trouble spot. By this point we've expanded out defaults and possibly > replaced some other macros, so it doesn't look that much like what was > in the .dat file. I think what would be ideal, both here and in some > other places like AddDefaultValues, is to be able to finger the location > of the bad tuple by filename and line number, but I have no idea whether > it's practical to annotate the tuples with that while reading the .dat > files. Any thoughts? We could use the $. variable to save the line number, which is what the old code had. AddDefaultValues will report the line number on failure, so I left out explicit line number reporting. If memory serves, Perl is sensitive to how you format the "die" message. If I delete a default value from the header, I get this, reporting line 16: Failed to form full tuple for pg_opfamily Missing values for: opfnamespace Showing other values for context: oid => 421, opfmethod => 403, opfowner => PGUID, opfname => abstime_ops, at ../../../src/backend/catalog/Catalog.pm line 259, <$ifd> line 16. Makefile:23: recipe for target 'reformat-dat-files' failed make: *** [reformat-dat-files] Error 25 I think the context is good for pg_attribute, because there is no file to read from. I'll think about the lookup code. -John Naylor
Re: WIP: a way forward on bootstrap data
For version 14, diffed against f1464c53804: -Use majority values for proisstrict, provolatile, proparallel (patch 0006) -Use valid C string for multi-char defaults containing a backslash (patch 0006) -Apply Tom's patch for additional lookups, slightly modified by me (convert_oid2name.pl, patch 0007) -Apply Tom's patch for relation/rowtype OID macros (patch 0005) On 4/6/18, Tom Lane wrote: > Just had another thought about this business: if practical, we should > remove the distinction between "descr" and "shdescr" and just use the > former name in .dat files. genbki.pl knows which catalogs are shared, > so it ought to be able to figure out where to route the descriptions. Done (convert_header2dat.pl, patches 0001, 0003, 0009) On 4/5/18, Tom Lane wrote: > I think what would be ideal, both here and in some > other places like AddDefaultValues, is to be able to finger the location > of the bad tuple by filename and line number, but I have no idea whether > it's practical to annotate the tuples with that while reading the .dat > files. Any thoughts? Done (patch 0007). So far only lookup_oids() uses it. -John Naylor v14-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 4/6/18, Tom Lane wrote: >> BTW, I experimented with adding blank lines between the hash items in the >> .dat files, and that seemed to make a nice improvement in readability, >> converting masses of rather gray text into visibly distinct stanzas. >> I'm not dead set on that, but try it and see what you think. > Narrow entries with natural whitespace might be okay as is. The > pg_aggregate example is better with blank lines, Yeah, it's somewhat of a worst-case example, because (more or less by chance) most of the entries have last lines that are mostly full. In other places it often happens that the last line of an entry is much shorter than average, and that provides enough of a visual break, as in your example from pg_amop. > but another thing to > consider is that a comment that hugs a block is clear on which entries > it's referring to (pg_amop): True, we'd need to rethink vertical spacing for comments. I don't offhand see why we couldn't keep comments that apply to a specific entry directly adjacent to that entry, though. Anyway, as I said, I'm not set on this change. If you're unexcited by the idea then let's drop it. regards, tom lane
Re: WIP: a way forward on bootstrap data
I wrote: > John Naylor writes: >> On 4/6/18, Tom Lane wrote: >>> BTW, I experimented with adding blank lines between the hash items in the >>> .dat files, and that seemed to make a nice improvement in readability, > Anyway, as I said, I'm not set on this change. If you're unexcited by > the idea then let's drop it. On third thought, there's actually a positive reason not to do that. The more vertical whitespace we have, the less traction there is for context diffs to find the right place to change, so that we'd be increasing the risk of patch misapplication. That was one of the worries that we set out to avoid with this whole design. So nevermind ... I'll get on with looking at v14. regards, tom lane
Re: WIP: a way forward on bootstrap data
Attached is a round of minor review fixes. Much of this is cleaning up existing mistakes or out-of-date comments, rather than anything introduced by this patchset, but I noticed it while going through the patch. The additional EXPOSE_TO_CLIENT_CODE bits actually are necessary, in some cases, as I had compile failures without them after changing client include lines. (I'll post that separately.) I added a couple more BKI_DEFAULT markers here, but omitted the ensuing .dat file updates, because they're just mechanical. I don't think there's any great need to incorporate this into your patch set. As far as I'm concerned, v14 is ready as-is, and I'll just apply this over the top of it. (Note that I'll probably smash the whole thing to one commit when the time comes.) I have some other work pending on the documentation aspect, but that's not quite ready for public consumption yet. regards, tom lane diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 8729ccd..1626999 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3566,7 +3566,7 @@ Oid PQftype(const PGresult *res, You can query the system table pg_type to obtain the names and properties of the various data types. The OIDs of the built-in data types are defined - in the file src/include/catalog/pg_type.h + in the file src/include/catalog/pg_type_d.h in the source tree. diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index 17213d4..d25d98a 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -25,8 +25,10 @@ BKIFILES = postgres.bki postgres.description postgres.shdescription include $(top_srcdir)/src/backend/common.mk -# Note: there are some undocumented dependencies on the ordering in which -# the catalog header files are assembled into postgres.bki. +# Note: the order of this list determines the order in which the catalog +# header files are assembled into postgres.bki. BKI_BOOTSTRAP catalogs +# must appear first, and there are reputedly other, undocumented ordering +# dependencies. CATALOG_HEADERS := \ pg_proc.h pg_type.h pg_attribute.h pg_class.h \ pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \ @@ -49,7 +51,8 @@ CATALOG_HEADERS := \ GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h # In the list of headers used to assemble postgres.bki, indexing.h needs -# be last, and toasting.h just before it. +# be last, and toasting.h just before it. This ensures we don't try to +# create indexes or toast tables before their catalogs exist. POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\ $(CATALOG_HEADERS) toasting.h indexing.h \ ) diff --git a/src/include/catalog/Makefile b/src/include/catalog/Makefile index d84a572..1da3ea7 100644 --- a/src/include/catalog/Makefile +++ b/src/include/catalog/Makefile @@ -2,9 +2,6 @@ # # Makefile for src/include/catalog # -# 'make reformat-dat-files' is a convenience target for rewriting the -# catalog data files in a standard format. -# # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group # Portions Copyright (c) 1994, Regents of the University of California # @@ -19,10 +16,16 @@ include $(top_builddir)/src/Makefile.global # location of Catalog.pm catalogdir = $(top_srcdir)/src/backend/catalog +# 'make reformat-dat-files' is a convenience target for rewriting the +# catalog data files in our standard format. This includes collapsing +# out any entries that are redundant with a BKI_DEFAULT annotation. reformat-dat-files: $(PERL) -I $(catalogdir) reformat_dat_file.pl pg_*.dat +# 'make expand-dat-files' is a convenience target for expanding out all +# default values in the catalog data files. This should be run before +# altering or removing any BKI_DEFAULT annotation. expand-dat-files: $(PERL) -I $(catalogdir) reformat_dat_file.pl pg_*.dat --full-tuples -.PHONY: reformat-dat-files +.PHONY: reformat-dat-files expand-dat-files diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids index 0e6285f..8c143cf 100755 --- a/src/include/catalog/duplicate_oids +++ b/src/include/catalog/duplicate_oids @@ -30,7 +30,7 @@ foreach my $oid (sort { $a <=> $b } keys %oidcounts) { next unless $oidcounts{$oid} > 1; $found = 1; - print "***Duplicate OID: $oid\n"; + print "$oid\n"; } exit $found; diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h index 02c38c4..b1e2cbd 100644 --- a/src/include/catalog/genbki.h +++ b/src/include/catalog/genbki.h @@ -34,7 +34,7 @@ #define BKI_FORCE_NOT_NULL /* Specifies a default value for a catalog field */ #define BKI_DEFAULT(value) -/* Indicates how to perform name lookups for OID fields */ +/* Indicates how to perform name lookups for an OID or OID-array field */ #define BKI_LOOKUP(catalog) /* The following are never defined; they are here only for documentation. */
Re: WIP: a way forward on bootstrap data
Oh, one more thing: looking again at the contents of pg_proc.dat, I find myself annoyed at the need to specify pronargs. That's entirely derivable from proargtypes, and if we did so, we'd get down to this for the first few pg_proc entries: { oid => '1242', descr => 'I/O', proname => 'boolin', prorettype => 'bool', proargtypes => 'cstring', prosrc => 'boolin' }, { oid => '1243', descr => 'I/O', proname => 'boolout', prorettype => 'cstring', proargtypes => 'bool', prosrc => 'boolout' }, { oid => '1244', descr => 'I/O', proname => 'byteain', prorettype => 'bytea', proargtypes => 'cstring', prosrc => 'byteain' }, which seems pretty close to the minimum amount of stuff you could expect to need to write. I recall that this and some other data compression methods were on the table awhile back, and I expressed doubt about putting very much magic knowledge in genbki.pl, but this case seems like a no-brainer. genbki can always get the right answer, and expecting people to do it just creates another way to make a mistake. So attached is an incremental patch to do that. I had to adjust AddDefaultValues's argument list to tell it the catalog name, and once I did that, I saw that the API arrangement whereby the caller throws error was just useless complexity, so I got rid of it. regards, tom lane diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm index 70aea89..3b3bb6b 100644 *** a/src/backend/catalog/Catalog.pm --- b/src/backend/catalog/Catalog.pm *** sub ParseData *** 254,265 } # Expand tuples to their full representation. ! my $error = AddDefaultValues($hash_ref, $schema); ! if ($error) ! { ! print "Failed to form full tuple for $catname\n"; ! die $error; ! } } else { --- 254,260 } # Expand tuples to their full representation. ! AddDefaultValues($hash_ref, $schema, $catname); } else { *** sub ParseData *** 289,302 return $data; } ! # Fill in default values of a record using the given schema. It's the ! # caller's responsibility to specify other values beforehand. ! # If we fail to fill all fields, return a nonempty error message. sub AddDefaultValues { ! my ($row, $schema) = @_; my @missing_fields; - my $msg; foreach my $column (@$schema) { --- 284,295 return $data; } ! # Fill in default values of a record using the given schema. ! # It's the caller's responsibility to specify other values beforehand. sub AddDefaultValues { ! my ($row, $schema, $catname) = @_; my @missing_fields; foreach my $column (@$schema) { *** sub AddDefaultValues *** 311,316 --- 304,316 { $row->{$attname} = $column->{default}; } + elsif ($catname eq 'pg_proc' && $attname eq 'pronargs' && + defined($row->{proargtypes})) + { + # pg_proc.pronargs can be derived from proargtypes. + my @proargtypes = split /\s+/, $row->{proargtypes}; + $row->{$attname} = scalar(@proargtypes); + } else { # Failed to find a value. *** sub AddDefaultValues *** 320,333 if (@missing_fields) { ! $msg = "Missing values for: " . join(', ', @missing_fields); ! $msg .= "\nShowing other values for context:\n"; while (my($key, $value) = each %$row) { $msg .= "$key => $value, "; } } - return $msg; } # Rename temporary files to final names. --- 320,334 if (@missing_fields) { ! my $msg = "Failed to form full tuple for $catname\n"; ! $msg .= "Missing values for: " . join(', ', @missing_fields); ! $msg .= "\nOther values for row:\n"; while (my($key, $value) = each %$row) { $msg .= "$key => $value, "; } + die $msg; } } # Rename temporary files to final names. diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index d4c2ddf..3512952 100644 *** a/src/backend/catalog/genbki.pl --- b/src/backend/catalog/genbki.pl *** sub morph_row_for_pgattr *** 641,651 $row->{attnotnull} = 'f'; } ! my $error = Catalog::AddDefaultValues($row, $pgattr_schema); ! if ($error) ! { ! die "Failed to form full tuple for pg_attribute: ", $error; ! } } # Write an entry to postgres.bki. Adding quotes here allows us to keep --- 641,647 $row->{attnotnull} = 'f'; } ! Catalog::AddDefaultValues($row, $pgattr_schema, 'pg_attribute'); } # Write an entry to postgres.bki. Adding quotes here allows us to keep diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 2c42b5c..b25546e 100644 *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *** CATALOG(pg_proc,1255,ProcedureRelationId *** 73,78 --- 73,79 char proparallel BKI_DEFAULT(s); /* number of arguments */ + /* Note: need not be given in pg_proc.dat; genbki.pl will compute it */ in
Re: WIP: a way forward on bootstrap data
On 4/6/18, Tom Lane wrote: > I don't think there's any great need to incorporate this into your patch > set. As far as I'm concerned, v14 is ready as-is, and I'll just apply > this over the top of it. (Note that I'll probably smash the whole thing > to one commit when the time comes.) Glad to hear it. A couple recent commits added #define symbols to headers, which broke the patchset, so I've attached v15, diff'd against 4f813c7203e. Commit 9fdb675fc added a symbol to pg_opfamily.h where there were none before, so I went ahead and wrapped it with an EXPOSE_TO_CLIENT_CODE macro. All your additional patches apply still apply over it. Your SGML patch can only apply if my doc patch is not applied, but I've included it anyway for the sake of no surprises. I'll check back in 24 hours to see if everything still applies. -John Naylor v15-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: WIP: a way forward on bootstrap data
John Naylor wrote: > On 4/6/18, Tom Lane wrote: > > > I don't think there's any great need to incorporate this into your patch > > set. As far as I'm concerned, v14 is ready as-is, and I'll just apply > > this over the top of it. (Note that I'll probably smash the whole thing > > to one commit when the time comes.) > > Glad to hear it. A couple recent commits added #define symbols to > headers, which broke the patchset, so I've attached v15, diff'd > against 4f813c7203e. Commit 9fdb675fc added a symbol to pg_opfamily.h > where there were none before, so I went ahead and wrapped it with an > EXPOSE_TO_CLIENT_CODE macro. Actually, after pushing that, I was thinking maybe it's better to remove that #define from there and put it in each of the two .c files that need it. I don't think it makes sense to expose this macro any further, and before that commit it was localized to a single file. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
Alvaro Herrera writes: > John Naylor wrote: >> Commit 9fdb675fc added a symbol to pg_opfamily.h >> where there were none before, so I went ahead and wrapped it with an >> EXPOSE_TO_CLIENT_CODE macro. > Actually, after pushing that, I was thinking maybe it's better to remove > that #define from there and put it in each of the two .c files that need > it. I don't think it makes sense to expose this macro any further, and > before that commit it was localized to a single file. We're speaking of IsBooleanOpfamily, right? Think I'd leave it where it is. As soon as you have more than one place using a macro like that, there's room for maintenance mistakes. Now it could also be argued that indxpath.c and partprune.c don't necessarily have the same idea of "boolean opfamily" anyway, in which case giving them separate copies might be better. Not sure about that. Anyway, now that John and I have each (separately) rebased the bootstrap patch over that, I'd appreciate it if you hold off cosmetic refactoring till said patch goes in, which I expect to do in ~ 24 hours. regards, tom lane
Re: WIP: a way forward on bootstrap data
Tom Lane wrote: > Alvaro Herrera writes: > > John Naylor wrote: > >> Commit 9fdb675fc added a symbol to pg_opfamily.h > >> where there were none before, so I went ahead and wrapped it with an > >> EXPOSE_TO_CLIENT_CODE macro. > > > Actually, after pushing that, I was thinking maybe it's better to remove > > that #define from there and put it in each of the two .c files that need > > it. I don't think it makes sense to expose this macro any further, and > > before that commit it was localized to a single file. > > We're speaking of IsBooleanOpfamily, right? Yeah, that's the one. > Think I'd leave it where it is. As soon as you have more than one > place using a macro like that, there's room for maintenance mistakes. Yeah, that's why I thought it'd be better to have it somewhere central (the originally submitted patch just added it to partprune.c). > Anyway, now that John and I have each (separately) rebased the bootstrap > patch over that, I'd appreciate it if you hold off cosmetic refactoring > till said patch goes in, which I expect to do in ~ 24 hours. Understood. I'm going over David Rowley's runtime pruning patch now (doesn't touch any catalog files), which I intend to be my last functional commit this cycle, and won't be doing any other commits till after bootstrap rework has landed. (As I mentioned elsewhere, I intend to propose some restructuring of partitioning code, without any functional changes, during next week.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
I wrote: > I'll check back in 24 hours to see if everything still applies. There were a couple more catalog changes that broke patch context, so attached is version 16. -John Naylor v16-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: WIP: a way forward on bootstrap data
John Naylor writes: > There were a couple more catalog changes that broke patch context, so > attached is version 16. Pushed with a few more adjustments (mostly, another round of copy-editing on bki.sgml). Now we wait to see what the buildfarm thinks, particularly about the MSVC build ... Congratulations, and many thanks, to John for seeing this through! I know it's been a huge amount of work, but we've needed this for years. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 4/9/18, Tom Lane wrote: > Congratulations, and many thanks, to John for seeing this through! > I know it's been a huge amount of work, but we've needed this for years. Many thanks for your review, advice, and additional hacking. Thanks also to Álvaro for review and initial commits, and to all who participated in previous discussion. -John Naylor
Re: WIP: a way forward on bootstrap data
On April 8, 2018 10:19:59 AM PDT, Tom Lane wrote: >John Naylor writes: >> There were a couple more catalog changes that broke patch context, so >> attached is version 16. > >Pushed with a few more adjustments (mostly, another round of >copy-editing >on bki.sgml). Now we wait to see what the buildfarm thinks, >particularly >about the MSVC build ... > >Congratulations, and many thanks, to John for seeing this through! >I know it's been a huge amount of work, but we've needed this for >years. Seconded and thirded and fourthed (?)! -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: WIP: a way forward on bootstrap data
On 4/6/18, Tom Lane wrote: > I experimented with converting all frontend code to include just the > catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the > proposed new policy. I soon found that we'd overlooked one thing: > some clients expect to see the relation OID macros, eg > LargeObjectRelationId. Attached is a patch that changes things around > so that those appear in the _d files instead of the master files. [...] > Some of the CATALOG lines spill well past 80 characters with this, > although many of the affected ones already were overlength, eg > > -#define DatabaseRelationId 1262 > -#define DatabaseRelation_Rowtype_Id 1248 > - > -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) > BKI_SCHEMA_MACRO > +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION > BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO [ to which I responded with an inadequate alternative ] Thinking about this some more, a way occurred to me to shorten the CATALOG lines while still treating all headers the same, and with very little code (Patch 0001). What we do is automate the use of 'RelationId' and 'Relation_Rowtype_Id' so that the CATALOG macro only needs the part pertaining to the table name, and the BKI_ROWTYPE_OID macro can go back to just having the OID, eg: CATALOG(pg_database,1262,Database) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) BKI_SCHEMA_MACRO This is shorter, but not quite as short as before the conversion. If we really wanted to, we could also leave off the BKI_ prefix from the CATALOG options (Patch 0002), eg: CATALOG(pg_database,1262,Database) SHARED_RELATION ROWTYPE_OID(1248) SCHEMA_MACRO CATALOG itself lacks a prefix, and its options can only go in one place, so we'd lose some consistency but perhaps we don't lose any clarity. (This isn't true for the attribute-level options for nullability etc, which can appear all over the place.) Results below - number of CATALOG lines with more than 80 characters, and the longest line: grep -E '^CATALOG\(' src/include/catalog/pg_*.h | sed 's/.*://' | awk '{ print length, $0 }' | sort -n -r -- before conversion: 6 lines > 80 chars 105 CATALOG(pg_auth_members,1261) BKI_SHARED_RELATION BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(2843) BKI_SCHEMA_MACRO -- after conversion: 14 lines > 80 chars 162 CATALOG(pg_shseclabel,3592,SharedSecLabelRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(4066,SharedSecLabelRelation_Rowtype_Id) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO -- with Patch 0001: 11 lines > 80 chars 118 CATALOG(pg_shseclabel,3592,SharedSecLabel) BKI_SHARED_RELATION BKI_ROWTYPE_OID(4066) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO -- with Patch 0002: 5 lines > 80 chars 102 CATALOG(pg_shseclabel,3592,SharedSecLabel) SHARED_RELATION ROWTYPE_OID(4066) WITHOUT_OIDS SCHEMA_MACRO -John Naylor From ff1384643e47ef423ab13cfd847f12c0969029f1 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Mon, 16 Apr 2018 17:42:28 +0700 Subject: [PATCH 1/2] Shorten notation for relation and rowtype OID macros --- src/backend/catalog/Catalog.pm| 8 +--- src/include/catalog/duplicate_oids| 2 +- src/include/catalog/genbki.h | 4 ++-- src/include/catalog/pg_aggregate.h| 2 +- src/include/catalog/pg_am.h | 2 +- src/include/catalog/pg_amop.h | 2 +- src/include/catalog/pg_amproc.h | 2 +- src/include/catalog/pg_attrdef.h | 2 +- src/include/catalog/pg_attribute.h| 2 +- src/include/catalog/pg_auth_members.h | 2 +- src/include/catalog/pg_authid.h | 2 +- src/include/catalog/pg_cast.h | 2 +- src/include/catalog/pg_class.h| 2 +- src/include/catalog/pg_collation.h| 2 +- src/include/catalog/pg_constraint.h | 2 +- src/include/catalog/pg_conversion.h | 2 +- src/include/catalog/pg_database.h | 2 +- src/include/catalog/pg_db_role_setting.h | 2 +- src/include/catalog/pg_default_acl.h | 2 +- src/include/catalog/pg_depend.h | 2 +- src/include/catalog/pg_description.h | 2 +- src/include/catalog/pg_enum.h | 2 +- src/include/catalog/pg_event_trigger.h| 2 +- src/include/catalog/pg_extension.h| 2 +- src/include/catalog/pg_foreign_data_wrapper.h | 2 +- src/include/catalog/pg_foreign_server.h | 2 +- src/include/catalog/pg_foreign_table.h| 2 +- src/include/catalog/pg_index.h| 2 +- src/include/catalog/pg_inherits.h | 2 +- src/include/catalog/pg_init_privs.h | 2 +- src/include/catalog/pg_language.h | 2 +- src/include/catalog/pg_largeobject.h | 2 +- src/include/catalog/pg_largeobject_metadata.h | 2 +- src/include/catalog/pg_namespace.h| 2 +- src/include/catalog/pg_opclass.h | 2 +- src/include/catalog/pg_operator.h | 2 +- src/include/catalog
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 4/6/18, Tom Lane wrote: >> Some of the CATALOG lines spill well past 80 characters with this, >> although many of the affected ones already were overlength, eg ... > Thinking about this some more, a way occurred to me to shorten the > CATALOG lines while still treating all headers the same, and with very > little code (Patch 0001). What we do is automate the use of > 'RelationId' and 'Relation_Rowtype_Id' so that the CATALOG macro only > needs the part pertaining to the table name, and the BKI_ROWTYPE_OID > macro can go back to just having the OID, eg: Hm ... I don't like this too much, because it means that grepping for those macros will no longer turn up the source of their definition. Yeah, if you already know how Relation_Rowtype_Id macros are created, you might not be confused, but I think it'd be problematic for newcomers. Essentially we'd be shortening these lines by obfuscating, which doesn't seem like a good tradeoff. It might be all right to drop the BKI_ prefixes as per your other suggestion, but I'm worried about possible symbol conflicts. It's probably not really worth changing that by itself. regards, tom lane
Re: WIP: a way forward on bootstrap data
On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger wrote: > There still seems to be a lot of boilerplate in the .dat files > that could be eliminated. Tom mentioned upthread that he did > not want too much magic in genbki.pl or Catalog.pm, but I think > I can propose putting some magic in the header files themselves. > > Take, for example, some of the fields in pg_type.dat. I'll elide > the ones I'm not talking about with ...: > > > {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout', > typreceive => 'Xrecv', typsend => 'Xsend', ... }, -1 for trying to automate this. It varies between fooin and foo_in, and it'll be annoying to have to remember which one happens automatically and which one needs an override. > If we changed pg_proc.h: > > /* procedure source text */ > - textprosrc BKI_FORCE_NOT_NULL; > + textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL; > > we could remove the prosrc field from many of the records, which would > do a better job of calling attention to the remaining records where the > C function name differs from the SQL function name. That one I kinda like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: a way forward on bootstrap data
> On Apr 25, 2018, at 1:00 PM, Robert Haas wrote: > > On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger wrote: >> There still seems to be a lot of boilerplate in the .dat files >> that could be eliminated. Tom mentioned upthread that he did >> not want too much magic in genbki.pl or Catalog.pm, but I think >> I can propose putting some magic in the header files themselves. >> >> Take, for example, some of the fields in pg_type.dat. I'll elide >> the ones I'm not talking about with ...: >> >> >> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout', >> typreceive => 'Xrecv', typsend => 'Xsend', ... }, > > -1 for trying to automate this. It varies between fooin and foo_in, > and it'll be annoying to have to remember which one happens > automatically and which one needs an override. That may be a good argument. I was on the fence about it, because I like the idea that the project might do some cleanup and standardize the names of all in/out/send/recv functions so that no overrides would be required. (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to be named in a standard way.) Would that be an API break and hence a non-starter? > >> If we changed pg_proc.h: >> >>/* procedure source text */ >> - textprosrc BKI_FORCE_NOT_NULL; >> + textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL; >> >> we could remove the prosrc field from many of the records, which would >> do a better job of calling attention to the remaining records where the >> C function name differs from the SQL function name. > > That one I kinda like. Yeah, that one is simpler, and it is the one that makes the majority of the difference in bringing down the size of the .dat files. mark
Re: WIP: a way forward on bootstrap data
Robert Haas writes: > On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger wrote: >> There still seems to be a lot of boilerplate in the .dat files >> that could be eliminated. ... >> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout', >> typreceive => 'Xrecv', typsend => 'Xsend', ... }, > -1 for trying to automate this. It varies between fooin and foo_in, > and it'll be annoying to have to remember which one happens > automatically and which one needs an override. Yeah, that was my first reaction to that example as well. If it weren't so nearly fifty-fifty then we might have more traction there; but it's pretty close, and actually the foo_in cases outnumber fooin if I counted correctly. (Array entries should be ignored for this purpose; maybe we'll autogenerate them someday.) >> + textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL; > That one I kinda like. Agreed, this seems more compelling. However, I think we need more than one compelling example to justify the additional infrastructure. There aren't that many places where there's obvious internal redundancy in single catalog rows, so I'm not sure that we're going to find a lot of win here. (The prosrc-from-proname case, in isolation, could be handled about as well by adding a hardwired rule like we have for pronargs.) I don't especially like the idea of trying to compute, for instance, typalign from typlen. That's mostly going to encourage people to overlook it, which isn't a good thing, because there are plenty of places where genbki.pl couldn't be expected to get it right. regards, tom lane
Re: WIP: a way forward on bootstrap data
Mark Dilger writes: >> On Apr 25, 2018, at 1:00 PM, Robert Haas wrote: >> -1 for trying to automate this. It varies between fooin and foo_in, >> and it'll be annoying to have to remember which one happens >> automatically and which one needs an override. > That may be a good argument. I was on the fence about it, because I > like the idea that the project might do some cleanup and standardize > the names of all in/out/send/recv functions so that no overrides would > be required. (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to > be named in a standard way.) > Would that be an API break and hence a non-starter? Yeah, I'm afraid so. I'm pretty sure I recall cases where people invoked I/O functions by name, and I definitely recall cases where operator-implementation functions were invoked by name. Those might not be very bright things to do, but people do 'em. We'd also be risking creating problems for individual apps by conflicting with user-defined functions that we didn't use to conflict with. We take that chance every time we add functionality, of course, but usually we can soothe complaints by pointing out that they got some new functionality in return. I don't think "we made I/O function names more uniform" is going to be a very satisfactory excuse for breakage. regards, tom lane
Re: WIP: a way forward on bootstrap data
> On Apr 25, 2018, at 1:31 PM, Tom Lane wrote: > > Robert Haas writes: >> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger wrote: >>> There still seems to be a lot of boilerplate in the .dat files >>> that could be eliminated. ... > >>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout', >>> typreceive => 'Xrecv', typsend => 'Xsend', ... }, > >> -1 for trying to automate this. It varies between fooin and foo_in, >> and it'll be annoying to have to remember which one happens >> automatically and which one needs an override. > > Yeah, that was my first reaction to that example as well. If it > weren't so nearly fifty-fifty then we might have more traction there; > but it's pretty close, and actually the foo_in cases outnumber fooin > if I counted correctly. (Array entries should be ignored for this > purpose; maybe we'll autogenerate them someday.) Part of the problem right now is that nothing really encourages new functions to be named foo_in vs. fooin, so the nearly 50/50 split will continue when new code is contributed. If the system forced you to specify the name when you did it in a nonstandard way, and not otherwise, that would have the effect of documenting which way is now considered standard. > >>> + textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL; > >> That one I kinda like. > > Agreed, this seems more compelling. However, I think we need more than > one compelling example to justify the additional infrastructure. I'll look for more. > There > aren't that many places where there's obvious internal redundancy in > single catalog rows, so I'm not sure that we're going to find a lot of win > here. (The prosrc-from-proname case, in isolation, could be handled about > as well by adding a hardwired rule like we have for pronargs.) Right, but doing that in genbki.pl or Catalog.pm is an obfuscation. Doing it in pg_proc.h makes is much more reasonable, I think. Note that my patch does not hardcode the logic in the perl code. It teaches the perl code how to do variable substitution, but then allows the actual rules for each row to be written in the header file itself. > > I don't especially like the idea of trying to compute, for instance, > typalign from typlen. That's mostly going to encourage people to overlook > it, which isn't a good thing, because there are plenty of places where > genbki.pl couldn't be expected to get it right. Well, it wouldn't be in genbki.pl, it would be in pg_type.h, but I take your broader point that you don't want this calculated.
Re: WIP: a way forward on bootstrap data
>> + textprosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL; >> >>> That one I kinda like. >> >> Agreed, this seems more compelling. However, I think we need more than >> one compelling example to justify the additional infrastructure. > > I'll look for more. There are two more that seem reasonable optimizations in pg_cast.h: diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h index 7f4a25b2da..98794df7f8 100644 --- a/src/include/catalog/pg_cast.h +++ b/src/include/catalog/pg_cast.h @@ -37,13 +37,13 @@ CATALOG(pg_cast,2605,CastRelationId) Oid casttarget BKI_LOOKUP(pg_type); /* cast function; 0 = binary coercible */ - Oid castfunc BKI_LOOKUP(pg_proc); + Oid castfunc BKI_DEFAULT("${casttarget}(${castsource})") BKI_LOOKUP(pg_proc); /* contexts in which cast can be used */ charcastcontext; /* cast method */ - charcastmethod; + charcastmethod BKI_DEFAULT("/${castfunc} == 0 ? 'b' : 'f'/e"); } FormData_pg_cast; /* Which would convert numerous lines like: { castsource => 'money', casttarget => 'numeric', castfunc => 'numeric(money)', castcontext => 'a', castmethod => 'f' }, To shorter lines like: { castsource => 'money', casttarget => 'numeric', castcontext => 'a' }, which is great, because all you really are trying to tell the postgres system is that when you cast from numeric to money it has to be an explicit assignment and not an implicit cast. The extra stuff about castfunc and castmethod just gets in the way of understanding what is being done. There are another two in pg_opclass.h: diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h index b980327fc0..9f528f97c0 100644 --- a/src/include/catalog/pg_opclass.h +++ b/src/include/catalog/pg_opclass.h @@ -52,7 +52,7 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId) Oid opcmethod BKI_LOOKUP(pg_am); /* name of this opclass */ - NameDataopcname; + NameDataopcname BKI_DEFAULT("${opcintype}_ops"); /* namespace of this opclass */ Oid opcnamespace BKI_DEFAULT(PGNSP); @@ -61,7 +61,7 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId) Oid opcowner BKI_DEFAULT(PGUID); /* containing operator family */ - Oid opcfamily BKI_LOOKUP(pg_opfamily); + Oid opcfamily BKI_DEFAULT("${opcmethod}/${opcintype}_ops") BKI_LOOKUP(pg_opfamily); /* type of data indexed by opclass */ Oid opcintype BKI_LOOKUP(pg_type); Which would convert numerous lines like the following line from pg_opclass.dat: { opcmethod => 'btree', opcname => 'bytea_ops', opcfamily => 'btree/bytea_ops', opcintype => 'bytea' }, to much easier to read lines like: { opcmethod => 'btree', opcintype => 'bytea' }, which is also great, because you're really just declaring that type bytea has a btree opclass and letting the system do the rest of the work. Having to manually specify opfamily and the opname just clutters the row and makes it less intuitive. There are a bunch more of varying quality, depending on which automations you like or don't like. mark
Re: WIP: a way forward on bootstrap data
On Thu, Apr 26, 2018 at 2:11 AM, Mark Dilger wrote: > >> On Apr 25, 2018, at 1:31 PM, Tom Lane wrote: >> >> Robert Haas writes: >>> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger >>> wrote: There still seems to be a lot of boilerplate in the .dat files that could be eliminated. ... >> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout', typreceive => 'Xrecv', typsend => 'Xsend', ... }, >> >>> -1 for trying to automate this. It varies between fooin and foo_in, >>> and it'll be annoying to have to remember which one happens >>> automatically and which one needs an override. >> >> Yeah, that was my first reaction to that example as well. If it >> weren't so nearly fifty-fifty then we might have more traction there; >> but it's pretty close, and actually the foo_in cases outnumber fooin >> if I counted correctly. (Array entries should be ignored for this >> purpose; maybe we'll autogenerate them someday.) > > Part of the problem right now is that nothing really encourages new > functions to be named foo_in vs. fooin, so the nearly 50/50 split will > continue when new code is contributed. If the system forced you to > specify the name when you did it in a nonstandard way, and not otherwise, > that would have the effect of documenting which way is now considered > standard. > FWIW, I would like some standard naming convention one way or the other for in/out function names. Looking up pg_type.h for in/out functions when you know the built-in type name isn't great. But that itself may not be enough reason to change it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: WIP: a way forward on bootstrap data
On Wed, Apr 25, 2018 at 04:39:42PM -0400, Tom Lane wrote: > Mark Dilger writes: > >> On Apr 25, 2018, at 1:00 PM, Robert Haas wrote: > >> -1 for trying to automate this. It varies between fooin and foo_in, > >> and it'll be annoying to have to remember which one happens > >> automatically and which one needs an override. > > > That may be a good argument. I was on the fence about it, because I > > like the idea that the project might do some cleanup and standardize > > the names of all in/out/send/recv functions so that no overrides would > > be required. (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to > > be named in a standard way.) > > > Would that be an API break and hence a non-starter? > > Yeah, I'm afraid so. I'm pretty sure I recall cases where people > invoked I/O functions by name, and I definitely recall cases where > operator-implementation functions were invoked by name. Those might > not be very bright things to do, but people do 'em. > > We'd also be risking creating problems for individual apps by conflicting > with user-defined functions that we didn't use to conflict with. We take > that chance every time we add functionality, of course, but usually we can > soothe complaints by pointing out that they got some new functionality in > return. I don't think "we made I/O function names more uniform" is going > to be a very satisfactory excuse for breakage. What do you rate the chances that someone created a foo_in when there's already a built-in fooin? If it's low enough, we could add all the foo_ins as aliases to fooins and then mandate that new ones be called foo_in for future foos. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WIP: a way forward on bootstrap data
On 4/26/18, Tom Lane wrote: > (The prosrc-from-proname case, in isolation, could be handled about > as well by adding a hardwired rule like we have for pronargs.) If we think we might go in the direction of more special-case behavior, the attached patch more fully documents what we already do, and does some minor refactoring to make future additions more straightforward. I think this would even be good to do for v11. > Robert Haas writes: >> -1 for trying to automate this. It varies between fooin and foo_in, >> and it'll be annoying to have to remember which one happens >> automatically and which one needs an override. > > Yeah, that was my first reaction to that example as well. If it > weren't so nearly fifty-fifty then we might have more traction there; > but it's pretty close, and actually the foo_in cases outnumber fooin > if I counted correctly. (Array entries should be ignored for this > purpose; maybe we'll autogenerate them someday.) Hmm, that wouldn't be too hard. Add a new metadata field called 'array_type_oid', then if it finds such an OID, teach genbki.pl to construct a tuple for the corresponding array type by consulting something like chartypcategoryBKI_ARRAY(A); chartypstorage BKI_ARRAY(x); ...etc in the header file, plus copying typalign from the element type. I'll whip this up sometime and add it to the next commitfest. On 4/26/18, Mark Dilger wrote: > /* cast method */ > - charcastmethod; > + charcastmethod BKI_DEFAULT("/${castfunc} == 0 ? 'b' : > 'f'/e"); > } FormData_pg_cast; I don't have a strong opinion about your simple substitution mechanism, but I find the above a bit harder to reason about. It also has the added disadvantage that there is whitespace in the BKI annotation, so it can no longer be parsed with the current setup. That could be overcome with additional complexity, of course, but then you're back in the position of advocating that for a single use case. -John Naylor From 0bec30078080730a8f1adf0e54060d60d6fc3015 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Sat, 28 Apr 2018 15:54:14 +0700 Subject: [PATCH 1/1] Clarify special-case bootstrap values. Take special-case computation of out of the loop over all columns. This makes the code a bit more clear. It doesn't make much of a difference now, but if many columns are computed, it becomes difficult to read. Also add more comments. --- src/backend/catalog/Catalog.pm | 22 ++ src/include/catalog/pg_proc.dat | 3 +++ src/include/catalog/reformat_dat_file.pl | 13 ++--- 3 files changed, 23 insertions(+), 15 deletions(-) diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm index 67d1719..a6f555c 100644 --- a/src/backend/catalog/Catalog.pm +++ b/src/backend/catalog/Catalog.pm @@ -292,6 +292,20 @@ sub AddDefaultValues my ($row, $schema, $catname) = @_; my @missing_fields; + # Compute special-case field values. + # Note: If you add new cases here, you must also teach + # strip_default_values() in include/catalog/reformat_dat_file.pl + # to delete them. + if ($catname eq 'pg_proc') + { + # pg_proc.pronargs can be derived from proargtypes. + if (defined $row->{proargtypes}) + { + my @proargtypes = split /\s+/, $row->{proargtypes}; + $row->{pronargs} = scalar(@proargtypes); + } + } + foreach my $column (@$schema) { my $attname = $column->{name}; @@ -305,14 +319,6 @@ sub AddDefaultValues { $row->{$attname} = $column->{default}; } - elsif ($catname eq 'pg_proc' - && $attname eq 'pronargs' - && defined($row->{proargtypes})) - { - # pg_proc.pronargs can be derived from proargtypes. - my @proargtypes = split /\s+/, $row->{proargtypes}; - $row->{$attname} = scalar(@proargtypes); - } else { # Failed to find a value. diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index f643f56..be37fb0 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -19,6 +19,9 @@ # duplicate the operator's comment.) initdb will supply suitable default # comments for functions referenced by pg_operator. +# Note: pronargs is computed when this file is read, so does not need to be +# specified here. See AddDefaultValues() in Catalog.pm. + # Try to follow the style of existing functions' comments. # Some recommended conventions: diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl index 8ebbec6..c807d3a 100644 --- a/src/include/catalog/reformat_dat_file.pl +++ b/src/include/catalog/reformat_dat_file.pl @@ -194,14 +194,13 @@ sub strip_default_values { delete $row->{$attname}; } + } - # Also delete pg_proc.pronargs, since that can be recomputed. - if ( $catname eq 'pg_proc' - && $attname eq 'pronargs' - && defined($row->{proargtypes})) - { - delete $row->{$attname}; - } + # Delete computed values. See AddDefaultValues() in Catal
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 4/26/18, Tom Lane wrote: >> (The prosrc-from-proname case, in isolation, could be handled about >> as well by adding a hardwired rule like we have for pronargs.) > If we think we might go in the direction of more special-case > behavior, the attached patch more fully documents what we already do, > and does some minor refactoring to make future additions more > straightforward. I think this would even be good to do for v11. Agreed, pushed. regards, tom lane
Re: WIP: a way forward on bootstrap data
Hackers, Have you already considered and rejected the idea of having genbki.pl/Catalog.pm define constants that can be used in the catalog .dat files? I'm mostly curious if people think the resulting .dat files are better or worse using constants of this sort. For example: diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm index 7497d9cd9f..58ce24adf0 100644 --- a/src/backend/catalog/Catalog.pm +++ b/src/backend/catalog/Catalog.pm @@ -250,6 +250,21 @@ sub ParseData if ($lcnt == $rcnt) { + # pg_cast constants for castcontext + use constant IMPLICIT => 'i'; + use constant ASSIGNMENT => 'a'; + use constant EXPLICIT => 'e'; + + # pg_cast constants for castmethod + use constant FUNCTION => 'f'; + use constant BINARY => 'b'; + use constant INOUT => 'i'; + + # pg_proc constants for provolatile + use constant IMMUTABLE => 'i'; + use constant STABLE => 's'; + use constant VOLATILE => 'v'; + eval '$hash_ref = ' . $_; if (!ref $hash_ref) { diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat index cf007528fd..a4ceceb652 100644 --- a/src/include/catalog/pg_cast.dat +++ b/src/include/catalog/pg_cast.dat @@ -19,79 +19,79 @@ # int2->int4->int8->numeric->float4->float8, while casts in the # reverse direction are assignment-only. { castsource => 'int8', casttarget => 'int2', castfunc => 'int2(int8)', - castcontext => 'a', castmethod => 'f' }, + castcontext => ASSIGNMENT, castmethod => FUNCTION }, { castsource => 'int8', casttarget => 'int4', castfunc => 'int4(int8)', - castcontext => 'a', castmethod => 'f' }, + castcontext => ASSIGNMENT, castmethod => FUNCTION }, { castsource => 'int8', casttarget => 'float4', castfunc => 'float4(int8)',
Re: WIP: a way forward on bootstrap data
Mark Dilger writes: > Have you already considered and rejected the idea of having > genbki.pl/Catalog.pm define constants that can be used in > the catalog .dat files? I'm mostly curious if people think > the resulting .dat files are better or worse using constants > of this sort. For example: Hm, I don't think it's been debated in exactly these terms; but I find myself a bit skeptical of the proposal as-is. At least going by your examples, it'd make the .dat files a good bit more verbose, which doesn't seem like what we want. Also it creates a bigger impedance mismatch between what you see in the .dat files and what you see in the actual catalogs, inviting confusion. Also, with the particular implementation technique you suggest here, there'd be a single namespace for the constants across all catalogs, which creates a lot of possibilities for conflicts and mistakes. We could fix that by instituting some sort of unique-ifying naming convention, say CAST_IMPLICIT not just IMPLICIT, but that makes the verbosity problem worse. I think if we wanted to have something along this line, my preferred approach would be to continue to write the entries as string literals: castcontext => 'ASSIGNMENT', castmethod => 'FUNCTION' }, and make it the responsibility of genbki.pl to convert to the values recognized by the backend. That way the conversion could happen on a per-column basis, eliminating the conflict issue. But I'm not really convinced that this'd be an improvement over where we are now. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 5/7/18, Mark Dilger wrote: > Hackers, > > Have you already considered and rejected the idea of having > genbki.pl/Catalog.pm define constants that can be used in > the catalog .dat files? I'm mostly curious if people think > the resulting .dat files are better or worse using constants > of this sort. For example: ... > + # pg_cast constants for castcontext > + use constant IMPLICIT => 'i'; > + use constant ASSIGNMENT => 'a'; > + use constant EXPLICIT => 'e'; The comment refers to pg_cast, but these constants apply globally. It's also not the right place from a maintainability perspective, and if it was, now these values have different macros defined in two places. This is not good. > - castcontext => 'a', castmethod => 'f' }, > + castcontext => ASSIGNMENT, castmethod => FUNCTION }, For one, this breaks convention that the values are always single-quoted. If you had a use case for something like this, I would instead use the existing lookup infrastructure and teach genbki.pl to parse the enums (or #defines as the case might be) in the relevant header file. You'd need some improvement in readability to justify that additional code, though. I don't think this example quite passes (it's pretty obvious locally what the letters refer to), but others may feel differently. -John Naylor
Re: WIP: a way forward on bootstrap data
> On May 6, 2018, at 12:08 PM, John Naylor wrote: > > On 5/7/18, Mark Dilger wrote: >> Hackers, >> >> Have you already considered and rejected the idea of having >> genbki.pl/Catalog.pm define constants that can be used in >> the catalog .dat files? I'm mostly curious if people think >> the resulting .dat files are better or worse using constants >> of this sort. For example: > ... >> + # pg_cast constants for castcontext >> + use constant IMPLICIT => 'i'; >> + use constant ASSIGNMENT => 'a'; >> + use constant EXPLICIT => 'e'; > > The comment refers to pg_cast, but these constants apply globally. > It's also not the right place from a maintainability perspective, and > if it was, now these values have different macros defined in two > places. This is not good. > >> - castcontext => 'a', castmethod => 'f' }, >> + castcontext => ASSIGNMENT, castmethod => FUNCTION }, > > For one, this breaks convention that the values are always > single-quoted. If you had a use case for something like this, I would > instead use the existing lookup infrastructure and teach genbki.pl to > parse the enums (or #defines as the case might be) in the relevant > header file. You'd need some improvement in readability to justify > that additional code, though. I don't think this example quite passes > (it's pretty obvious locally what the letters refer to), but others > may feel differently. John, Tom, thanks for the feedback. I share your concerns about my straw-man proposal. But In the catalogs, 'f' usually means 'false', not 'function'. A person reading pg_cast.dat could see: castmethod => 'f' and think that meant a binary conversion, since castmethod is false. That's almost exactly wrong. Hence my desire to write castmethod => FUNCTION I don't have any better proposal, though. mark
Re: WIP: a way forward on bootstrap data
Pushed 0001 with some changes of my own. I'm afraid I created a few conflicts for the later patches in your series; please rebase. I don't think we introduced anything that would fail on old Perls, but let's see what buildfarm has to say. Others: Now is the time to raise concerns related to the proposed file formats and tooling, so please do have a look when you have a moment. At this stage, the proposed data format seems a good choice to me. Thanks -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
Alvaro Herrera writes: > Others: Now is the time to raise concerns related to the proposed file > formats and tooling, so please do have a look when you have a moment. > At this stage, the proposed data format seems a good choice to me. It's not very clear to me what the proposed data format actually is, and I don't really want to read several hundred KB worth of patches in order to reverse-engineer that information. Nor do I see anything in the patch list that obviously looks like it updates doc/src/sgml/bki.sgml to explain things. So could we have an explanation of what it is we're agreeing to? regards, tom lane
Re: WIP: a way forward on bootstrap data
On Fri, Jan 12, 2018 at 11:38:54AM -0500, Tom Lane wrote: > Alvaro Herrera writes: > > Others: Now is the time to raise concerns related to the proposed > > file formats and tooling, so please do have a look when you have a > > moment. At this stage, the proposed data format seems a good > > choice to me. > > It's not very clear to me what the proposed data format actually is, > and I don't really want to read several hundred KB worth of patches > in order to reverse-engineer that information. Nor do I see > anything in the patch list that obviously looks like it updates > doc/src/sgml/bki.sgml to explain things. > > So could we have an explanation of what it is we're agreeing to? That would be awesome. A walk-through example or two would also help. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WIP: a way forward on bootstrap data
Tom Lane wrote: > Alvaro Herrera writes: > > Others: Now is the time to raise concerns related to the proposed file > > formats and tooling, so please do have a look when you have a moment. > > At this stage, the proposed data format seems a good choice to me. > > It's not very clear to me what the proposed data format actually is, > and I don't really want to read several hundred KB worth of patches > in order to reverse-engineer that information. Nor do I see > anything in the patch list that obviously looks like it updates > doc/src/sgml/bki.sgml to explain things. > > So could we have an explanation of what it is we're agreeing to? Here's a small sample pg_proc entry: { oid => '2147', descr => 'number of input rows for which the input expression is not null', n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', s => 'aggregate_dummy' }, An pg_amop entry: { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => '<(int2,int2)', am => 'btree' }, Notes: 1. this is Perl data; it is read with 'eval' without any external modules. 2. the pg_proc entry has been compressed to two lines, to avoid content-free lines that would easily confuse git merge, but keep line length reasonable. 3. references to objects in other catalogs are by name, such as "int8" or "btree/integer_ops" rather than OID. 4. for each attribute, an abbreviation can be declared. In the pg_proc sample we have "n" which stands for proname, because we have this line: + NameDataproname BKI_ABBREV(n); I think John has gone overboard with some of these choices, but we can argue the specific choices once we decide that abbreviation is a good idea. (Prior discussion seems to suggest we already agreed on that.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
Alvaro Herrera writes: > Tom Lane wrote: >> So could we have an explanation of what it is we're agreeing to? > Here's a small sample pg_proc entry: > { oid => '2147', descr => 'number of input rows for which the input > expression is not null', > n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => > 'any', s => 'aggregate_dummy' }, > An pg_amop entry: > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => > '<(int2,int2)', am => 'btree' }, > Notes: > 1. this is Perl data; it is read with 'eval' without any external modules. Check. Where is it coming from --- I suppose we aren't going to try to store this in the existing .h files? What provisions will there be for comments? > 2. the pg_proc entry has been compressed to two lines, to avoid >content-free lines that would easily confuse git merge, but keep line >length reasonable. Seems like we would almost need a per-catalog convention on how to lay out the entries, or else we're going to end up (over time) with lots of cowboy coding leading to entries that look randomly different from the ones around them. > 3. references to objects in other catalogs are by name, such as "int8" >or "btree/integer_ops" rather than OID. +1 > 4. for each attribute, an abbreviation can be declared. In the >pg_proc sample we have "n" which stands for proname, because we have >this line: >+ NameDataproname BKI_ABBREV(n); I think single-letter abbreviations here are a pretty bad space vs readability tradeoff, particularly for wider catalogs where it risks ambiguity. The pg_amop sample you show looks noticeably more legible than the other one. Still, this is something we can debate on a case-by-case basis, it's not a defect in the mechanism. One other question is how we'll verify the conversion. Is there an expectation that the .bki file immediately after the conversion will be identical to immediately before? regards, tom lane
Re: WIP: a way forward on bootstrap data
Tom, everyone, It's getting late in my timezone, but I wanted to give a few quick answers. I'll follow up tomorrow. Thanks Alvaro for committing my refactoring of pg_attribute data creation. I think your modifications are sensible and I'll rebase soon. On 1/13/18, Tom Lane wrote: > It's not very clear to me what the proposed data format actually is, > and I don't really want to read several hundred KB worth of patches > in order to reverse-engineer that information. Nor do I see > anything in the patch list that obviously looks like it updates > doc/src/sgml/bki.sgml to explain things. Alvaro gave a good overview, so I'll just point out a few things. -Patches 0002 through 0007 represent a complete one-to-one migration of data entries. I didn't see much in bki.sgml specific to the current format, so my documentation changes are confined largely to the README, in patch 0005. -Patches 0008 and 0009 implement techniques to make the data lines shorter. My choices are certainly debatable. There is a brief addition to the README in patch 0008. The abbreviation technique was only used in three catalogs to demonstrate. -Patches 0010 and 0011 implement human-readable OID references. -Patches 0012 and 0013 are cosmetic, but invasive. > Seems like we would almost need a per-catalog convention on how to lay out > the entries, or else we're going to end up (over time) with lots of cowboy > coding leading to entries that look randomly different from the ones > around them. If I understand your concern correctly, the convention is enforced by a script (rewrite_dat.pl). At the very least this would be done at the same time as pg_indent and perltidy. To be sure, because of default values many entries will look randomly different from the ones around them regardless. I have a draft patch to load the source data into tables for viewing, but it's difficult to rebase, so I thought I'd offer that enhancement later. > One other question is how we'll verify the conversion. Is there an > expectation that the .bki file immediately after the conversion will > be identical to immediately before? Not identical. First, as part of the base migration, I stripped almost all double quotes from the data entries since the new Perl hash values are already single-quoted. (The exception is macros expanded by initdb.c) I made genbki.pl add quotes on output to match what bootscanner.l expects. Where a simple rule made it possible, it also matches the original .bki. The new .bki will only diff where the current data has superfluous quotes. (ie. "0", "sql"). Second, if the optional cosmetic patch 0013 is applied, the individual index and toast commands will be in a different order. > Check. Where is it coming from --- I suppose we aren't going to try to > store this in the existing .h files? What provisions will there be for > comments? Yes, they're in ".dat" files. Perl comments (#) on their own line are supported. I migrated all existing comments from the header files as part of the conversion. This is scripted, so I can rebase over new catalog entries that get committed. > I think single-letter abbreviations here are a pretty bad space vs > readability tradeoff, particularly for wider catalogs where it risks > ambiguity. Ironically, I got that one from you [1] ;-), but if you have a different opinion upon seeing concrete, explicit examples, I think that's to be expected. -- Now is probably a good time to disclose concerns of my own: 1. MSVC dependency tracking is certainly broken until such time as I can shave that yak and test. 2. Keeping the oid symbols with the data entries required some Makefile trickery to make them visible to .c files outside the backend (patch 0007). It builds fine, but the dependency tracking might have bugs. -- [1] https://www.postgresql.org/message-id/15697.1479161432%40sss.pgh.pa.us Thanks, John Naylor
Re: WIP: a way forward on bootstrap data
On 1/12/18 12:24, Alvaro Herrera wrote: > Here's a small sample pg_proc entry: > > { oid => '2147', descr => 'number of input rows for which the input > expression is not null', > n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => > 'any', s => 'aggregate_dummy' }, > > An pg_amop entry: > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => > '<(int2,int2)', am => 'btree' }, > > Notes: > 1. this is Perl data; it is read with 'eval' without any external modules. > 2. the pg_proc entry has been compressed to two lines, to avoid >content-free lines that would easily confuse git merge, but keep line >length reasonable. I don't think I like this. I know pg_proc.h is a pain to manage, but at least right now it's approachable programmatically. I recently proposed to patch to replace the columns proisagg and proiswindow with a combined column prokind. I could easily write a small Perl script to make that change in pg_proc.h, because the format is easy to parse and has one line per entry. With this new format, that approach would no longer work, and I don't know what would replace it. > 3. references to objects in other catalogs are by name, such as "int8" >or "btree/integer_ops" rather than OID. I think we could already do this by making more use of things like regtype and regproc. That should be an easy change to make. > 4. for each attribute, an abbreviation can be declared. In the >pg_proc sample we have "n" which stands for proname, because we have >this line: >+ NameDataproname BKI_ABBREV(n); I'm afraid a key value system would invite writing the attributes in random order and create a mess over time. But if we want to do it, I think we could also add it to the current BKI format. The same goes for defining default values for some columns. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
On Fri, Jan 12, 2018 at 04:22:26PM -0500, Peter Eisentraut wrote: > On 1/12/18 12:24, Alvaro Herrera wrote: > > Here's a small sample pg_proc entry: > > > > { oid => '2147', descr => 'number of input rows for which the input > > expression is not null', > > n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => > > 'any', s => 'aggregate_dummy' }, > > > > An pg_amop entry: > > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper > > => '<(int2,int2)', am => 'btree' }, > > > > Notes: > > 1. this is Perl data; it is read with 'eval' without any external modules. > > 2. the pg_proc entry has been compressed to two lines, to avoid > >content-free lines that would easily confuse git merge, but keep line > >length reasonable. > > I don't think I like this. I know pg_proc.h is a pain to manage, > but at least right now it's approachable programmatically. I > recently proposed to patch to replace the columns proisagg and > proiswindow with a combined column prokind. I could easily write a > small Perl script to make that change in pg_proc.h, because the > format is easy to parse and has one line per entry. With this new > format, that approach would no longer work, and I don't know what > would replace it. How about ingesting with Perl, manipulating there, and spitting back out as Perl data structures? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WIP: a way forward on bootstrap data
Peter Eisentraut wrote: > On 1/12/18 12:24, Alvaro Herrera wrote: > > Here's a small sample pg_proc entry: > > > > { oid => '2147', descr => 'number of input rows for which the input > > expression is not null', > > n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => > > 'any', s => 'aggregate_dummy' }, > > > > An pg_amop entry: > > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper > > => '<(int2,int2)', am => 'btree' }, > > > > Notes: > > 1. this is Perl data; it is read with 'eval' without any external modules. > > 2. the pg_proc entry has been compressed to two lines, to avoid > >content-free lines that would easily confuse git merge, but keep line > >length reasonable. > > I don't think I like this. I know pg_proc.h is a pain to manage, but at > least right now it's approachable programmatically. I recently proposed > to patch to replace the columns proisagg and proiswindow with a combined > column prokind. I could easily write a small Perl script to make that > change in pg_proc.h, because the format is easy to parse and has one > line per entry. With this new format, that approach would no longer > work, and I don't know what would replace it. The idea in my mind is that you'd write a Perl program to do such changes, yeah. If the code we supply contains enough helpers and a few samples, it should be reasonably simple for people that don't do much Perl. The patch series does contain a few helper programs to write the data files. I haven't looked in detail what can they do and what they cannot. > > 3. references to objects in other catalogs are by name, such as "int8" > >or "btree/integer_ops" rather than OID. > > I think we could already do this by making more use of things like > regtype and regproc. That should be an easy change to make. Well, that assumes we *like* the current format, which I think is not a given ... more the opposite. > > 4. for each attribute, an abbreviation can be declared. In the > >pg_proc sample we have "n" which stands for proname, because we have > >this line: > >+ NameDataproname BKI_ABBREV(n); > > I'm afraid a key value system would invite writing the attributes in > random order and create a mess over time. Yeah, I share this concern. But you could fix it if the Perl tooling to write these files had a hardcoded list to work with. Maybe we could put it in a declaration of sorts at the start of each data file. > But if we want to do it, I think we could also add it to the current BKI > format. The same goes for defining default values for some columns. As above -- do we really like our current format so much that we're satisfied with minor tweaks? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
On 2018-01-12 18:36:14 -0300, Alvaro Herrera wrote: > As above -- do we really like our current format so much that we're > satisfied with minor tweaks? A definite no from here. I think especially pg_proc desperately needs something key=value like to be understandable, and that very clearly seems to be something we can't do in the current format. - Andres
Re: WIP: a way forward on bootstrap data
Alvaro Herrera writes: > Peter Eisentraut wrote: >> I don't think I like this. I know pg_proc.h is a pain to manage, but at >> least right now it's approachable programmatically. I recently proposed >> to patch to replace the columns proisagg and proiswindow with a combined >> column prokind. I could easily write a small Perl script to make that >> change in pg_proc.h, because the format is easy to parse and has one >> line per entry. With this new format, that approach would no longer >> work, and I don't know what would replace it. > The idea in my mind is that you'd write a Perl program to do such > changes, yeah. If the code we supply contains enough helpers and a few > samples, it should be reasonably simple for people that don't do much > Perl. It would be good to see a sample --- for a concrete example, how about creating a Perl script to do the conversion Peter mentions? >>> 3. references to objects in other catalogs are by name, such as "int8" >>> or "btree/integer_ops" rather than OID. >> I think we could already do this by making more use of things like >> regtype and regproc. That should be an easy change to make. > Well, that assumes we *like* the current format, which I think is not a > given ... more the opposite. Note that we *can't* easily improve that given the current tooling, mainly because the bootstrap-time capabilities of regproc_in et al are so limited. We don't even have regxxx types for many of the other cross-reference columns like opclass references, and I don't think I want to build them because they'd also have bootstrapping issues. According to my understanding, part of what's going on here is that we're going to teach genbki.pl to parse these object references and convert them to hard-coded OIDs in the emitted BKI file. That seems good to me, but one thing we're going to need is a spec for how genbki.pl knows what to do. >> I'm afraid a key value system would invite writing the attributes in >> random order and create a mess over time. > Yeah, I share this concern. But you could fix it if the Perl tooling to > write these files had a hardcoded list to work with. Maybe we could put > it in a declaration of sorts at the start of each data file. This is more or less the same concern I stated upthread. But the impression I'm getting is that we expect these files to often be written out from a Perl script, so it's mostly a question of how we teach the Perl scripts to emit stylistically consistent data. Then we can use the Perl scripts as a kind of pgindent for this data. >> But if we want to do it, I think we could also add it to the current BKI >> format. The same goes for defining default values for some columns. > As above -- do we really like our current format so much that we're > satisfied with minor tweaks? I'm sure not. This will be a big change, without a doubt, but I think we'll end up in a better place. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 1/13/18, Peter Eisentraut wrote: > I'm afraid a key value system would invite writing the attributes in > random order and create a mess over time. A developer can certainly write them in random order, and it will still work. However, in patch 0002 I have a script to enforce a standard appearance. Of course, for it to work, you have to run it. I describe it, if rather tersely, in the README changes in patch 0008. Since several people have raised this concern, I will go into a bit more depth here. Perhaps I should reuse some of this language for the README to improve it. src/include/catalog/rewrite_dat.pl knows where to find the schema of each catalog, namely the pg_*.h header, accessed via ParseHeader() in Catalog.pm. It writes key/value pairs in the order found in the schema: { key_1 => 'value_1', key_2 => 'value_2', ..., key_n => 'value_n' } The script also has an array of four hard-coded metadata fields: oid, oid_symbol, descr, and shdescr. If any of these are present, they will go on their own line first, in the order given: { oid => , oid_symbol => 'FOO_OID', descr => 'comment on foo', key_1 => 'value_1', key_2 => 'value_2', ..., key_n => 'value_n' } > I don't think I like this. I know pg_proc.h is a pain to manage, but at > least right now it's approachable programmatically. I recently proposed > to patch to replace the columns proisagg and proiswindow with a combined > column prokind. I could easily write a small Perl script to make that > change in pg_proc.h, because the format is easy to parse and has one > line per entry. With this new format, that approach would no longer > work, and I don't know what would replace it. I've attached four diffs/patches to walk through how you would replace the columns proisagg and proiswindow with a combined column prokind. Patch 01: Add new prokind column to pg_proc.h, with a default of 'n'. In many cases, this is all you would have to do, as far as bootstrapping is concerned. Diff 02: This is a one-off script diffed against rewrite_dat.pl. In rewrite_dat.pl, I have a section with this comment, and this is where I put the one-off code: # Note: This is also a convenient place to do one-off # bulk-editing. (I haven't documented this with explicit examples, so I'll have to remedy that) You would run it like this: cd src/include/catalog perl -I ../../backend/catalog/ rewrite_dat_with_prokind.pl pg_proc.dat While reading pg_proc.dat, the default value for prokind is added automatically. We inspect proisagg and proiswindow, and change prokind accordingly. pg_proc.dat now has all three columns, prokind, proisagg, and proiswindow. Patch 03: Remove old columns from pg_proc.h Now we run the standard rewrite: perl -I ../../backend/catalog/ rewrite_dat.pl pg_proc.dat Any values not found in the schema will simply not be written to pg_proc.dat, so the old columns are now gone. The result is found in patch 04. -- Note: You could theoretically also load the source data into tables, do the updates with SQL, and dump back out again. I made some progress with this method, but it's not complete. I think the load and dump steps add too much complexity for most use cases, but it's a possibility. -John Naylor --- rewrite_dat.pl 2018-01-13 13:32:42.528046389 +0700 +++ rewrite_dat_with_prokind.pl 2018-01-13 16:07:49.124401213 +0700 @@ -132,6 +132,19 @@ # these operations in the order given. # Note: This is also a convenient place to do one-off # bulk-editing. + + # One-off to migrate to prokind + # Default has already been filled in by now, so now add other + # values as appropriate + if ($values{proisagg} eq 't') + { + $values{prokind} = 'a'; + } + elsif ($values{proiswindow} eq 't') + { + $values{prokind} = 'w'; + } + if (!$expand_tuples) { strip_default_values(\%values, $schema, $catname); diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 7b77eea..0f2990f 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -60,6 +60,9 @@ CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO /* is it a window function? */ bool proiswindow BKI_DEFAULT(f); + /* kind of function: n = normal, a = agg, w = window */ + char prokind BKI_DEFAULT(n); + /* security definer */ bool prosecdef BKI_DEFAULT(f); diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index be8f822..b76dd72 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -57,12 +57,6 @@ CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
Re: WIP: a way forward on bootstrap data
On 1/13/18, Tom Lane wrote: > According to my understanding, part of what's going on here is that > we're going to teach genbki.pl to parse these object references and > convert them to hard-coded OIDs in the emitted BKI file. That seems > good to me, but one thing we're going to need is a spec for how > genbki.pl knows what to do. I don't know if it qualifies as a spec, but here's my implementation: Use dummy type aliases in the header files: regam, regoper, regopf, and regtype These are #defined away in genbki.h: +/* + * Some columns of type Oid have human-readable entries that are + * resolved when creating postgres.bki. + * + */ +#define regam Oid +#define regoper Oid +#define regopf Oid +#define regtype Oid Likewise, in genbki.pl (and I just noticed a typo, s/names/types/): +# We use OID aliases to indicate when to do OID lookups, so column names +# have to be turned back into 'oid' before writing the CREATE command. +my %RENAME_REGOID = ( + regam => 'oid', + regoper => 'oid', + regopf => 'oid', + regtype => 'oid'); + When genbki.pl sees one of these type aliases, it consults the appropriate lookup table, exactly how we do now for regproc. One possibly dubious design point is that I declined to teach the pg_attribute logic about this, so doing lookups in tables with schema macros has to be done explicitly. There is only one case of this right now, and I noted the tradeoff: + # prorettype + # Note: We could handle this automatically by using the + # 'regtype' alias, but then we would have to teach + # morph_row_for_pgattr() to change the attribute type back to + # oid. Since we have to treat pg_proc differently anyway, + # just do the type lookup manually here. + my $rettypeoid = $regtypeoids{ $bki_values{prorettype}}; + $bki_values{prorettype} = $rettypeoid + if defined($rettypeoid); This is all in patch 0011. -John Naylor
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 1/12/18, Alvaro Herrera wrote: >> Pushed 0001 with some changes of my own. I'm afraid I created a few >> conflicts for the later patches in your series; please rebase. > Attached, rebased over 255f14183ac. I decided that I wasn't going to get answers about the things I cared about without looking through the patchset, so I've now done so, in a once-over-lightly fashion. Here's a quick summary of what it does, for those who may be equally confused, and then some comments (not really a review). The patch removes DATA and (SH)DESCR lines from all the catalog/pg_*.h files, as well as the #defines for OID-value macros, and puts that information into pg_*.dat files corresponding to the .h files, in a form that is easily readable and writable by Perl scripts. Comments associated with this info are also transferred to the .dat files, and the scripts that can rewrite the .dat files are able to preserve the comments. genbki.pl is able to generate postgres.bki and other initdb input files directly from the .dat files. It also creates a single header file src/include/catalog/oid_symbols.h that contains all of the OID-value macros that were formerly in the pg_*.h files. The patch gets rid of the need to write hard-wired OIDs when referencing operators, opfamilies, etc in the .dat files; now you can write their names instead. genbki.pl will look up the names and substitute numeric OIDs in the emitted postgres.bki file. There are also provisions to shorten the .dat files through the use of abbreviated field names, default values for fields, and some other less-general techniques. -- OK, now some comments: I'm not sure about the decision to move all the OID macros into one file; that seems like namespace pollution. It's especially odd that you did that but did not consolidate fmgroids.h with the macros for symbols from other catalogs. Now it's true that we need all those symbols to be distinct, since it needs to be possible for .c files to include any combination of pg_*.h headers, but I don't think it's an especially good idea that you have to include all of them or none. Even if we're willing to put up with that namespace pollution for backend code, there are clients that currently include pg_*.h headers to get some of those macros, and they're likely to be less happy about it. The design I'd kind of imagined was one generated file of #define's per pg_*.h file, not just one giant one. It would be really nice, also, if the attribute number macros (Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated. Manually renumbering those is one of the bigger pains in the rear when adding catalog columns. It was less of a pain than adjusting the DATA lines of course, so I never figured it was worth doing something about in isolation --- but with this infrastructure in place, that's manual work we shouldn't have to do anymore. Another thing that I'd sort of hoped might happen from this patchset is to cure the problem of keeping some catalog headers safe for client-side inclusion, because some clients want the OID value macros and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they currently have to #include those headers or else hard-code the values. We've worked around that to date with ad-hoc solutions like splitting function declarations out to pg_*_fn.h files, but I never liked that much. With the OID value macros being moved out to separate generated file(s), there's now a possibility that we could fix this once and for all by making client-side code include those file(s) not pg_type.h et al themselves. But we'd need a way to put the column-value macros into those files too; maybe that's too messy to make it practical. The .dat files need to have header comments that follow project conventions, in particular they need to contain copyright statements. Likewise for generated files. I've got zero faith that the .h files will hold still long enough for these patches to be reviewed and applied. The ones that touch significant amounts of data need to be explained as "run this script on the current data", rather than presented as static diffs. I'm not really thrilled by the single-purpose "magic" behaviors added in 0007, such as computing prosrc from proname. I think those will add more confusion than they're worth. In 0010, you relabel the types of some OID columns so that genbki.pl will know which lookup to apply to them. That's not such a problem for the relabelings that are just macros and genbki.pl converts back to type OID in the .bki file. But you also did things like s/Oid/regtype/, and that IS a problem because it will affect what client code sees in those catalog columns. We've discussed changing those columns to regfoo types in the past, and decided not to, because of the likelihood of breaking client queries. I do not think this patch gets to change that policy. So the way to identify the lookup rule needs to be independent of whether the c
Re: WIP: a way forward on bootstrap data
I'm 1000% on board with replacing oid constants with symbolic names that get substituted programmatically. However I wonder why we're bothering inventing a new syntax that doesn't actually do much more than present static tabular data. If things like magic proname->prosrc behaviour are not valuable then we're not getting much out of this perl-friendly syntax that a simpler more standard format wouldn't get us. So just as a straw man proposal What if we just replaced the data file with a csv file that could be maintained in a spreadsheet. It could easily be parsed by perl and we could even have perl scripts that load the records into memory and modify them. You could even imagine writing a postgres script that loaded the csv file into a temporary table, did complex SQL updates or other DML, then wrote it back out to a csv file.
Re: WIP: a way forward on bootstrap data
Greg Stark writes: > I'm 1000% on board with replacing oid constants with symbolic names > that get substituted programmatically. Yeah, that's almost an independent feature --- we could do that without any of this other stuff, if we wanted. > However I wonder why we're bothering inventing a new syntax that > doesn't actually do much more than present static tabular data. If > things like magic proname->prosrc behaviour are not valuable then > we're not getting much out of this perl-friendly syntax that a simpler > more standard format wouldn't get us. TBH, the thing that was really drawing my ire about that was that John was inventing random special rules and documenting them *noplace* except for the guts of some perl code. If I have to read perl code to find out what the catalog data means, I'm going to be bitching loudly. That could be done better --- one obvious idea is to add a comment to the relevant .h file, next to the field whose value will be implicitly calculated. > So just as a straw man proposal What if we just replaced the data > file with a csv file that could be maintained in a spreadsheet. Bleah --- that's no better than what we have today, just different. And "maintained in a spreadsheet" doesn't sound attractive to me; you'd almost certainly lose comments, for instance. regards, tom lane
Re: WIP: a way forward on bootstrap data
I wrote: > Another thing that I'd sort of hoped might happen from this patchset > is to cure the problem of keeping some catalog headers safe for > client-side inclusion, because some clients want the OID value macros > and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they > currently have to #include those headers or else hard-code the values. > We've worked around that to date with ad-hoc solutions like splitting > function declarations out to pg_*_fn.h files, but I never liked that > much. With the OID value macros being moved out to separate generated > file(s), there's now a possibility that we could fix this once and for all > by making client-side code include those file(s) not pg_type.h et al > themselves. But we'd need a way to put the column-value macros into > those files too; maybe that's too messy to make it practical. I had a thought about how to do that. It's clearly desirable that that sort of material remain in the manually-maintained pg_*.h files, because that's basically where you look to find out C-level details of what's in a particular catalog. However, that doesn't mean that that's where the compiler has to find it. Imagine that we write such sections of the catalog .h files like #ifdef EXPOSE_TO_CLIENT_CODE /* * ... comment here ... */ #define PROVOLATILE_IMMUTABLE 'i' /* never changes for given input */ #define PROVOLATILE_STABLE 's' /* does not change within a scan */ #define PROVOLATILE_VOLATILE'v' /* can change even within a scan */ #endif /* EXPOSE_TO_CLIENT_CODE */ Like CATALOG_VARLEN, the symbol EXPOSE_TO_CLIENT_CODE is never actually defined to the compiler. What it does is to instruct genbki.pl to copy the material up to the matching #endif into the generated file for this catalog. So, for each catalog header pg_foo.h, there would be a generated file, say pg_foo_d.h, containing: * Natts_ and Anum_ macros for pg_foo * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h * Any OID-value macros for entries in that catalog pg_foo.h would contain a #include for pg_foo_d.h, so that backend-side code would obtain all these values the same as it did before. But the new policy for client code would be to include pg_foo_d.h *not* pg_foo.h, and so we are freed of any worry about whether pg_foo.h has to be clean for clients to include. We could re-merge the various pg_foo_fn.h files back into the main files, if we wanted. The contents of EXPOSE_TO_CLIENT_CODE sections wouldn't necessarily have to be just macros --- they could be anything that's safe and useful for client code. But that's probably the main usage. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 1/14/18, Tom Lane wrote: > I'm not sure about the decision to move all the OID macros into > one file; that seems like namespace pollution. > The design I'd kind of imagined was one generated file of #define's > per pg_*.h file, not just one giant one. First, thanks for the comments. I will start incorporating them in a few days to give others the chance to offer theirs. I'm inclined to agree about namespace pollution. One stumbling block is the makefile changes to allow OID symbols to be visible to non-backend code. Assuming I took the correct approach for the single file case, adapting that to multiple files would require some rethinking. > It's especially > odd that you did that but did not consolidate fmgroids.h with > the macros for symbols from other catalogs. Point taken. > It would be really nice, also, if the attribute number macros > (Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated. > Manually renumbering those is one of the bigger pains in the rear > when adding catalog columns. It was less of a pain than adjusting > the DATA lines of course, so I never figured it was worth doing > something about in isolation --- but with this infrastructure in > place, that's manual work we shouldn't have to do anymore. Searching the archives for discussion of Anum_* constants [1], I prefer your one-time suggestion to use enums instead. I'd do that, and then have Catalog.pm throw an error if Natts_* doesn't match the number of columns. That's outside the scope of this patch, however. > Another thing that I'd sort of hoped might happen from this patchset > is to cure the problem of keeping some catalog headers safe for > client-side inclusion, because some clients want the OID value macros > and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they > currently have to #include those headers or else hard-code the values. > We've worked around that to date with ad-hoc solutions like splitting > function declarations out to pg_*_fn.h files, but I never liked that > much. With the OID value macros being moved out to separate generated > file(s), there's now a possibility that we could fix this once and for all > by making client-side code include those file(s) not pg_type.h et al > themselves. But we'd need a way to put the column-value macros into > those files too; maybe that's too messy to make it practical. To make sure I understand you correctly: Currently we have pg_type.h oid symbols, column symbols pg_type_fn.h function decls And assuming we go with one generated oid symbol file per header, my patch would end up with something like pg_type.h column symbols (#includes pg_type_sym.h) pg_type_fn.h function decls pg_type_sym.h oid symbols (generated) And you're saying you'd prefer pg_type.h function decls (#includes pg_type_sym.h) pg_type_sym.h oid symbols, column symbols (generated) I agree that it'd be messy to drive the generation of the column symbols. I'll think about it. What about pg_type.h function decls (#includes pg_type_sym.h) pg_type_sym.h column symbols (static, #includes pg_type_oids.h) pg_type_oids.h oid symbols (generated) It's complicated, but arguably no more so than finding someplace more distant to stick the column symbols and writing code to copy them. It'd be about than 20 *_sym.h headers and 10 *_oids.h headers. > The .dat files need to have header comments that follow project > conventions, in particular they need to contain copyright statements. > Likewise for generated files. Okay. > I've got zero faith that the .h files will hold still long enough > for these patches to be reviewed and applied. The ones that touch > significant amounts of data need to be explained as "run this script > on the current data", rather than presented as static diffs. I've already rebased over a catalog change and it was not much fun, so I'd be happy to do it this way. > I'm not really thrilled by the single-purpose "magic" behaviors added > in 0007, such as computing prosrc from proname. I think those will > add more confusion than they're worth. Okay. I still think generating pg_type OID symbols is worth doing, but I no longer think this is a good place to do it. > In 0010, you relabel the types of some OID columns so that genbki.pl > will know which lookup to apply to them. That's not such a problem for > the relabelings that are just macros and genbki.pl converts back to > type OID in the .bki file. But you also did things like s/Oid/regtype/, > and that IS a problem because it will affect what client code sees in > those catalog columns. We've discussed changing those columns to > regfoo types in the past, and decided not to, because of the likelihood > of breaking client queries. I do not think this patch gets to change > that policy. So the way to identify the lookup rule needs to be > independent of whether the column is declared as Oid or an Oid alias type. > Perhaps an explicit marker telli
Re: WIP: a way forward on bootstrap data
Tom Lane wrote: > I had a thought about how to do that. It's clearly desirable that that > sort of material remain in the manually-maintained pg_*.h files, because > that's basically where you look to find out C-level details of what's > in a particular catalog. However, that doesn't mean that that's where > the compiler has to find it. > > [ elided explanation of pg_foo_d.h and pg_foo.h ] Sounds good to me. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
On 1/14/18, Tom Lane wrote: > So, for each catalog header pg_foo.h, there would be a > generated file, say pg_foo_d.h, containing: > > * Natts_ and Anum_ macros for pg_foo > > * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h > > * Any OID-value macros for entries in that catalog I'm on board in principle, but I have some questions: How do we have the makefiles gracefully handle 62 generated headers which need to be visible outside the backend? Can I generalize the approach I took for the single OIDs file I had, or is that not even the right way to go? (In short, I used a new backend make target that was invoked in src/common/Makefile - the details are in patch v6-0006) If we move fmgr oid generation here as you suggested earlier, I imagine we don't want to create a lot of #include churn. My idea is to turn src/include/utils/fmgroids.h into a static file that just #includes catalog/pg_proc_d.h. Thoughts? And I'm curious, what is "_d" intended to convey? (While I'm thinking outloud, I'm beginning to think that these headers lie outside the scope of genbki.pl, and belong in a separate script.) -John Naylor
Re: WIP: a way forward on bootstrap data
John Naylor writes: > On 1/14/18, Tom Lane wrote: >> So, for each catalog header pg_foo.h, there would be a >> generated file, say pg_foo_d.h, containing: >> * Natts_ and Anum_ macros for pg_foo >> * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h >> * Any OID-value macros for entries in that catalog > I'm on board in principle, but I have some questions: > How do we have the makefiles gracefully handle 62 generated headers > which need to be visible outside the backend? There are other people around here who are better make wizards than I, but I'm sure this is soluble. A substitution like $(CATALOG_HEADERS:_d.h=.h) might get you started. (It looks like CATALOG_HEADERS would have to be separated out of POSTGRES_BKI_SRCS, but that's easy.) > If we move fmgr oid generation here as you suggested earlier, I > imagine we don't want to create a lot of #include churn. My idea is to > turn src/include/utils/fmgroids.h into a static file that just > #includes catalog/pg_proc_d.h. Thoughts? Yeah ... or vice versa. I don't know if touching the way fmgroids.h is built is worthwhile. Certainly, if we'd built all this to begin with we'd have unified pg_proc.h's OID macro handling with the other catalogs, but we didn't and that might not be worth changing. I'm not strongly convinced either way. > And I'm curious, what is "_d" intended to convey? I was thinking "#define" or "data". You could make as good a case for "_g" for "generated", or probably some other choices. I don't have a strong preference; but I didn't especially like your original suggestion of "_sym", because that seemed like it would invite confusion with possible actual names for catalogs. A one-letter suffix seems less likely to conflict with anything anybody would think was a good choice of catalog name. > (While I'm thinking outloud, I'm beginning to think that these headers > lie outside the scope of genbki.pl, and belong in a separate script.) Maybe, but the conditions for regenerating these files would be exactly the same as for the .bki file, no? So we might as well just have one script do both, rather than writing duplicative rules in the Makefiles and the MSVC scripts. regards, tom lane
Re: WIP: a way forward on bootstrap data
On 12/13/17 04:06, John Naylor wrote: > There doesn't seem to be any interest in bootstrap data at the moment, > but rather than give up just yet, I've added a couple features to make > a data migration more compelling: I took a brief look at your patches, and there appear to be a number of good cleanups in there at least. But could you please send patches in git format-patch format with commit messages, so we don't have to guess what each patch does? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
On Thu, Dec 14, 2017 at 05:59:12PM +0700, John Naylor wrote: > On 12/13/17, Peter Eisentraut wrote: > > On 12/13/17 04:06, John Naylor wrote: > >> There doesn't seem to be any interest in bootstrap data at the moment, > >> but rather than give up just yet, I've added a couple features to make > >> a data migration more compelling: > > > > I took a brief look at your patches, and there appear to be a number of > > good cleanups in there at least. But could you please send patches in > > git format-patch format with commit messages, so we don't have to guess > > what each patch does? > > Thanks for taking a look and for pointing me to git format-patch. > That's much nicer than trying to keep emails straight. I've attached a > new patchset. Thanks for your hard work on this. It'll really make developing this part of the code a lot more pleasant. Unfortunately, it no longer applies to master. Can we get a rebase? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WIP: a way forward on bootstrap data
Pushed 0001 and 0002. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
Hmm, patch 0008 removes data lines from the .h but leaves the dependent OID define lines around: #define BTREE_AM_OID 403 This is not good, because then the define depends on data that appears in a distant file. Another consideration is that the current system has the property that these OIDs are discoverable by a hacker by navigating to the containing .h file; and a missing symbol is easily fixable if they need to hardcode the OID for which there isn't a symbol yet. Maybe a generated .h file containing defines for OIDs from all catalogs is okay. Of course, not all symbols are to be listed -- we should have a special marker in the data lines for those that are. Maybe something like this { oid => '403', descr => 'b-tree index access method', amname => 'btree', amhandler => 'bthandler', amtype => 'i', cpp_symbol => 'BTREE_AM_OID' }, (where 'cpp_symbol' would be skipped by genbki explicitly). Any better ideas? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: a way forward on bootstrap data
On Thu, Dec 21, 2017 at 5:32 PM, Alvaro Herrera wrote: > Hmm, patch 0008 removes data lines from the .h but leaves the dependent > OID define lines around: Just a question here -- do we actually have consensus on doing the stuff that these patches do? Because I'm not sure we do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: a way forward on bootstrap data
On 12/22/17, Alvaro Herrera wrote: > Hmm, patch 0008 removes data lines from the .h but leaves the dependent > OID define lines around: > > #define BTREE_AM_OID 403 > > This is not good, because then the define depends on data that appears > in a distant file. I see what you mean. > Another consideration is that the current system has > the property that these OIDs are discoverable by a hacker by navigating > to the containing .h file; and a missing symbol is easily fixable if > they need to hardcode the OID for which there isn't a symbol yet. I'm not sure I follow you here. > Maybe a generated .h file containing defines for OIDs from all catalogs > is okay. Of course, not all symbols are to be listed -- we should have > a special marker in the data lines for those that are. Maybe something > like this > > { oid => '403', descr => 'b-tree index access method', > amname => 'btree', amhandler => 'bthandler', amtype => 'i', > cpp_symbol => 'BTREE_AM_OID' }, > > (where 'cpp_symbol' would be skipped by genbki explicitly). The last part makes sense and would be a fairly mechanical change. I'm not sure of the best way to include those generated symbols back in the code again, though. I think a single file might not be desirable because of namespace pollution. The alternative would be to have, say, pg_am.h include pg_am_sym.h. More complex but doable. Also, no need to skip non-data values explicitly. The code knows where to find the schema. :-) Thanks for pushing 1 and 2, BTW. -John Naylor
Re: WIP: a way forward on bootstrap data
Robert Haas wrote: > On Thu, Dec 21, 2017 at 5:32 PM, Alvaro Herrera > wrote: > > Hmm, patch 0008 removes data lines from the .h but leaves the dependent > > OID define lines around: > > Just a question here -- do we actually have consensus on doing the > stuff that these patches do? Because I'm not sure we do. My reading of the old threads (linked provided by John in his initial email in this thread) is that we have a consensus that we want to get rid of the old data representation, because it causes endless amount of pain. The proposed format seems to satisfy the constraints that we all discussed, namely 1. be easier to modify than the current format, 2. in particular, allow for default values on certain columns 3. not cause git merge problems because of too similar lines in every record 4. not require onerous Perl modules The one thing we seem to lack is a tool to edit the data files, as you suggested[1]. Stephen Frost mentioned[2] that we could do this by allowing the .data files be loaded in a database table, have the changes made via SQL, then have a way to create an updated .data file. Tom said[3] he didn't like that particular choice. So we already have Catalog.pm that (after these patches) knows how to load .data files; we could use that as a basis to enable easy oneliners to do whatever editing is needed. Also, the CPP symbols remaining in the pg_*.h that I commented yesterday was already mentioned[4] before. [1] https://www.postgresql.org/message-id/CA%2BTgmoa4%3D5oz7wSMcLNLh8h6cXzPoxxNJKckkdSQA%2BzpUG0%2B0A%40mail.gmail.com [2] https://www.postgresql.org/message-id/20150304150712.GV29780%40tamriel.snowman.net [3] https://www.postgresql.org/message-id/24766.1478821303%40sss.pgh.pa.us [4] https://www.postgresql.org/message-id/15697.1479161...@sss.pgh.pa.us -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services