Re: WIP: a way forward on bootstrap data

Tom Lane Sat, 13 Jan 2018 15:08:14 -0800

John Naylor <jcnay...@gmail.com> writes:
> On 1/12/18, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>> Pushed 0001 with some changes of my own.  I'm afraid I created a few
>> conflicts for the later patches in your series; please rebase.


> Attached, rebased over 255f14183ac.

I decided that I wasn't going to get answers about the things I cared
about without looking through the patchset, so I've now done so,
in a once-over-lightly fashion.  Here's a quick summary of what it
does, for those who may be equally confused, and then some comments
(not really a review).

The patch removes DATA and (SH)DESCR lines from all the catalog/pg_*.h
files, as well as the #defines for OID-value macros, and puts that
information into pg_*.dat files corresponding to the .h files, in a form
that is easily readable and writable by Perl scripts.  Comments associated
with this info are also transferred to the .dat files, and the scripts
that can rewrite the .dat files are able to preserve the comments.

genbki.pl is able to generate postgres.bki and other initdb input files
directly from the .dat files.  It also creates a single header file
src/include/catalog/oid_symbols.h that contains all of the OID-value
macros that were formerly in the pg_*.h files.

The patch gets rid of the need to write hard-wired OIDs when referencing
operators, opfamilies, etc in the .dat files; now you can write their
names instead.  genbki.pl will look up the names and substitute numeric
OIDs in the emitted postgres.bki file.  There are also provisions to
shorten the .dat files through the use of abbreviated field names,
default values for fields, and some other less-general techniques.

--

OK, now some comments:

I'm not sure about the decision to move all the OID macros into
one file; that seems like namespace pollution.  It's especially
odd that you did that but did not consolidate fmgroids.h with
the macros for symbols from other catalogs.  Now it's true that
we need all those symbols to be distinct, since it needs to be
possible for .c files to include any combination of pg_*.h headers,
but I don't think it's an especially good idea that you have to
include all of them or none.  Even if we're willing to put up with
that namespace pollution for backend code, there are clients that
currently include pg_*.h headers to get some of those macros, and
they're likely to be less happy about it.

The design I'd kind of imagined was one generated file of #define's
per pg_*.h file, not just one giant one.

It would be really nice, also, if the attribute number macros
(Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated.
Manually renumbering those is one of the bigger pains in the rear
when adding catalog columns.  It was less of a pain than adjusting
the DATA lines of course, so I never figured it was worth doing
something about in isolation --- but with this infrastructure in
place, that's manual work we shouldn't have to do anymore.

Another thing that I'd sort of hoped might happen from this patchset
is to cure the problem of keeping some catalog headers safe for
client-side inclusion, because some clients want the OID value macros
and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
currently have to #include those headers or else hard-code the values.
We've worked around that to date with ad-hoc solutions like splitting
function declarations out to pg_*_fn.h files, but I never liked that
much.  With the OID value macros being moved out to separate generated
file(s), there's now a possibility that we could fix this once and for all
by making client-side code include those file(s) not pg_type.h et al
themselves.  But we'd need a way to put the column-value macros into
those files too; maybe that's too messy to make it practical.

The .dat files need to have header comments that follow project
conventions, in particular they need to contain copyright statements.
Likewise for generated files.

I've got zero faith that the .h files will hold still long enough
for these patches to be reviewed and applied.  The ones that touch
significant amounts of data need to be explained as "run this script
on the current data", rather than presented as static diffs.

I'm not really thrilled by the single-purpose "magic" behaviors added
in 0007, such as computing prosrc from proname.  I think those will
add more confusion than they're worth.

In 0010, you relabel the types of some OID columns so that genbki.pl
will know which lookup to apply to them.  That's not such a problem for
the relabelings that are just macros and genbki.pl converts back to
type OID in the .bki file.  But you also did things like s/Oid/regtype/,
and that IS a problem because it will affect what client code sees in
those catalog columns.  We've discussed changing those columns to
regfoo types in the past, and decided not to, because of the likelihood
of breaking client queries.  I do not think this patch gets to change
that policy.  So the way to identify the lookup rule needs to be
independent of whether the column is declared as Oid or an Oid alias type.
Perhaps an explicit marker telling what transformation to make, like

    Oid            rngsubtype BKI_LOOKUP(pg_type);

would work for that.

I'm not really on board at all with 0012, which AFAICS moves the indexing
and toast-table information out of indexing.h and toasting.h for no good
reason whatever.  We'll have quite enough code thrash and pending-patch
breakage from this patch set; we don't need to take on rearrangements that
aren't buying anything.

                        regards, tom lane

Re: WIP: a way forward on bootstrap data

Reply via email to