Hello everyone, I was quite intrigued by a discussion that happened this past summer regarding generation of bootstrap files such as postgres.bki, and the associated pain points of maintaining the DATA() statements in catalog headers. It occurred to me that the current system is backwards: Instead of generating the bootstrap files from hard-coded strings contained in various header files, it seems it would be a cleaner design to generate both from a human-readable high-level description of the system catalogs.

1. You wouldn't need hand-maintained pg_foo.h files. The struct
   declarations, Natts/Anum macros, and such are predictable enough to
   be machine-generated. Ditto for all 6 Schema_pg_foo declarations
   needed by relcache.c.

2. You wouldn't even need to specify the contents of pg_attribute --
   the bootstrap data for that are completely determined by the column
   names/types of the bootstrap tables and some type information in
   pg_type.

3. With a human-readable format, you could have placeholder strings
   representing oid values.

4. Since looking up catalog oids would be easy, we could get rid of
   postgres.description and postgres.shdescription by putting those
   statements into postgres.bki as well, which would eliminate the need
   for the setup_description() function in initdb.c with its hard-coded
   SQL.

The patch linked to below implements 1-3 of what I've just described.
Unfortunately, as will become apparent, it introduces a dependency on a
Perl module which is inappropriate for Postgres' portability
requirements. So consider this a research project, not an item for the
patch queue. My hope is that it could still be the basis for a practical
solution. If this interests you, keep reading...


FILE GENERATION

There are 4 scripts, which output distprep targets:

    gen_bki.pl     -- generates postgres.bki and schemapg.h
    gen_header.pl  -- generates the pg_foo.h headers
    gen_descr.pl   -- generates postgres.description and
                      postgres.shdescription
    gen_fmgr.pl    -- generates fmgroids.h and fmgrtab.c


CATALOG DATA FORMAT

The catalog data is represented in the YAML format. The features that
led to this decision are:

1. Ease of reading/editing.
2. Anchors/aliases -- enable human-readable oid values.
3. Merge key -- enables default values.

A simple example using pg_tablespace will illustrate:

    pg_tablespace:
        relation_oid: 1213
        relation_define: TableSpaceRelationId
        shared_relation: True
        columns:
          - spcname: name          # tablespace name
          - spcowner: oid          # owner of tablespace
          - spclocation: text      # physical location (VAR LENGTH)
          - spcacl: aclitem[]      # access permissions (VAR LENGTH)
        column_defaults: &pg_tablespace_defaults
            spcowner: *PGUID
            spclocation: '""'
            spcacl: _null_
        data:
          - spcname: pg_default
            oid: &DEFAULTTABLESPACE_OID 1663
            define: DEFAULTTABLESPACE_OID
            <<: *pg_tablespace_defaults
          - spcname: pg_global
            oid: 1664
            define: GLOBALTABLESPACE_OID
            <<: *pg_tablespace_defaults

When the YAML parser loads this into the Perl data structures used by
the scripts, the result looks similar to this when output through
Data::Dumper:

    $catalogs->{pg_tablespace} = {
        'shared_relation' => 'True',
        'relation_oid' => 1213,
        'relation_define' => 'TableSpaceRelationId',
        'columns' => [
            { 'spcname' => 'name' },
            { 'spcowner' => 'oid' },
            { 'spclocation' => 'text' },
            { 'spcacl' => 'aclitem[]' }
        ],
        'data' => [
            {
                'spcname' => 'pg_default',
                'spclocation' => '""',
                'oid' => 1663,
                'spcacl' => '_null_',
                'define' => 'DEFAULTTABLESPACE_OID',
                'spcowner' => 10
            },
            {
                'spcname' => 'pg_global',
                'spclocation' => '""',
                'oid' => 1664,
                'spcacl' => '_null_',
                'define' => 'GLOBALTABLESPACE_OID',
                'spcowner' => 10
            }
        ],
        'column_defaults' => {
            'spclocation' => '""',
            'spcacl' => '_null_',
            'spcowner' => 10
        }
    };

Note that the alias *PGUID is expanded to 10 since there was (not shown
here) a corresponding anchor in pg_authid -- "oid: &PGUID 10".
Similarly, in any subsequent catalog that refers to oid 1663 you can
instead write *DEFAULTTABLESPACE_OID. Note also that each data entry is
merged with the column_defaults hash.

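As a rough illustration of how a generator script could consume a
structure like the one above, here is a minimal sketch. It is written in
Python purely for illustration (the scripts in the patch are Perl), it
starts from the already-parsed data rather than from YAML, and the
function names header_macros() and ordered_rows() are hypothetical; this
is not how gen_header.pl or gen_bki.pl are actually written. It only
shows that the Natts/Anum macros, the oid #defines, and the column-ordered
row values all fall out of the parsed structure.

```python
# Hypothetical sketch, not the patch's code: derive header macros and
# column-ordered rows from an already-parsed catalog entry.

# The pg_tablespace structure from above, transliterated to Python literals
# (aliases and merge keys are already expanded, as the YAML parser would do).
pg_tablespace = {
    "relation_oid": 1213,
    "relation_define": "TableSpaceRelationId",
    "shared_relation": "True",
    "columns": [
        {"spcname": "name"},
        {"spcowner": "oid"},
        {"spclocation": "text"},
        {"spcacl": "aclitem[]"},
    ],
    "data": [
        {"spcname": "pg_default", "oid": 1663,
         "define": "DEFAULTTABLESPACE_OID",
         "spcowner": 10, "spclocation": '""', "spcacl": "_null_"},
        {"spcname": "pg_global", "oid": 1664,
         "define": "GLOBALTABLESPACE_OID",
         "spcowner": 10, "spclocation": '""', "spcacl": "_null_"},
    ],
}


def column_names(catalog):
    """Column names in declaration order (each column is a one-key dict)."""
    return [name for col in catalog["columns"] for name in col]


def header_macros(catname, catalog):
    """Build the #define lines a hand-written pg_foo.h would carry."""
    lines = ["#define %s %d" % (catalog["relation_define"],
                                catalog["relation_oid"])]
    cols = column_names(catalog)
    lines.append("#define Natts_%s %d" % (catname, len(cols)))
    # Attribute numbers are 1-based and follow column declaration order.
    for attnum, col in enumerate(cols, start=1):
        lines.append("#define Anum_%s_%s %d" % (catname, col, attnum))
    # Rows that carry both "define" and "oid" become oid macros.
    for row in catalog["data"]:
        if "define" in row and "oid" in row:
            lines.append("#define %s %d" % (row["define"], row["oid"]))
    return lines


def ordered_rows(catalog):
    """Return (oid, values) per data entry, with values in column order."""
    cols = column_names(catalog)
    return [(row.get("oid"), [str(row[c]) for c in cols])
            for row in catalog["data"]]


if __name__ == "__main__":
    print("\n".join(header_macros("pg_tablespace", pg_tablespace)))
    for oid, values in ordered_rows(pg_tablespace):
        print(oid, values)
```

Because the defaults have already been merged into every data entry, the
row builder needs no special cases: it just reads each column by name in
declaration order.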
A portion of a more complex example will hopefully motivate this method
of data organization. These are the first 5 entries of the current
representation of pg_amop:

    /* default operators int2 */
    DATA(insert ( 1976 21 21 1 95 403 ));
    DATA(insert ( 1976 21 21 2 522 403 ));
    DATA(insert ( 1976 21 21 3 94 403 ));
    DATA(insert ( 1976 21 21 4 524 403 ));
    DATA(insert ( 1976 21 21 5 520 403 ));

The YAML representation leaves out half of the columns, to be computed
at compile time by gen_bki.pl, since they are intentional
denormalizations. The remaining columns are self-documenting because of
human-readable oids:

    - amopfamily: *integer_btree_fam
      amopstrategy: 1
      amopopr: *int2lt_op
    - amopfamily: *integer_btree_fam
      amopstrategy: 2
      amopopr: *int2le_op
    - amopfamily: *integer_btree_fam
      amopstrategy: 3
      amopopr: *int2eq_op
    - amopfamily: *integer_btree_fam
      amopstrategy: 4
      amopopr: *int2ge_op
    - amopfamily: *integer_btree_fam
      amopstrategy: 5
      amopopr: *int2gt_op

I think this approach is more readable and less error-prone.


DEPENDENCIES AND THE REAL WORLD

Parsing YAML into Perl data structures requires the YAML::XS module,
which in turn requires Perl 5.10. Since the generated files are distprep
targets, this would only apply to those who want to build from the repo.
Since this is still an unacceptable dependency, it might be worth it to
use the new infrastructure with a simpler data format that can be parsed
with straight Perl or C.


NEW WARTS

Some entries in catalog.yaml are only there to put macros into the
generated header. These are indicated by data entries that contain only
a "define:" value and a "nobki: True" value.

If a catalog header used to contain things like function prototypes,
enums, and #include's, these have been put into 9 new pg_foo_fn.h files
which are #include'd into the generated pg_foo.h file. This is indicated
by the presence of an "include:" value.
The number of new files could be reduced by consolidation, but I didn't
do that so that it would be obvious where the definitions come from.

The old mechanism is retained for the declare index and declare toast
statements. That is, they are still retrieved from indexing.h and
toasting.h using regular expressions.


CAVEATS FOR THE CURIOUS

1. I haven't changed the configure script to test for YAML::XS.

2. I've run make -j2 successfully, but I'm not positive that my changes
   are 100% correct for parallel make.

3. I don't have ready access to a Windows box with the necessary
   development environment, so MSVC is certainly broken.

4. Since there are whitespace inconsistencies in the current headers,
   you need this command on the current postgres.bki to diff cleanly
   with mine:

       sed -i 's/_)$/_ )/' src/backend/catalog/postgres.bki


INFO

The project is located at
http://git.postgresql.org/gitweb?p=users/jnaylor/bki.git;a=summary

 .gitignore                                    |   141 +
 src/Makefile                                  |     2 +-
 src/Makefile.global.in                        |    18 +
 src/backend/Makefile                          |    46 +-
 src/backend/catalog/Catalog.pm                |    76 +
 src/backend/catalog/Makefile                  |    85 +-
 src/backend/catalog/README                    |   166 +-
 src/backend/catalog/catalog.yaml              | 28871 +++++++++++++++++++++++++
 src/backend/catalog/gen_bki.pl                |   493 +
 src/backend/catalog/gen_descr.pl              |    72 +
 src/backend/catalog/gen_header.pl             |   215 +
 src/backend/catalog/genbki.sh                 |   438 -
 src/backend/catalog/unused_oids.pl            |   114 +
 src/backend/utils/Gen_fmgrtab.pl              |   194 -
 src/backend/utils/Gen_fmgrtab.sh              |   253 -
 src/backend/utils/Makefile                    |    14 +-
 src/backend/utils/cache/relcache.c            |     1 +
 src/backend/utils/gen_fmgr.pl                 |   189 +
 src/include/Makefile                          |     2 +-
 src/include/catalog/duplicate_oids            |    27 -
 src/include/catalog/genbki.h                  |    40 -
 src/include/catalog/indexing.h                |     7 +-
 src/include/catalog/pg_aggregate.h            |   239 -
 src/include/catalog/pg_aggregate_fn.h         |    32 +
 src/include/catalog/pg_am.h                   |   125 -
 src/include/catalog/pg_amop.h                 |   681 -
 src/include/catalog/pg_amproc.h               |   331 -
 src/include/catalog/pg_attrdef.h              |    56 -
 src/include/catalog/pg_attribute.h            |   516 -
 src/include/catalog/pg_auth_members.h         |    56 -
 src/include/catalog/pg_authid.h               |    97 -
 src/include/catalog/pg_cast.h                 |   355 -
 src/include/catalog/pg_cast_fn.h              |    44 +
 src/include/catalog/pg_class.h                |   144 -
 src/include/catalog/pg_constraint.h           |   231 -
 src/include/catalog/pg_constraint_fn.h        |    84 +
 src/include/catalog/pg_conversion.h           |    77 -
 src/include/catalog/pg_database.h             |    77 -
 src/include/catalog/pg_db_role_setting.h      |    67 -
 src/include/catalog/pg_db_role_setting_fn.h   |    29 +
 src/include/catalog/pg_default_acl.h          |    75 -
 src/include/catalog/pg_depend.h               |    90 -
 src/include/catalog/pg_description.h          |    84 -
 src/include/catalog/pg_enum.h                 |    66 -
 src/include/catalog/pg_enum_fn.h              |    25 +
 src/include/catalog/pg_foreign_data_wrapper.h |    62 -
 src/include/catalog/pg_foreign_server.h       |    65 -
 src/include/catalog/pg_index.h                |    91 -
 src/include/catalog/pg_inherits.h             |    59 -
 src/include/catalog/pg_language.h             |    79 -
 src/include/catalog/pg_largeobject.h          |    58 -
 src/include/catalog/pg_largeobject_fn.h       |    21 +
 src/include/catalog/pg_listener.h             |    59 -
 src/include/catalog/pg_namespace.h            |    82 -
 src/include/catalog/pg_namespace_fn.h         |    22 +
 src/include/catalog/pg_opclass.h              |   215 -
 src/include/catalog/pg_operator.h             |   961 -
 src/include/catalog/pg_operator_fn.h          |    31 +
 src/include/catalog/pg_opfamily.h             |   141 -
 src/include/catalog/pg_pltemplate.h           |    77 -
 src/include/catalog/pg_proc.h                 |  4749 ----
 src/include/catalog/pg_rewrite.h              |    69 -
 src/include/catalog/pg_shdepend.h             |    90 -
 src/include/catalog/pg_shdescription.h        |    75 -
 src/include/catalog/pg_statistic.h            |   259 -
 src/include/catalog/pg_tablespace.h           |    63 -
 src/include/catalog/pg_trigger.h              |   113 -
 src/include/catalog/pg_trigger_fn.h           |    42 +
 src/include/catalog/pg_ts_config.h            |    60 -
 src/include/catalog/pg_ts_config_map.h        |    78 -
 src/include/catalog/pg_ts_dict.h              |    63 -
 src/include/catalog/pg_ts_parser.h            |    67 -
 src/include/catalog/pg_ts_template.h          |    67 -
 src/include/catalog/pg_type.h                 |   659 -
 src/include/catalog/pg_user_mapping.h         |    59 -
 src/include/catalog/toasting.h                |     4 +-
 src/include/catalog/unused_oids               |    57 -
 77 files changed, 30720 insertions(+), 12922 deletions(-)

Some code snippets and conventions were borrowed from Robert Haas'
earlier efforts.

Feedback is appreciated.

John

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)