Hi, I took a quick look at the v4 patches. I haven't done much testing yet, so only some basic review.
0001 - The SGML docs for pg_import_rel_stats may need some changes. It starts with description of what gets overwritten (non-)transactionally (which gets repeated twice), but that seems more like an implementation detail. But it does not really say which pg_class fields get updated. Then it speculates about the possible use case (pg_upgrade). I think it'd be better to focus on the overall goal of updating statistics, explain what gets updated/how, and only then maybe mention the pg_upgrade use case. Also, it says "statistics are replaced" but it's quite clear if that applies only to matching statistics or if all stats are deleted first and then the new stuff is inserted. (FWIW remove_pg_statistics clearly deletes all pre-existing stats). - import_pg_statistics: I somewhat dislike that we're passing arguments as datum[] array - it's hard to say what the elements are expected to be, etc. Maybe we should expand this, to make it clear. How do we even know the array is large enough? - I don't quite understand why we need examine_rel_attribute. It sets a lot of fields in the VacAttrStats struct, but then we only use attrtypid and attrtypmod from it - so why bother and not simply load just these two fields? Or maybe I miss something. - examine_rel_attribute can return NULL, but get_attrinfo does not check for NULL and just dereferences the pointer. Surely that can lead to segfaults? - validate_no_duplicates and the other validate functions would deserve a better docs, explaining what exactly is checked (it took me a while to realize we check just for duplicates), what the parameters do etc. - Do we want to make the validate_ functions part of the public API? I realize we want to use them from multiple places (regular and extended stats), but maybe it'd be better to have an "internal" header file, just like we have extended_stats_internal? - I'm not sure we do "\set debug f" elsewhere. It took me a while to realize why the query outputs are empty ... 0002 - I'd rename create_stat_ext_entry to statext_create_entry. - Do we even want to include OIDs from the source server? Why not to just have object names and resolve those? Seems safer - if the target server has the OID allocated to a different object, that could lead to confusing / hard to detect issues. - What happens if we import statistics which includes data for extended statistics object which does not exist on the target machine? - pg_import_ext_stats seems to not use require_match_oids - bug? 0003 - no SGML docs for the new tools? - The help() seems to be wrong / copied from "clusterdb" or something like that, right? On 2/2/24 09:37, Corey Huinker wrote: > (hit send before attaching patches, reposting message as well) > > Attached is v4 of the statistics export/import patch. > > This version has been refactored to match the design feedback received > previously. > > The system views are gone. These were mostly there to serve as a baseline > for what an export query would look like. That role is temporarily > reassigned to pg_export_stats.c, but hopefully they will be integrated into > pg_dump in the next version. The regression test also contains the version > of each query suitable for the current server version. > OK > The export format is far closer to the raw format of pg_statistic and > pg_statistic_ext_data, respectively. This format involves exporting oid > values for types, collations, operators, and attributes - values which are > specific to the server they were created on. To make sense of those values, > a subset of the columns of pg_type, pg_attribute, pg_collation, and > pg_operator are exported as well, which allows pg_import_rel_stats() and > pg_import_ext_stats() to reconstitute the data structure as it existed on > the old server, and adapt it to the modern structure and local schema > objects. I have no opinion on the proposed format - still JSON, but closer to the original data. Works for me, but I wonder what Tom thinks about it, considering he suggested making it closer to the raw data. > > pg_import_rel_stats matches up local columns with the exported stats by > column name, not attnum. This allows for stats to be imported when columns > have been dropped, added, or reordered. > Makes sense. What will happen if we try to import data for extended statistics (or index) that does not exist on the target server? > pg_import_ext_stats can also handle column reordering, though it currently > would get confused by changes in expressions that maintain the same result > data type. I'm not yet brave enough to handle importing nodetrees, nor do I > think it's wise to try. I think we'd be better off validating that the > destination extended stats object is identical in structure, and to fail > the import of that one object if it isn't perfect. > Yeah, column reordering is something we probably need to handle. The stats order them by attnum, so if we want to allow import on a system where the attributes were dropped/created in a different way, this is necessary. I haven't tested this - is there a regression test for this? I agree expressions are hard. I don't think it's feasible to import nodetree from other server versions, but why don't we simply deparse the expression on the source, and either parse it on the target (and then compare the two nodetrees), or deparse the target too and compare the two deparsed expressions? I suspect the deparsing may produce slightly different results on the two versions (causing false mismatches), but perhaps the deparse on source + parse on target + compare nodetrees would work? Haven't tried, though. > Export formats go back to v10. > Do we even want/need to go beyond 12? All earlier versions are EOL. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company