Greetings,

On Fri, Mar 29, 2024 at 19:35 Jeff Davis <pg...@j-davis.com> wrote:

> On Fri, 2024-03-29 at 18:02 -0400, Stephen Frost wrote:
> > I’d certainly think “with stats” would be the preferred default of
> > our users.
>
> I'm concerned there could still be paths that lead to an error. For
> pg_restore, or when loading a SQL file, a single error isn't fatal
> (unless -e is specified), but it still could be somewhat scary to see
> errors during a reload.

I understand that point.

> Also, it's new behavior, so it may cause some minor surprises, or there
> might be minor interactions to work out. For instance, dumping stats
> doesn't make a lot of sense if pg_upgrade (or something else) is just
> going to run analyze anyway.

But we don’t expect anything to run analyze … do we? So I’m not sure why it makes sense to raise this as a concern.

> What do you think about starting off with it as non-default, and then
> switching it to default in 18?

What’s different, given the above arguments, in making the change with 18 instead of now? I also suspect that if we say “we will change the default later” … that later won’t ever come and we will end up making our users always have to remember to say “with-stats” instead.

The stats are important which is why the effort is being made in the first place. If just doing an analyze after loading the data was good enough then this wouldn’t be getting worked on.

Independently, I had a thought around doing an analyze as the data is being loaded .. but we can’t do that for indexes (but we could perhaps analyze the indexed values as we build the index..). This works when we do a truncate or create the table in the same transaction, so we would tie into some of the existing logic that we have around that. Would also adjust COPY to accept an option that specifies the anticipated number of rows being loaded (which we can figure out during the dump phase reasonably..).
Perhaps this would lead to a pg_dump option to do the data load as a transaction with a truncate before the copy (point here being to be able to still do parallel load while getting the benefits from knowing that we are completely reloading the table).

Just some other thoughts- which I don’t intend to take away from the current effort at all, which I see as valuable and should be enabled by default.

Thanks!

Stephen