On Wed, Jan 28, 2026 at 12:44:33AM -0500, Corey Huinker wrote: > The short answer overcaution. The longer answer is we ingest it because > pg_mcv_list() returns it ( > https://www.postgresql.org/docs/current/functions-statistics.html) and I > assumed that it must be necessary for some corner-case input, otherwise why > would pg_mcv_list() export it? In retrospect, its lack of presence in > pg_stats view should have been a clue that we could get by without it. I'm > happy to rip it out and submit another patchset.
It seems to me that this comes down to the text[] representation when we read this data from the catalogs, where we can just rely on NULL being in the value, and the official marker in this case: https://www.postgresql.org/docs/current/arrays.html#ARRAYS-INPUT That's what your code relies on already anyway when deconstructing the input received from the restore function. Doubling on that with an extra array in input is error-prone IMO, as the overread issue I've spotted today is proving. The other two 1-dimension arrays for freqs and base_freqs are of course needed, we cannot and should not rebuild them on-the-fly.. For the in-core initial computation, statext_mcv_build() relies on the data gathered in StatsBuildData during ANALYZE before building the set of MCV data, where my guess is that it is just more useful to store this state as-is, and it's been built based on the Datum lookups. Perhaps we have a couple of specific cases where checking for we want some NULL-ness knowledge? It would be less expensive than a full array deconstruction, for sure, especially if the MCVs are large text values. -- Michael
signature.asc
Description: PGP signature
