Re: 16: Collation versioning and dependency helpers

2022-12-06 Thread Jeff Davis
On Tue, 2022-12-06 at 10:53 -0800, Andres Freund wrote:
> Perhaps worth posting a new version? Or are the remaining patches
> abandoned in
> favor of the other threads?

Marked what is there as committed, and the remainder is abandoned in
favor of other threads.

Thanks,
Jeff Davis





Re: 16: Collation versioning and dependency helpers

2022-12-06 Thread Andres Freund
Hi,

On 2022-10-31 16:36:54 -0700, Jeff Davis wrote:
> Committed these two small changes.

FWIW, as it stands cfbot can't apply the remaining changes:
http://cfbot.cputube.org/patch_40_3977.log

Perhaps worth posting a new version? Or are the remaining patches abandoned in
favor of the other threads?

Greetings,

Andres Freund




Re: 16: Collation versioning and dependency helpers

2022-10-31 Thread Jeff Davis
On Sun, 2022-10-30 at 19:10 +1300, Thomas Munro wrote:
> FWIW we did this (plus a lot more) in the per-index version tracking
> feature reverted from 14.

Thank you. I will catch up on that patch/thread.

> >   0002: Enable pg_collation_actual_version() to work on the default
> 
> Makes sense.
> 
> >   0003: Fix ALTER COLLATION "default" REFRESH VERSION, which 
> 
> Makes sense.

Committed these two small changes.

> >   0004: Add system views:
> >     pg_collation_versions: quickly see the current (from the
> > catalog)
> > and actual (from the provider) versions of each collation
> >     pg_collation_dependencies: map of objects to the collations
> > they
> > depend on
> > 
> > Along with these patches, you can use some tricks to verify data,
> > such
> > as /contrib/amcheck; or fix the data with things like:
> > 
> >   * REINDEX
> >   * VACUUM FULL/TRUNCATE/CLUSTER
> >   * REFRESH MATERIALIZED VIEW
> > 
> > And then refresh the collation version when you're confident that
> > your
> > data is valid.
> 
> Here you run into an argument that we had many times in that cycle:
> what's the point of views that suffer both false positives and false
> negatives?

The pg_collation_versions view is just a convenience, useful because
the default collation isn't represented normally in pg_collation so it
needs to be special-cased.

I could see how it would be tricky to precisely track the dependencies
through composite types (that is, create the proper pg_depend records),
but to just provide a view of the affected-by relationship seems more
doable. I'll review the previous discussion and see what I come up
with.

Of course, the view will just show an "affected by" relationship, it
won't show which objects are actually in violation of the current
collation version. But it at least gives the administrator a starting
place.

Regards,
Jeff Davis





Re: 16: Collation versioning and dependency helpers

2022-10-30 Thread Thomas Munro
On Sun, Oct 30, 2022 at 5:41 PM Jeff Davis  wrote:
> We haven't fully solved the changing collation-provider problem. An
> upgrade of the OS may change the version of libc or icu, and that might
> affect the collation, which could leave you with various corrupt
> database objects including:
>
>   * indexes
>   * constraints
>   * range types or multiranges (or other types dependent
> on collation for internal consistency)
>   * materialized views
>   * partitioned tables (range or hash)

Check.

> There's discussion about trying to reliably detect these changes and
> remedy them. But there are major challenges; for instance, glibc
> doesn't give a reliable signal that a collation may have changed, which
> would leave us with a lot of false positives and create a new set of
> problems (e.g. reindexing when it's unnecessary). And even with ICU, we
> don't have a way to support multiple versions of a provider or of a
> single collation, so trying to upgrade would still be a hassle.

FWIW some experimental code for multi-version ICU is proposed for
discussion here:

https://commitfest.postgresql.org/40/3956/

> Proposal:
>
> Add in some tools to make it easier for administrators to find out if
> they are at risk and solve the problem for themselves in a systematic
> way.

Excellent goal.

> Patches:
>
>   0001: Treat "default" collation as unpinned, so that entries in
> pg_depend are created. The rationale is that, since the "default"
> collation can change, it's not really an immutable system object, and
> it's worth tracking which objects are affected by it. It seems to bloat
> pg_depend by about 5-10% though -- that doesn't seem great, but I'm not
> sure if it's a real problem or not.

FWIW we did this (plus a lot more) in the per-index version tracking
feature reverted from 14.

>   0002: Enable pg_collation_actual_version() to work on the default
> collation (oid=100) so that it doesn't need to be treated as a special
> case.

Makes sense.

>   0003: Fix ALTER COLLATION "default" REFRESH VERSION, which currently
> throws an unhelpful internal error. Instead, issue a more helpful error
> that suggests "ALTER DATABASE ... REFRESH COLLATION VERSION" instead.

Makes sense.

>   0004: Add system views:
> pg_collation_versions: quickly see the current (from the catalog)
> and actual (from the provider) versions of each collation
> pg_collation_dependencies: map of objects to the collations they
> depend on
>
> Along with these patches, you can use some tricks to verify data, such
> as /contrib/amcheck; or fix the data with things like:
>
>   * REINDEX
>   * VACUUM FULL/TRUNCATE/CLUSTER
>   * REFRESH MATERIALIZED VIEW
>
> And then refresh the collation version when you're confident that your
> data is valid.

Here you run into an argument that we had many times in that cycle:
what's the point of views that suffer both false positives and false
negatives?

> TODO:

>   * Consider better tracking of which collation versions were active on
> a particular object since the last REINDEX (or REFRESH MATERIALIZED
> VIEW, TRUNCATE, or other command that would remove any trace of data
> affected by the previous collation version).

Right, the per-object dependency tracking feature, reverted from 14,
aimed to do exactly that.  It fell down on (1) some specific bugs that
were hard to fix, like dependencies inherited via composite types when
you change the composite type, and (2) doubt expressed by Tom, and
earlier Stephen, that pg_depend was a good place to store version
information.