Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Tue, Mar 16, 2021 at 2:32 AM Mark Dilger wrote:
>
> We do test corrupt relations.  We intentionally corrupt the pages within 
> corrupted heap tables to check that they get reported as corrupt.  (See 
> src/bin/pg_amcheck/t/004_verify_heapam.pl)

I disagree.  You're testing a modified version of the pages in the OS
cache, which is very likely to be different from real-world
corruption.  Real-world corruption usually ends up as a discrepancy
between storage and the OS cache, and this scenario is neither tested
nor documented.




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 11:32 AM, Mark Dilger wrote:
> 
> If you had a real, not fake, collation provider which actually provided a 
> collation with an actual version number, stopped the server, changed the 
> behavior of the collation as well as its version number, started the server, 
> and ran REINDEX (OUTDATED), I think that would be a more real-world test.  
> I'm not demanding that you write such a test.  I'm just saying that it is 
> strange that we don't have coverage for this anywhere, and was asking if you 
> think there is such coverage, because, you know, maybe I just didn't see 
> where that test was lurking.

I should add some context regarding why I mentioned this issue at all.

Not long ago, if an upgrade of icu or libc broke your collations, you were sad. 
 But postgres didn't claim to be competent to deal with this problem, so it was 
just a missing feature.  Now, with REINDEX (OUTDATED), we're really implying, 
if not outright saying, that postgres knows how to deal with collation 
upgrades.  I feel uncomfortable that v14 will make such a claim with not a 
single regression test confirming such a claim.  I'm happy to discover that 
such a test is lurking somewhere and I just didn't see it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 11:10 AM, Julien Rouhaud  wrote:
> 
> On Mon, Mar 15, 2021 at 10:56:50AM -0700, Mark Dilger wrote:
>> 
>> 
>>> On Mar 15, 2021, at 10:50 AM, Julien Rouhaud  wrote:
>>> 
>>> On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
>>>> I'm saying that your patch seems to call down to 
>>>> get_collation_actual_version() via get_collation_version_for_oid() from 
>>>> your new function do_check_index_has_outdated_collation(), but I'm not 
>>>> seeing how that gets exercised.
>>> 
>>> It's a little bit late here so sorry if I'm missing something.
>>> 
>>> do_check_index_has_outdated_collation() is called from
>>> index_has_outdated_collation() which is called from
>>> index_has_outdated_dependency() which is called from
>>> RelationGetIndexListFiltered(), and that function is called when passing the
>>> OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
>>> with added tests for both matching and non matching collation version.
>> 
>> Ok, fair enough.  I was thinking about the case where the collation actually 
>> returns a different version number because it (the C library providing the 
>> collation) got updated, but I think you've answered already that you are not 
>> planning to test that case, only the case where pg_depend is modified to 
>> have a bogus version number.
> 
> This infrastructure is supposed to detect that the collation library *used to*
> return a different version before it was updated.  And that's exactly what
> we're testing by manually updating the refobjversion.
> 
>> It seems a bit odd to me that a feature intended to handle cases where 
>> collations are updated is not tested via having a collation be updated 
>> during the test.  It leaves open the possibility that something differs 
>> between the test and reindexed run after real world collation updates.  But 
>> that's for the committer who picks up your patch to decide, and perhaps it 
>> is unfair to make your patch depend on addressing that issue.
> 
> Why is that odd?  We're testing that we're correctly storing the collation
> version during index creation and correctly detecting a mismatch.  Having a
> fake collation provider to return a fake version number won't add any more
> coverage unless I'm missing something.
> 
> It's similar to how we test the various corruption scenarios.  AFAIK we're not
> providing custom drivers to write corrupted data but we're simply simulating a
> corruption by overwriting some blocks.

We do test corrupt relations.  We intentionally corrupt the pages within 
corrupted heap tables to check that they get reported as corrupt.  (See 
src/bin/pg_amcheck/t/004_verify_heapam.pl)  Admittedly, the corruptions used in 
the tests are not necessarily representative of corruptions that might occur in 
the wild, but that is a hard problem to solve, since we don't know the 
statistical distribution of corruptions in the wild.

If you had a real, not fake, collation provider which actually provided a 
collation with an actual version number, stopped the server, changed the 
behavior of the collation as well as its version number, started the server, 
and ran REINDEX (OUTDATED), I think that would be a more real-world test.  I'm 
not demanding that you write such a test.  I'm just saying that it is strange 
that we don't have coverage for this anywhere, and was asking if you think 
there is such coverage, because, you know, maybe I just didn't see where that 
test was lurking.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 10:56:50AM -0700, Mark Dilger wrote:
> 
> 
> > On Mar 15, 2021, at 10:50 AM, Julien Rouhaud  wrote:
> > 
> > On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
> >> I'm saying that your patch seems to call down to 
> >> get_collation_actual_version() via get_collation_version_for_oid() from 
> >> your new function do_check_index_has_outdated_collation(), but I'm not 
> >> seeing how that gets exercised.
> > 
> > It's a little bit late here so sorry if I'm missing something.
> > 
> > do_check_index_has_outdated_collation() is called from
> > index_has_outdated_collation() which is called from
> > index_has_outdated_dependency() which is called from
> > RelationGetIndexListFiltered(), and that function is called when passing the
> > OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
> > with added tests for both matching and non matching collation version.
> 
> Ok, fair enough.  I was thinking about the case where the collation actually 
> returns a different version number because it (the C library providing the 
> collation) got updated, but I think you've answered already that you are not 
> planning to test that case, only the case where pg_depend is modified to have 
> a bogus version number.

This infrastructure is supposed to detect that the collation library *used to*
return a different version before it was updated.  And that's exactly what
we're testing by manually updating the refobjversion.

> It seems a bit odd to me that a feature intended to handle cases where 
> collations are updated is not tested via having a collation be updated during 
> the test.  It leaves open the possibility that something differs between the 
> test and reindexed run after real world collation updates.  But that's for 
> the committer who picks up your patch to decide, and perhaps it is unfair to 
> make your patch depend on addressing that issue.

Why is that odd?  We're testing that we're correctly storing the collation
version during index creation and correctly detecting a mismatch.  Having a
fake collation provider to return a fake version number won't add any more
coverage unless I'm missing something.

It's similar to how we test the various corruption scenarios.  AFAIK we're not
providing custom drivers to write corrupted data but we're simply simulating a
corruption by overwriting some blocks.




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 10:50 AM, Julien Rouhaud  wrote:
> 
> On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
>> I'm saying that your patch seems to call down to 
>> get_collation_actual_version() via get_collation_version_for_oid() from your 
>> new function do_check_index_has_outdated_collation(), but I'm not seeing how 
>> that gets exercised.
> 
> It's a little bit late here so sorry if I'm missing something.
> 
> do_check_index_has_outdated_collation() is called from
> index_has_outdated_collation() which is called from
> index_has_outdated_dependency() which is called from
> RelationGetIndexListFiltered(), and that function is called when passing the
> OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
> with added tests for both matching and non matching collation version.

Ok, fair enough.  I was thinking about the case where the collation actually 
returns a different version number because it (the C library providing the 
collation) got updated, but I think you've answered already that you are not 
planning to test that case, only the case where pg_depend is modified to have a 
bogus version number.

It seems a bit odd to me that a feature intended to handle cases where 
collations are updated is not tested via having a collation be updated during 
the test.  It leaves open the possibility that something differs between the 
test and reindexed run after real world collation updates.  But that's for the 
committer who picks up your patch to decide, and perhaps it is unfair to make 
your patch depend on addressing that issue.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:
> I'm saying that your patch seems to call down to 
> get_collation_actual_version() via get_collation_version_for_oid() from your 
> new function do_check_index_has_outdated_collation(), but I'm not seeing how 
> that gets exercised.

It's a little bit late here so sorry if I'm missing something.

do_check_index_has_outdated_collation() is called from
index_has_outdated_collation() which is called from
index_has_outdated_dependency() which is called from
RelationGetIndexListFiltered(), and that function is called when passing the
OUTDATED option to REINDEX (and reindexdb --outdated).  So this is exercised
with added tests for both matching and non matching collation version.
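
To make that call chain easier to picture, here is a minimal standalone sketch
of the layering.  It is not the patch code: every toy_* name and both structs
are hypothetical stand-ins for the catalog lookups, and treating an untracked
(NULL) version as "not outdated" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pg_depend entry tying an index to a collation. */
typedef struct ToyCollationDep
{
	const char *recorded_version;	/* version noted when the index was built */
	const char *actual_version;		/* version the provider reports today */
} ToyCollationDep;

/* Hypothetical stand-in for an index and its collation dependencies. */
typedef struct ToyIndex
{
	const char *name;
	const ToyCollationDep *deps;
	size_t		ndeps;
} ToyIndex;

/* Innermost check: does the recorded version differ from the actual one? */
bool
toy_check_outdated_collation(const ToyCollationDep *dep)
{
	assert(dep != NULL);
	/* Assumption of this sketch: an untracked version is not reported. */
	if (dep->recorded_version == NULL || dep->actual_version == NULL)
		return false;
	return strcmp(dep->recorded_version, dep->actual_version) != 0;
}

/* An index is outdated as soon as one of its collations is. */
bool
toy_index_has_outdated_collation(const ToyIndex *idx)
{
	assert(idx != NULL);
	for (size_t i = 0; i < idx->ndeps; i++)
		if (toy_check_outdated_collation(&idx->deps[i]))
			return true;
	return false;
}

/* Only collations are considered; other dependency kinds would go here. */
bool
toy_index_has_outdated_dependency(const ToyIndex *idx)
{
	return toy_index_has_outdated_collation(idx);
}

/* Store pointers to the outdated indexes in out[]; return how many were kept. */
size_t
toy_filter_outdated(const ToyIndex *indexes, size_t n, const ToyIndex **out)
{
	size_t		kept = 0;

	assert(indexes != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		if (toy_index_has_outdated_dependency(&indexes[i]))
			out[kept++] = &indexes[i];
	return kept;
}
```

In this model the innermost check only ever sees two strings, so it cannot
tell whether the recorded version was overwritten by hand or the provider
really changed.  What the sketch cannot show is whether the real lookup of the
actual version behaves the same after a genuine library upgrade.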




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 10:34 AM, Julien Rouhaud  wrote:
> 
> On Mon, Mar 15, 2021 at 10:13:55AM -0700, Mark Dilger wrote:
>> 
>> 
>>> On Mar 15, 2021, at 9:52 AM, Julien Rouhaud  wrote:
>>> 
>>> But there are also the tests in collate.icu.utf8.out which will fake 
>>> outdated
>>> collations (that's the original tests for the collation tracking patches) 
>>> and
>>> then check that outdated indexes are reindexed with both REINDEX and REINDEX
>>> (OUTDATED).
>>> 
>>> So I think that all cases are covered.  Do you want to have more test cases?
>> 
>> I thought that just checked cases where a bogus 'not a version' was put into 
>> pg_catalog.pg_depend.  I'm talking about having a collation provider who 
>> returns a different version string and has genuinely different collation 
>> rules between versions, thereby breaking the index until it is updated.  Is 
>> that being tested?
> 
> No, we're only checking that the infrastructure works as intended.
> 
> Are you saying that you want to implement a simplistic collation provider with
> "tunable" ordering, so that you can actually check that an ordering change 
> will
> be detected as a corrupted index, as in you'll get some error or incorrect
> results?

I'm saying that your patch seems to call down to get_collation_actual_version() 
via get_collation_version_for_oid() from your new function 
do_check_index_has_outdated_collation(), but I'm not seeing how that gets 
exercised.  

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 10:13:55AM -0700, Mark Dilger wrote:
> 
> 
> > On Mar 15, 2021, at 9:52 AM, Julien Rouhaud  wrote:
> > 
> > But there are also the tests in collate.icu.utf8.out which will fake 
> > outdated
> > collations (that's the original tests for the collation tracking patches) 
> > and
> > then check that outdated indexes are reindexed with both REINDEX and REINDEX
> > (OUTDATED).
> > 
> > So I think that all cases are covered.  Do you want to have more test cases?
> 
> I thought that just checked cases where a bogus 'not a version' was put into 
> pg_catalog.pg_depend.  I'm talking about having a collation provider who 
> returns a different version string and has genuinely different collation 
> rules between versions, thereby breaking the index until it is updated.  Is 
> that being tested?

No, we're only checking that the infrastructure works as intended.

Are you saying that you want to implement a simplistic collation provider with
"tunable" ordering, so that you can actually check that an ordering change will
be detected as a corrupted index, as in you'll get some error or incorrect
results?

I don't think that this infrastructure is the right place to do that, and I'm
not sure what would be the benefit here.  If a library was updated, the
underlying indexes may or may not be corrupted, and we only warn about the
discrepancy with a low overhead.




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 9:52 AM, Julien Rouhaud  wrote:
> 
> But there are also the tests in collate.icu.utf8.out which will fake outdated
> collations (that's the original tests for the collation tracking patches) and
> then check that outdated indexes are reindexed with both REINDEX and REINDEX
> (OUTDATED).
> 
> So I think that all cases are covered.  Do you want to have more test cases?

I thought that just checked cases where a bogus 'not a version' was put into 
pg_catalog.pg_depend.  I'm talking about having a collation provider who 
returns a different version string and has genuinely different collation rules 
between versions, thereby breaking the index until it is updated.  Is that 
being tested?
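
As a toy illustration of what "breaking the index" means here, the sketch
below keeps a sorted array of keys as a stand-in for a btree.  The two
collation functions are made up for the example (plain byte order, and a
case-insensitive order with a byte-order tie-break); no real provider is
involved, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*ToyCollate) (const char *a, const char *b);

/* "Old" rules in this sketch: plain byte order. */
int
toy_collate_v1(const char *a, const char *b)
{
	return strcmp(a, b);
}

/* "New" rules in this sketch: case-insensitive, byte order breaks ties. */
int
toy_collate_v2(const char *a, const char *b)
{
	const unsigned char *pa = (const unsigned char *) a;
	const unsigned char *pb = (const unsigned char *) b;

	while (*pa != '\0' && *pb != '\0')
	{
		int			ca = tolower(*pa);
		int			cb = tolower(*pb);

		if (ca != cb)
			return ca - cb;
		pa++;
		pb++;
	}
	if (*pa != *pb)
		return (int) *pa - (int) *pb;	/* the shorter string sorts first */
	return strcmp(a, b);
}

/* "Build the index": insertion sort of the keys under the given rules. */
void
toy_build_index(const char **keys, size_t n, ToyCollate cmp)
{
	assert(keys != NULL);
	for (size_t i = 1; i < n; i++)
	{
		const char *k = keys[i];
		size_t		j = i;

		while (j > 0 && cmp(keys[j - 1], k) > 0)
		{
			keys[j] = keys[j - 1];
			j--;
		}
		keys[j] = k;
	}
}

/* "Index scan": binary search; returns the position or -1 if not found. */
long
toy_index_lookup(const char *const *keys, size_t n, const char *key,
				 ToyCollate cmp)
{
	size_t		lo = 0;
	size_t		hi = n;

	assert(keys != NULL && key != NULL);
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			c = cmp(key, keys[mid]);

		if (c == 0)
			return (long) mid;
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

With the keys "b", "a" and "B" built under toy_collate_v1, a lookup of "B"
succeeds under toy_collate_v1 but misses under toy_collate_v2 even though the
key is present.  Rebuilding under toy_collate_v2 makes the lookup succeed
again.  A test against a real provider would presumably have to demonstrate
something of that shape.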

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-15 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 09:30:43AM -0700, Mark Dilger wrote:
> 
> In the docs, 0001, "Fow now, the only dependency handled currently",
> 
> "Fow now" is misspelled, and "For now" seems redundant when used with 
> "currently".
> 
> 
> In the docs, 0002, "For now only dependency on collations are supported."
> 
> "dependency" is singular, "are" is conjugated for plural.
> 
> 
> In the docs, 0002, you forgot to update doc/src/sgml/ref/reindexdb.sgml with 
> the documentation for the --outdated switch.

Thanks, I'll fix those and do a full round of doc / comment proofreading.

> In the tests, you check that REINDEX (OUTDATED) doesn't do anything crazy, 
> but you are not really testing the functionality so far as I can see, as you 
> don't have any tests which cause the collation to be outdated.   Am I right 
> about that?  I wonder if you could modify DefineCollation.  In addition to 
> the providers "icu" and "libc" that it currently accepts, I wonder if it 
> might accept "test" or similar, and then you could create a test in 
> src/test/modules that compiles a "test" provider, creates a database with 
> indexes dependent on something from that provider, stops the database, 
> updates the test collation, ...?  


Indeed the tests in create_index.sql (and similarly in 090_reindexdb.pl) check
that REINDEX (OUTDATED) will ignore non-outdated indexes as expected.

But there are also the tests in collate.icu.utf8.out which will fake outdated
collations (those are the original tests for the collation tracking patches) and
then check that outdated indexes are reindexed with both REINDEX and REINDEX
(OUTDATED).

So I think that all cases are covered.  Do you want to have more test cases?




Re: REINDEX backend filtering

2021-03-15 Thread Mark Dilger



> On Mar 14, 2021, at 8:33 PM, Julien Rouhaud  wrote:
> 
> 


In the docs, 0001, "Fow now, the only dependency handled currently",

"Fow now" is misspelled, and "For now" seems redundant when used with 
"currently".


In the docs, 0002, "For now only dependency on collations are supported."

"dependency" is singular, "are" is conjugated for plural.


In the docs, 0002, you forgot to update doc/src/sgml/ref/reindexdb.sgml with 
the documentation for the --outdated switch.


In the tests, you check that REINDEX (OUTDATED) doesn't do anything crazy, but 
you are not really testing the functionality so far as I can see, as you don't 
have any tests which cause the collation to be outdated.   Am I right about 
that?  I wonder if you could modify DefineCollation.  In addition to the 
providers "icu" and "libc" that it currently accepts, I wonder if it might 
accept "test" or similar, and then you could create a test in src/test/modules 
that compiles a "test" provider, creates a database with indexes dependent on 
something from that provider, stops the database, updates the test collation, 
...?  

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-14 Thread Julien Rouhaud
Please find attached v7, with the following changes:

- all typos reported by Michael and Mark are fixed
- REINDEX (OUTDATED) INDEX will now ignore the index if it doesn't have any
  outdated dependency.  Partitioned indexes are correctly handled.
- REINDEX (OUTDATED, VERBOSE) will now inform caller of ignored indexes, with
  lines of the form:

NOTICE:  index "index_name" has no outdated dependency

- updated regression tests to cover all those changes.  I kept the current
  approach of using simple SQL tests listing the ignored indexes.  I also added
  the OUTDATED option to some collate.icu.utf8 tests so that we also check that
  both REINDEX and REINDEX (OUTDATED) work as expected.
- move pg_index_has_outdated_dependency to 0002

I didn't remove index_has_outdated_collation() for now.
From 91bcd6e4565164314eb6444635ca274695de3748 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v7 1/2] Add a new OUTDATED filtering facility for REINDEX
 command.

OUTDATED is added as a new unreserved keyword.

When used, REINDEX will only process indexes that have an outdated dependency.
For now, only dependencies on collations are supported, but we'll likely support
other kinds of dependencies in the future.

Author: Julien Rouhaud 
Reviewed-by: Michael Paquier, Mark Dilger
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/ref/reindex.sgml | 12 +++
 src/backend/access/index/indexam.c| 59 ++
 src/backend/catalog/index.c   | 79 ++-
 src/backend/commands/indexcmds.c  | 37 -
 src/backend/parser/gram.y |  4 +-
 src/backend/utils/cache/relcache.c| 47 +++
 src/bin/psql/tab-complete.c   |  2 +-
 src/include/access/genam.h|  1 +
 src/include/catalog/index.h   |  3 +
 src/include/parser/kwlist.h   |  1 +
 src/include/utils/relcache.h  |  1 +
 .../regress/expected/collate.icu.utf8.out | 12 ++-
 src/test/regress/expected/create_index.out| 27 +++
 src/test/regress/sql/collate.icu.utf8.sql | 12 ++-
 src/test/regress/sql/create_index.sql | 18 +
 15 files changed, 301 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index ff4dba8c36..c4749c338b 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -26,6 +26,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 where option can be one of:
 
 CONCURRENTLY [ boolean ]
+OUTDATED [ boolean ]
 TABLESPACE new_tablespace
 VERBOSE [ boolean ]
 
@@ -188,6 +189,17 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+OUTDATED
+
+ 
+  This option can be used to filter the list of indexes to rebuild and only
+  process indexes that have outdated dependencies.  Fow now, the only
+  dependency handled currently is the collation provider version.
+ 
+
+   
+

 TABLESPACE
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 3d2dbed708..dc1c85cf0d 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -145,6 +145,65 @@ index_open(Oid relationId, LOCKMODE lockmode)
 	return r;
 }
 
+/* 
+ *		try_index_open - open an index relation by relation OID
+ *
+ *		Same as index_open, except return NULL instead of failing
+ *		if the index does not exist.
+ * 
+ */
+Relation
+try_index_open(Oid relationId, LOCKMODE lockmode)
+{
+	Relation	r;
+
+	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
+
+	/* Get the lock first */
+	if (lockmode != NoLock)
+		LockRelationOid(relationId, lockmode);
+
+	/*
+	 * Now that we have the lock, probe to see if the relation really exists
+	 * or not.
+	 */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relationId)))
+	{
+		/* Release useless lock */
+		if (lockmode != NoLock)
+			UnlockRelationOid(relationId, lockmode);
+
+		return NULL;
+	}
+
+	/* Should be safe to do a relcache load */
+	r = RelationIdGetRelation(relationId);
+
+	if (!RelationIsValid(r))
+		elog(ERROR, "could not open relation with OID %u", relationId);
+
+	/* If we didn't get the lock ourselves, assert that caller holds one */
+	Assert(lockmode != NoLock ||
+		   CheckRelationLockedByMe(r, AccessShareLock, true));
+
+	if (r->rd_rel->relkind != RELKIND_INDEX &&
+		r->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not an index",
+						RelationGetRelationName(r))));
+	}
+
+	/* Make note that we've accessed a temporary relation */
+	if (RelationUsesLocalBuffers(r))
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
+
+	pgstat_initstats(r);
+
+	return r;
+}
+
 /* 
  *		index_close - close an index relation
  *
diff --git a/src/backend/catalog/index.c 

Re: REINDEX backend filtering

2021-03-14 Thread Julien Rouhaud
Hi Mark,

On Sun, Mar 14, 2021 at 05:01:20PM -0700, Mark Dilger wrote:
> 
> > On Mar 14, 2021, at 12:10 AM, Julien Rouhaud  wrote:
> 
> I'm coming to this patch quite late, perhaps too late to change design 
> decision in time for version 14.

Thanks for looking at it!

> + if (outdated && PQserverVersion(conn) < 14)
> + {
> + PQfinish(conn);
> + pg_log_error("cannot use the \"%s\" option on server versions 
> older than PostgreSQL %s",
> +  "outdated", "14");
> + exit(1);
> + }
> 
> If detection of outdated indexes were performed entirely in the frontend 
> (reindexdb) rather than in the backend (reindex command), would reindexdb be 
> able to connect to older servers?  Looking quickly at the catalogs, it 
> appears pg_index, pg_depend, pg_collation and a call to the SQL function 
> pg_collation_actual_version() compared against pg_depend.refobjversion would 
> be enough to construct a list of indexes in need of reindexing.  Am I missing 
> something here?

There won't be any need to connect to older servers if the patch is committed
in this commitfest, as refobjversion was also added in pg14.

> I understand that wouldn't help somebody wanting to reindex from psql.  Is 
> that the whole reason you went a different direction with this feature?

This was already discussed with Magnus and Michael.  The main reasons for that
are:

- no need for a whole new infrastructure to be able to process a list of
  indexes in parallel which would be required if getting the list of indexes in
  the client

- if done in the backend, then the ability is immediately available for all
  user scripts, compared to the burden of writing the needed query (with the
  usual caveats like quoting, qualifying all objects if the search_path isn't
  safe and such) and looping through all the results.
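
To give an idea of the quoting caveat for a client-side script, here is a
minimal sketch of identifier quoting.  toy_quote_ident() is a hypothetical
helper written for this example; a real libpq client would more likely rely on
PQescapeIdentifier() than on anything hand-rolled like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Wrap ident in double quotes and double any embedded double quote.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int
toy_quote_ident(const char *ident, char *out, size_t outsz)
{
	size_t		o = 0;

	assert(ident != NULL && out != NULL);

	/* Need room for at least the two quotes and the terminating NUL. */
	if (outsz < 3)
		return -1;

	out[o++] = '"';
	for (const char *p = ident; *p != '\0'; p++)
	{
		size_t		need = (*p == '"') ? 2 : 1;

		/* Keep room for the closing quote and the terminating NUL. */
		if (o + need + 2 > outsz)
			return -1;
		if (*p == '"')
			out[o++] = '"';
		out[o++] = *p;
	}
	out[o++] = '"';
	out[o] = '\0';
	return 0;
}
```

Every script building its own REINDEX INDEX statements from a catalog query
would need something equivalent, plus schema qualification, which is the
burden the backend-side option avoids.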

> + printf(_("  --outdated   only process indexes having 
> outdated depencies\n"));  
> 
> typo.
> 
> + bool outdated;  /* depends on at least on deprected collation? */
> 
> typo.

Thanks! I'll fix those.




Re: REINDEX backend filtering

2021-03-14 Thread Julien Rouhaud
On Mon, Mar 15, 2021 at 08:56:00AM +0900, Michael Paquier wrote:
> On Sun, Mar 14, 2021 at 10:57:37PM +0800, Julien Rouhaud wrote:
> > 
> > Is there really a use case for reindexing a specific index and at the same 
> > time
> > asking for possibly ignoring it?  I think we should just forbid REINDEX
> > (OUTDATED) INDEX.  What do you think?
> 
> I think that there would be cases to be able to handle that, say if a
> user wants to work on a specific set of indexes one-by-one.

If a user wants to work on a specific set of indexes one at a time, then the
list of indexes is probably already retrieved from some SQL query and there's
already all the necessary infrastructure to properly filter out the non-outdated
indexes.

> There is
> also the argument of inconsistency with the other commands.

Yes, but the other commands dynamically construct a list of indexes.

The only use case I see would be to process a partitioned index if some of the
underlying indexes have already been processed.  IMO this is better addressed
by REINDEX TABLE.

Anyway, I'll make REINDEX (OUTDATED) INDEX reindex the explicitly named index
only if it is outdated, since you think that's the better behavior.

> 
> > I was thinking that users would be more interested in the list of indexes 
> > being
> > processed rather than the full list of indexes and a mention of whether it 
> > was
> > processed or not.  I can change that if you prefer.
> 
> How invasive do you think it would be to add a note in the verbose
> output when indexes are skipped?

Probably not too invasive, but the verbose output is already inconsistent:

# reindex (verbose) table tt;
NOTICE:  0: table "tt" has no indexes to reindex

But a REINDEX (VERBOSE) DATABASE won't emit such a message.  I'm assuming that
it's because it doesn't make sense to warn in that case as the user didn't
explicitly specify the table name.  We have the same behavior for now when
using the OUTDATED option if no indexes are processed.  Should that be changed
too?

> > Did you mean index_has_outdated_collation() and
> > index_has_outdated_dependency()?  It's just to keep things separated, mostly
> > for future improvements on that infrastructure.  I can get rid of that 
> > function
> and put back the code in index_has_outdated_dependency() if that's overkill.
> 
> Yes, sorry.  I meant index_has_outdated_collation() and
> index_has_outdated_dependency().

And are you ok with this function?




Re: REINDEX backend filtering

2021-03-14 Thread Mark Dilger



> On Mar 14, 2021, at 12:10 AM, Julien Rouhaud  wrote:
> 
> v6 attached, rebase only due to conflict with recent commit.

Hi Julien,

I'm coming to this patch quite late, perhaps too late to change design decision 
in time for version 14.


+   if (outdated && PQserverVersion(conn) < 14)
+   {
+   PQfinish(conn);
+   pg_log_error("cannot use the \"%s\" option on server versions 
older than PostgreSQL %s",
+"outdated", "14");
+   exit(1);
+   }
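
A side note on the version check quoted just above.  As far as I know from the
libpq documentation, PQserverVersion() returns an integer such as 140000 for
14.0 or 130002 for 13.2, not the bare major number, so a comparison against
the literal 14 would probably never be true on a real server.  A minimal
sketch of the comparison, where the helper name and the constant are
hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constant: first server version assumed to know OUTDATED. */
#define TOY_MIN_OUTDATED_VERSION 140000

/*
 * server_version_num is expected in the PQserverVersion() encoding,
 * e.g. 140000 for 14.0, 130002 for 13.2, 90603 for 9.6.3.
 */
bool
toy_server_supports_outdated(int server_version_num)
{
	assert(server_version_num > 0);
	return server_version_num >= TOY_MIN_OUTDATED_VERSION;
}
```

With this encoding a 13.2 server (130002) is rejected by the helper, whereas
the expression 130002 < 14 is false and would let it through.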

If detection of outdated indexes were performed entirely in the frontend 
(reindexdb) rather than in the backend (reindex command), would reindexdb be 
able to connect to older servers?  Looking quickly at the catalogs, it 
appears pg_index, pg_depend, pg_collation and a call to the SQL function 
pg_collation_actual_version() compared against pg_depend.refobjversion would be 
enough to construct a list of indexes in need of reindexing.  Am I missing 
something here?

I understand that wouldn't help somebody wanting to reindex from psql.  Is that 
the whole reason you went a different direction with this feature?



+   printf(_("  --outdated   only process indexes having 
outdated depencies\n"));  

typo.



+   bool outdated;  /* depends on at least on deprected collation? */

typo.



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: REINDEX backend filtering

2021-03-14 Thread Michael Paquier
On Sun, Mar 14, 2021 at 10:57:37PM +0800, Julien Rouhaud wrote:
> On Sun, Mar 14, 2021 at 08:54:11PM +0900, Michael Paquier wrote:
>> In ReindexRelationConcurrently(), there is no filtering done for the
>> indexes themselves.  The operation is only done on the list of indexes
>> fetched from the parent relation.  Why?  This means that a REINDEX
>> (OUTDATED) INDEX would actually rebuild an index even if this index
>> has no out-of-date collations, like a catalog.  I think that's
>> confusing.
>> 
>> Same comment for the non-concurrent case, as of the code paths of
>> reindex_index().
> 
> Yes, I'm not sure what we should do in that case.  I thought I put a comment
> about that but it apparently disappeared during some rewrite.
> 
> Is there really a use case for reindexing a specific index and at the same 
> time
> asking for possibly ignoring it?  I think we should just forbid REINDEX
> (OUTDATED) INDEX.  What do you think?

I think that there are cases where this would be useful, say if a
user wants to work on a specific set of indexes one by one.  There is
also the argument of inconsistency with the other commands.

> I was thinking that users would be more interested in the list of indexes 
> being
> processed rather than the full list of indexes and a mention of whether it was
> processed or not.  I can change that if you prefer.

How invasive do you think it would be to add a note in the verbose
output when indexes are skipped?

> Did you mean index_has_outdated_collation() and
> index_has_outdated_dependency()?  It's just to keep things separated, mostly
> for future improvements on that infrastructure.  I can get rid of that 
> function
> and put back the code in index_has_outdated_dependency() if that's overkill.

Yes, sorry.  I meant index_has_outdated_collation() and
index_has_outdated_dependency().
--
Michael




Re: REINDEX backend filtering

2021-03-14 Thread Julien Rouhaud
On Sun, Mar 14, 2021 at 08:54:11PM +0900, Michael Paquier wrote:
> On Sun, Mar 14, 2021 at 04:10:07PM +0800, Julien Rouhaud wrote:
> 
> +   bool outdated_filter = false;
> Wouldn't it be better to rename that "outdated" instead for
> consistency with the other options?

I agree.

> In ReindexRelationConcurrently(), there is no filtering done for the
> indexes themselves.  The operation is only done on the list of indexes
> fetched from the parent relation.  Why?  This means that a REINDEX
> (OUTDATED) INDEX would actually rebuild an index even if this index
> has no out-of-date collations, like a catalog.  I think that's
> confusing.
> 
> Same comment for the non-concurrent case, as of the code paths of
> reindex_index().

Yes, I'm not sure what we should do in that case.  I thought I put a comment
about that but it apparently disappeared during some rewrite.

Is there really a use case for reindexing a specific index and at the same time
asking for possibly ignoring it?  I think we should just forbid REINDEX
(OUTDATED) INDEX.  What do you think?

> Would it be better to inform the user which indexes are getting
> skipped in the verbose output if REINDEXOPT_VERBOSE is set?

I was thinking that users would be more interested in the list of indexes being
processed rather than the full list of indexes and a mention of whether it was
processed or not.  I can change that if you prefer.

> +   
> +Check if the specified index has any outdated dependency. For now 
> only
> +dependency on collations are supported.
> +   
> [...]
> +OUTDATED
> +
> + 
> +  This option can be used to filter the list of indexes to rebuild and 
> only
> +  process indexes that have outdated dependencies.  Fow now, the only
> +  handle dependency is for the collation provider version.
> + 
> Do we really need to be this specific in this part of the
> documentation with collations?

I think it's important to document what this option really does, and I don't
see a better place to document it.

> The last sentence of this paragraph
> sounds weird.  Don't you mean instead to write "the only dependency
> handled currently is the collation provider version"?

Indeed, I'll fix!

> +\set VERBOSITY terse \\ -- suppress machine-dependent details
> +-- no suitable index should be found
> +REINDEX (OUTDATED) TABLE reindex_coll;
> What are those details?

That's just the same comment as the previous occurrence in the file; I kept it
for consistency.

> And wouldn't it be more stable to just check
> after the relfilenode of the indexes instead?

Agreed, I'll add additional tests for that.

> " ORDER BY sum(ci.relpages)"
> Schema qualification here, twice.

Well, this isn't actually mandatory, per comment at the top:

/*
 * The queries here are using a safe search_path, so there's no need to
 * fully qualify everything.
 */

But I think it's a better style to fully qualify objects, so I'll fix.

> +   rel = try_relation_open(indexOid, AccessShareLock);
> +
> +   if (rel == NULL)
> +   PG_RETURN_NULL();
> Let's introduce a try_index_open() here.

Good idea!

> What's the point in having both index_has_outdated_collation() and
> index_has_outdated_collation()?

Did you mean index_has_outdated_collation() and
index_has_outdated_dependency()?  It's just to keep things separated, mostly
for future improvements on that infrastructure.  I can get rid of that function
and put back the code in index_has_outdated_dependency() if that's overkill.

> It seems to me that 0001 should be split into two patches:
> - One for the backend OUTDATED option.
> - One for pg_index_has_outdated_dependency(), which only makes sense
> in-core once reindexdb is introduced.

I thought it would be better to add the backend part in a single commit, and
then build the client part on top of that in a different commit.  I can
rearrange things if you want, but in that case should
index_has_outdated_dependency() be in a different patch as you mention or
simply merged with 0002 (the --outdated option for reindexdb)?




Re: REINDEX backend filtering

2021-03-14 Thread Michael Paquier
On Sun, Mar 14, 2021 at 04:10:07PM +0800, Julien Rouhaud wrote:
> v6 attached, rebase only due to conflict with recent commit.

I have read through the patch.

+   bool        outdated_filter = false;
Wouldn't it be better to rename that "outdated" instead for
consistency with the other options?

In ReindexRelationConcurrently(), there is no filtering done for the
indexes themselves.  The operation is only done on the list of indexes
fetched from the parent relation.  Why?  This means that a REINDEX
(OUTDATED) INDEX would actually rebuild an index even if this index
has no out-of-date collations, like a catalog.  I think that's
confusing.

Same comment for the non-concurrent case, as of the code paths of
reindex_index().

Would it be better to inform the user which indexes are getting
skipped in the verbose output if REINDEXOPT_VERBOSE is set?

+   
+Check if the specified index has any outdated dependency. For now only
+dependency on collations are supported.
+   
[...]
+OUTDATED
+
+ 
+  This option can be used to filter the list of indexes to rebuild and only
+  process indexes that have outdated dependencies.  Fow now, the only
+  handle dependency is for the collation provider version.
+ 
Do we really need to be this specific in this part of the
documentation with collations?  The last sentence of this paragraph
sounds weird.  Don't you mean instead to write "the only dependency
handled currently is the collation provider version"?

+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
What are those details?  And wouldn't it be more stable to just check
after the relfilenode of the indexes instead?

" ORDER BY sum(ci.relpages)"
Schema qualification here, twice.

+   printf(_("  --outdated   only process indexes having outdated depencies\n"));
s/depencies/dependencies/.

+   rel = try_relation_open(indexOid, AccessShareLock);
+
+   if (rel == NULL)
+   PG_RETURN_NULL();
Let's introduce a try_index_open() here.

What's the point in having both index_has_outdated_collation() and
index_has_outdated_collation()?

It seems to me that 0001 should be split into two patches:
- One for the backend OUTDATED option.
- One for pg_index_has_outdated_dependency(), which only makes sense
in-core once reindexdb is introduced.
--
Michael




Re: REINDEX backend filtering

2021-03-14 Thread Julien Rouhaud
On Wed, Mar 03, 2021 at 01:56:59PM +0800, Julien Rouhaud wrote:
> 
> Please find attached v5 which addresses all previous comments:
> 
> - consistently use "outdated"
> - use REINDEX (OUTDATED) grammar (with a new unreserved OUTDATED keyword)
> - new --outdated option to reindexdb
> - expose a new "pg_index_has_outdated_dependency(regclass)" SQL function
> - use that function in reindexdb --outdated to sort tables by total
>   indexes-to-be-processed size

v6 attached, rebase only due to conflict with recent commit.
From d2e05e6f64c88b0d5074b9963586bf6999276762 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v6 1/2] Add a new OUTDATED filtering facility for REINDEX
 command.

OUTDATED is added as a new unreserved keyword.

When used, REINDEX will only process indexes that have an outdated dependency.
For now, only dependencies on collations are supported, but we'll likely support
other kinds of dependencies in the future.

Also add a new pg_index_has_outdated_dependency(regclass) SQL function, so
client code can filter such indexes if needed.  This function will also be used
in a following commit to teach reindexdb to use this new OUTDATED option and
order the tables by the amount of work that will actually be done.

Catversion (should be) bumped.

Author: Julien Rouhaud 
Reviewed-by:
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/func.sgml                     |  27 --
 doc/src/sgml/ref/reindex.sgml              |  12 +++
 src/backend/catalog/index.c                | 107 -
 src/backend/commands/indexcmds.c           |  12 ++-
 src/backend/parser/gram.y                  |   4 +-
 src/backend/utils/cache/relcache.c         |  40
 src/bin/psql/tab-complete.c                |   2 +-
 src/include/catalog/index.h                |   3 +
 src/include/catalog/pg_proc.dat            |   4 +
 src/include/parser/kwlist.h                |   1 +
 src/include/utils/relcache.h               |   1 +
 src/test/regress/expected/create_index.out |  10 ++
 src/test/regress/sql/create_index.sql      |  10 ++
 13 files changed, 221 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..0eda6678ac 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26448,12 +26448,13 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 

  shows the functions
-available for index maintenance tasks.  (Note that these maintenance
-tasks are normally done automatically by autovacuum; use of these
-functions is only required in special cases.)
-These functions cannot be executed during recovery.
-Use of these functions is restricted to superusers and the owner
-of the given index.
+available for index maintenance tasks.  (Note that the maintenance
+tasks performing actions on indexes are normally done automatically by
+autovacuum; use of these functions is only required in special cases.)
+The functions performing actions on indexes cannot be executed during
+recovery.
+Use of the functions performing actions on indexes is restricted to
+superusers and the owner of the given index.

 

@@ -26538,6 +26539,20 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 option.

   
+
+  
+   
+
+ pg_index_has_outdated_dependency
+
+pg_index_has_outdated_dependency ( index regclass )
+boolean
+   
+   
+Check if the specified index has any outdated dependency.  For now only
+dependency on collations are supported.
+   
+  
  
 

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index ff4dba8c36..aa66e6461f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -26,6 +26,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 where option can be one of:
 
 CONCURRENTLY [ boolean ]
+OUTDATED [ boolean ]
 TABLESPACE new_tablespace
 VERBOSE [ boolean ]
 
@@ -188,6 +189,17 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+OUTDATED
+
+ 
+  This option can be used to filter the list of indexes to rebuild and only
+  process indexes that have outdated dependencies.  Fow now, the only
+  handle dependency is for the collation provider version.
+ 
+
+   
+

 TABLESPACE
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..571feac5db 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* target index oid */
+	bool outdated;	/* depends on at least one deprecated collation? */
+} IndexHasOutdatedColl;
+
 /* non-export function prototypes */
 static bool 

Re: REINDEX backend filtering

2021-03-02 Thread Julien Rouhaud
On Tue, Mar 02, 2021 at 12:01:55PM +0800, Julien Rouhaud wrote:
> 
> So, long running reindex due to some gigantic and/or numerous indexes on a
> single (or few) table is not something that we can solve, but inefficient
> reindex due to wrong table size / to-be-reindexed-indexes-size correlation can
> be addressed.
> 
> I would still prefer to go with a backend implementation, so that all client
> tools can benefit from it by default.  We could simply export the current
> index_has_outdated_collation(oid) function in SQL, and tweak pg_dump to order
> tables by the cumulative size of such indexes as you mentioned below.  Would
> that work for you?
> 
> Also, given Thomas's proposal in a nearby email, this function would be renamed
> to index_has_outdated_dependencies(oid) or something like that.

Please find attached v5 which addresses all previous comments:

- consistently use "outdated"
- use REINDEX (OUTDATED) grammar (with a new unreserved OUTDATED keyword)
- new --outdated option to reindexdb
- expose a new "pg_index_has_outdated_dependency(regclass)" SQL function
- use that function in reindexdb --outdated to sort tables by total
  indexes-to-be-processed size

From 5703ce209d414dd7a6fba18f581eca4671364834 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v5 1/2] Add a new OUTDATED filtering facility for REINDEX
 command.

OUTDATED is added as a new unreserved keyword.

When used, REINDEX will only process indexes that have an outdated dependency.
For now, only dependencies on collations are supported, but we'll likely support
other kinds of dependencies in the future.

Also add a new pg_index_has_outdated_dependency(regclass) SQL function, so
client code can filter such indexes if needed.  This function will also be used
in a following commit to teach reindexdb to use this new OUTDATED option and
order the tables by the amount of work that will actually be done.

Catversion (should be) bumped.

Author: Julien Rouhaud 
Reviewed-by:
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/func.sgml                     |  27 --
 doc/src/sgml/ref/reindex.sgml              |  12 +++
 src/backend/catalog/index.c                | 107 -
 src/backend/commands/indexcmds.c           |  12 ++-
 src/backend/parser/gram.y                  |   4 +-
 src/backend/utils/cache/relcache.c         |  40
 src/bin/psql/tab-complete.c                |   2 +-
 src/include/catalog/index.h                |   3 +
 src/include/catalog/pg_proc.dat            |   4 +
 src/include/parser/kwlist.h                |   1 +
 src/include/utils/relcache.h               |   1 +
 src/test/regress/expected/create_index.out |  10 ++
 src/test/regress/sql/create_index.sql      |  10 ++
 13 files changed, 221 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bf99f82149..2cf6e66234 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26381,12 +26381,13 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 

  shows the functions
-available for index maintenance tasks.  (Note that these maintenance
-tasks are normally done automatically by autovacuum; use of these
-functions is only required in special cases.)
-These functions cannot be executed during recovery.
-Use of these functions is restricted to superusers and the owner
-of the given index.
+available for index maintenance tasks.  (Note that the maintenance
+tasks performing actions on indexes are normally done automatically by
+autovacuum; use of these functions is only required in special cases.)
+The functions performing actions on indexes cannot be executed during
+recovery.
+Use of the functions performing actions on indexes is restricted to
+superusers and the owner of the given index.

 

@@ -26471,6 +26472,20 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 option.

   
+
+  
+   
+
+ pg_index_has_outdated_dependency
+
+pg_index_has_outdated_dependency ( index regclass )
+boolean
+   
+   
+Check if the specified index has any outdated dependency.  For now only
+dependency on collations are supported.
+   
+  
  
 

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index b22d39eba9..2d94d49cde 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -26,6 +26,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 where option can be one of:
 
 CONCURRENTLY [ boolean ]
+OUTDATED [ boolean ]
 TABLESPACE new_tablespace
 VERBOSE [ boolean ]
 
@@ -188,6 +189,17 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+OUTDATED
+
+ 
+  This option can be used to filter the list of indexes to rebuild and only
+  process indexes that have outdated 

Re: REINDEX backend filtering

2021-03-01 Thread Julien Rouhaud
On Fri, Feb 26, 2021 at 11:17:26AM +0100, Magnus Hagander wrote:
> On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud  wrote:
> >
> > It means that you'll have to distribute the work on a per-table basis
> > rather than a per-index basis.  The time spent to find out that a
> > table doesn't have any impacted index should be negligible compared to
> > the cost of running a reindex.  This obviously won't help that much if
> > you have a lot of tables but only one being gigantic.
> 
> Yeah -- or at least a couple of large and many small, which I find to
> be a very common scenario. Or the case of some tables having many
> affected indexes and some having few.
> 
> You'd basically want to order the operation by table on something like
> "total size of the affected indexes on table x" -- which may very well
> put a smaller table with many indexes earlier in the queue. But you
> can't do that without having access to the filter.

So, long running reindex due to some gigantic and/or numerous indexes on a
single (or few) table is not something that we can solve, but inefficient
reindex due to wrong table size / to-be-reindexed-indexes-size correlation can
be addressed.

I would still prefer to go with a backend implementation, so that all client
tools can benefit from it by default.  We could simply export the current
index_has_outdated_collation(oid) function in SQL, and tweak pg_dump to order
tables by the cumulative size of such indexes as you mentioned below.  Would
that work for you?

Also, given Thomas's proposal in a nearby email, this function would be renamed
to index_has_outdated_dependencies(oid) or something like that.

> > But even if we put the logic in the client, this still won't help as
> > reindexdb doesn't support multiple jobs with an index list:
> >
> >  * Index-level REINDEX is not supported with multiple jobs as we
> >  * cannot control the concurrent processing of multiple indexes
> >  * depending on the same relation.
> >  */
> > if (concurrentCons > 1 && indexes.head != NULL)
> > {
> > pg_log_error("cannot use multiple jobs to reindex indexes");
> > exit(1);
> > }
> 
> That sounds like it would be a fixable problem though, in principle.
> It could/should probably still limit all indexes on the same table to
> be processed in the same connection for the locking reasons of course,
> but doing an order by the total size of the indexes like above, and
> ensuring that they are grouped that way, doesn't sound *that* hard. I
> doubt it's that important in the current usecase of manually listing
> the indexes, but it would be useful for something like this.

Yeah, I don't think that in the case of outdated dependencies the --index option
will be useful: it's likely that there will be too many indexes to process.  We
can still try to improve reindexdb to be able to process index lists with
parallel connections, but I would rather keep that separate from this patch.




Re: REINDEX backend filtering

2021-02-26 Thread Magnus Hagander
On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud  wrote:
>
> On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander  wrote:
> >
> > I don't agree with the conclusion though.
> >
> > The most common case of that will be in the case of an upgrade. In
> > that case I want to reindex all of those indexes as quickly as
> > possible. So I'll want to parallelize it across multiple sessions
> > (like reindexdb -j 4 or whatever). But doesn't putting the filter in
> > the grammar prevent me from doing exactly that? Each of those 4 (or
> > whatever) sessions would have to guess which would go where and then
> > speculatively run the command on that, instead of being able to
> > directly distribute the workload?
>
> It means that you'll have to distribute the work on a per-table basis
> rather than a per-index basis.  The time spent to find out that a
> table doesn't have any impacted index should be negligible compared to
> the cost of running a reindex.  This obviously won't help that much if
> you have a lot of tables but only one being gigantic.

Yeah -- or at least a couple of large and many small, which I find to
be a very common scenario. Or the case of some tables having many
affected indexes and some having few.

You'd basically want to order the operation by table on something like
"total size of the affected indexes on table x" -- which may very well
put a smaller table with many indexes earlier in the queue. But you
can't do that without having access to the filter.
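
As a rough illustration of that ordering, here is a minimal C sketch.  The
IndexInfo and TableWork structs and order_tables_by_affected_size() are
hypothetical names made up for this example; they are not part of the patch
or of reindexdb, and the "outdated" flag simply stands in for whatever filter
result the client has access to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-index record, as a client might collect it. */
typedef struct IndexInfo
{
    const char *table;      /* owning table */
    long        pages;      /* index size, in pages */
    bool        outdated;   /* would the filter select this index? */
} IndexInfo;

typedef struct TableWork
{
    const char *table;
    long        affected_pages; /* sum over outdated indexes only */
} TableWork;

/* qsort comparator: most affected index pages first, name as tie-breaker */
static int
cmp_work_desc(const void *a, const void *b)
{
    const TableWork *wa = a;
    const TableWork *wb = b;

    if (wa->affected_pages > wb->affected_pages)
        return -1;
    if (wa->affected_pages < wb->affected_pages)
        return 1;
    return strcmp(wa->table, wb->table);
}

/*
 * Fill "out" (room for at least n entries) with one entry per table that
 * has at least one outdated index, sorted by decreasing total size of
 * those indexes.  Returns the number of entries written.
 */
int
order_tables_by_affected_size(const IndexInfo *idx, int n, TableWork *out)
{
    int         ntables = 0;

    assert(n >= 0);

    for (int i = 0; i < n; i++)
    {
        int         j;

        if (!idx[i].outdated)
            continue;           /* ignored by the filter */

        /* linear lookup is enough for a sketch */
        for (j = 0; j < ntables; j++)
            if (strcmp(out[j].table, idx[i].table) == 0)
                break;

        if (j == ntables)
        {
            out[ntables].table = idx[i].table;
            out[ntables].affected_pages = 0;
            ntables++;
        }
        out[j].affected_pages += idx[i].pages;
    }

    qsort(out, ntables, sizeof(TableWork), cmp_work_desc);
    return ntables;
}
```

With this, a small table carrying two outdated indexes of 40 and 30 pages is
queued before a large table whose only outdated index has 50 pages, and a
table without any outdated index does not appear at all.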


> But even if we put the logic in the client, this still won't help as
> reindexdb doesn't support multiple jobs with an index list:
>
>  * Index-level REINDEX is not supported with multiple jobs as we
>  * cannot control the concurrent processing of multiple indexes
>  * depending on the same relation.
>  */
> if (concurrentCons > 1 && indexes.head != NULL)
> {
> pg_log_error("cannot use multiple jobs to reindex indexes");
> exit(1);
> }

That sounds like it would be a fixable problem though, in principle.
It could/should probably still limit all indexes on the same table to
be processed in the same connection for the locking reasons of course,
but doing an order by the total size of the indexes like above, and
ensuring that they are grouped that way, doesn't sound *that* hard. I
doubt it's that important in the current usecase of manually listing
the indexes, but it would be useful for something like this.
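
One possible way to spread such per-table groups over several connections is
a greedy assignment: walk the tables from most to least expensive and hand
each one to the connection with the least work so far.  The sketch below only
assumes that the costs are already sorted in decreasing order;
assign_tables_to_jobs() is a hypothetical helper, not existing reindexdb
code.

```c
#include <assert.h>

/*
 * Assign each table (and hence all of its indexes) to exactly one job.
 * cost[] is assumed to be sorted in decreasing order, e.g. the total size
 * of the affected indexes per table.  assignment[i] receives the job chosen
 * for table i, and load[] (njobs entries) the accumulated cost per job.
 * Returns the largest per-job load.
 */
long
assign_tables_to_jobs(const long *cost, int ntables, int njobs,
                      int *assignment, long *load)
{
    long        max_load = 0;

    assert(njobs > 0);

    for (int j = 0; j < njobs; j++)
        load[j] = 0;

    for (int i = 0; i < ntables; i++)
    {
        int         best = 0;

        /* pick the job with the smallest accumulated cost so far */
        for (int j = 1; j < njobs; j++)
            if (load[j] < load[best])
                best = j;

        assignment[i] = best;
        load[best] += cost[i];
    }

    for (int j = 0; j < njobs; j++)
        if (load[j] > max_load)
            max_load = load[j];

    return max_load;
}
```

Because the unit of work is a table, two indexes of the same relation never
end up in different connections, which is the locking constraint mentioned
above.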

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: REINDEX backend filtering

2021-02-26 Thread Julien Rouhaud
On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander  wrote:
>
> I don't agree with the conclusion though.
>
> The most common case of that will be in the case of an upgrade. In
> that case I want to reindex all of those indexes as quickly as
> possible. So I'll want to parallelize it across multiple sessions
> (like reindexdb -j 4 or whatever). But doesn't putting the filter in
> the grammar prevent me from doing exactly that? Each of those 4 (or
> whatever) sessions would have to guess which would go where and then
> speculatively run the command on that, instead of being able to
> directly distribute the workload?

It means that you'll have to distribute the work on a per-table basis
rather than a per-index basis.  The time spent to find out that a
table doesn't have any impacted index should be negligible compared to
the cost of running a reindex.  This obviously won't help that much if
you have a lot of tables but only one being gigantic.

But even if we put the logic in the client, this still won't help as
reindexdb doesn't support multiple jobs with an index list:

/*
 * Index-level REINDEX is not supported with multiple jobs as we
 * cannot control the concurrent processing of multiple indexes
 * depending on the same relation.
 */
if (concurrentCons > 1 && indexes.head != NULL)
{
    pg_log_error("cannot use multiple jobs to reindex indexes");
    exit(1);
}




Re: REINDEX backend filtering

2021-02-26 Thread Magnus Hagander
On Thu, Jan 21, 2021 at 4:13 AM Julien Rouhaud  wrote:
>
> On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier  wrote:
> >
> > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:
> > > Is this really a common enough operation that we need it in the main
> > > grammar?
> > >
> > > Having the functionality, definitely, but what if it was "just" a
> > > function instead? So you'd do something like:
> > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
> > > \gexec
> > >
> > > Or even a function that returns the REINDEX commands directly (taking
> > > a parameter to turn on/off concurrency for example).
> > >
> > > That also seems like it would be easier to make flexible, and just as
> > > easy to plug into reindexdb?
> >
> > Having control in the grammar to choose which index to reindex for a
> > table is very useful when it comes to parallel reindexing, because
> > it is a no-brainer in terms of knowing which index to distribute to one
> > job or another.  In short, with this grammar you can just issue a set
> > of REINDEX TABLE commands that we know will not conflict with each
> > other.  You cannot get that level of control with REINDEX INDEX as it
> > may be possible that two or more commands conflict if they work on an
> > index of the same relation because it is required to take a lock on
> > the parent table as well.  Of course, we could decide to implement a
> > redistribution logic in all frontend tools that need such things, like
> > reindexdb, but that's not something I think we should leave to the
> > client.  A backend-side filtering is IMO much simpler, less code,
> > and more elegant.
>
> Maybe additional filtering capabilities are not something that will be
> required frequently, but I'm pretty sure that reindexing only indexes
> that might be corrupt is something that will be required often.  So I
> agree, having all that logic in the backend makes everything easier
> for users, having the choice of the tools they want to issue the query
> while still having all features available.

I agree with that scenario -- in that the most common case will be
exactly that of reindexing only indexes that might be corrupt.

I don't agree with the conclusion though.

The most common case of that will be in the case of an upgrade. In
that case I want to reindex all of those indexes as quickly as
possible. So I'll want to parallelize it across multiple sessions
(like reindexdb -j 4 or whatever). But doesn't putting the filter in
the grammar prevent me from doing exactly that? Each of those 4 (or
whatever) sessions would have to guess which would go where and then
speculatively run the command on that, instead of being able to
directly distribute the workload?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: REINDEX backend filtering

2021-02-26 Thread Magnus Hagander
On Wed, Dec 16, 2020 at 1:27 AM Michael Paquier  wrote:
>
> On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:
> > Is this really a common enough operation that we need it in the main
> > grammar?
> >
> > Having the functionality, definitely, but what if it was "just" a
> > function instead? So you'd do something like:
> > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
> > \gexec
> >
> > Or even a function that returns the REINDEX commands directly (taking
> > a parameter to turn on/off concurrency for example).
> >
> > That also seems like it would be easier to make flexible, and just as
> > easy to plug into reindexdb?
>
> Having control in the grammar to choose which index to reindex for a
> table is very useful when it comes to parallel reindexing, because
> it is a no-brainer in terms of knowing which index to distribute to one
> job or another.  In short, with this grammar you can just issue a set
> of REINDEX TABLE commands that we know will not conflict with each
> other.  You cannot get that level of control with REINDEX INDEX as it

(oops, seems I forgot to reply to this one, sorry!)

Uh, isn't it almost exactly the opposite?

If you do it in the backend grammar you *cannot* parallelize it
between indexes, because you can only run one at a time.

Whereas if you do it in the frontend, you can. Either with something
like reindexdb that could do it automatically, or with psql as a
copy/paste job?


> may be possible that two or more commands conflict if they work on an
> index of the same relation because it is required to take a lock on
> the parent table as well.  Of course, we could decide to implement a
> redistribution logic in all frontend tools that need such things, like
> reindexdb, but that's not something I think we should leave to the
> client.  A backend-side filtering is IMO much simpler, less code,
> and more elegant.

It's simpler in the simple case, I agree with that. But ISTM it's also
a lot less flexible for anything except the simplest case...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: REINDEX backend filtering

2021-02-25 Thread Julien Rouhaud
On Thu, Feb 25, 2021 at 07:36:02AM +1300, Thomas Munro wrote:
> On Thu, Feb 25, 2021 at 1:22 AM Julien Rouhaud  wrote:
> > #define reindexHasFilter(x)((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)
> 
> It's better to use "(x) & ..." in macros to avoid weird operator
> precedence problems in future code.

Ah indeed, thanks!  I usually always protect the arguments with parentheses but
I somehow missed this one.  I'll send a new version of the patch shortly with
fixes for the rest of the problems you mentioned.

> It seems like there are several different names for similar things in
> this patch: "outdated", "not current", "deprecated".  Can we settle on
> one, maybe "outdated"?

Oops, I apparently missed a lot of places during the various rewrites of the
patch.  +1 for outdated.

> 
> > The code is written to be extensible with possibly other filters
> > (e.g. specific library or specific version).  Feedback so far seems to
> > indicate that it may be overkill and only filtering indexes with
> > deprecated collation is enough.  I'll simplify this code in a future
> > version, getting rid of reindexHasFilter, unless someone thinks more
> > filters are a good idea.
> 
> Hmm, yeah, I think it should probably be very general. Suppose someone
> invents versioned dependencies for (say) functions or full text
> stemmers etc, then do we want to add more syntax here to rebuild
> indexes (assuming we don't use INVALID for such cases, IDK)?  I don't
> think we'd want to have more syntax just to be able to say "hey,
> please fix my collation problems but not my stemmer problems".  What
> about just REINDEX (OUTDATED)?  It's hard to find a single word that
> means "depends on an object whose version has changed".

That makes sense.  I agree that it would make the solution simpler and
better.

Looking at the other use case for PostGIS mentioned by Darafei, it seems that
it would help to make the concept of index dependency extensible for third
party code too (see
https://www.postgresql.org/message-id/20210226074531.dhkfneao2czzqk6n%40nol).
Would that make sense?




Re: REINDEX backend filtering

2021-02-25 Thread Julien Rouhaud
Hi,

On Wed, Feb 24, 2021 at 09:34:59PM +0300, Darafei "Komяpa" Praliaskouski wrote:
> Hello,
> 
> The PostGIS project needed this from time to time. Would be great if
> reindex by opclass can be made possible.
> 
> We changed the semantics of btree at least twice (in 2.4 and 3.0), fixed
> some ND mixed-dimension indexes semantics in 3.0, fixed hash index on 32
> bit arch in 3.0.

Oh, I wasn't aware of that.  Thanks for bringing this up!

Looking at the last occurrence (c00f9525a3c6c) I see that the NEWS item does
mention the need to do a REINDEX.  As far as I understand there wouldn't be any
hard error if one doesn't do a REINDEX, and you'd end up with some kind
of "logical" corruption as the new lib version won't have the same semantics
depending on the number of dimensions, so more or less the same kind of
problems that would happen in case of breaking update of a collation library.

It seems to me that it's a legitimate use case, especially since GiST doesn't
have a metapage to store an index version.  But I'm wondering if the right
answer is to allow REINDEX / reindexdb to look for specific opclass or to add
an API to let third party code register a custom dependency.  In this case
it would be some kind of gist ABI versioning.  We could then have a single
REINDEX option, like REINDEX (OUTDATED) as Thomas suggested in
https://www.postgresql.org/message-id/CA+hUKG+WWioP6xV5Xf1pPhiWNGD1B7hdBBCdQoKfp=zymja...@mail.gmail.com
for both cases.
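
To make the "register a custom dependency" idea a bit more concrete, here is
a minimal, self-contained C sketch of what such a hook could look like.
Everything in it is hypothetical (the callback type, the registry, the two
toy checkers); no such API exists in the backend or in the patch, it only
illustrates how core and extension checks could feed one OUTDATED decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHECKERS 8

/*
 * Hypothetical callback: returns true if the given index depends on
 * something whose version has changed since the index was built.
 */
typedef bool (*outdated_check_fn) (unsigned int index_id);

static outdated_check_fn checkers[MAX_CHECKERS];
static int  nchecker = 0;

/* Register a checker; returns false if fn is NULL or the registry is full. */
bool
register_outdated_checker(outdated_check_fn fn)
{
    if (fn == NULL || nchecker >= MAX_CHECKERS)
        return false;
    checkers[nchecker++] = fn;
    return true;
}

/* An index is outdated as soon as any registered checker says so. */
bool
index_is_outdated(unsigned int index_id)
{
    for (int i = 0; i < nchecker; i++)
    {
        assert(checkers[i] != NULL);
        if (checkers[i] (index_id))
            return true;
    }
    return false;
}

/* Toy checkers standing in for core (collation) and extension (GiST ABI). */
bool
toy_collation_check(unsigned int index_id)
{
    return index_id == 1;
}

bool
toy_gist_abi_check(unsigned int index_id)
{
    return index_id == 2;
}
```

In this sketch the caller never needs to know which kind of dependency
triggered the rebuild, which is the point of having a single option for both
cases.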




Re: REINDEX backend filtering

2021-02-24 Thread Thomas Munro
On Thu, Feb 25, 2021 at 1:22 AM Julien Rouhaud  wrote:
> #define reindexHasFilter(x)((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)

It's better to use "(x) & ..." in macros to avoid weird operator
precedence problems in future code.
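
For illustration, a minimal sketch of the kind of problem this avoids.  The
flag values and the two helper functions are made up for the example; only
the shape of the macro comes from the patch.

```c
#include <assert.h>

/* Illustrative flag values, chosen only for this example. */
#define REINDEXOPT_VERBOSE           0x01
#define REINDEXOPT_COLL_NOT_CURRENT  0x10

/* Argument used bare, as in the quoted macro. */
#define reindexHasFilterUnsafe(x) ((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)

/* Argument parenthesized, as suggested. */
#define reindexHasFilterSafe(x)   (((x) & REINDEXOPT_COLL_NOT_CURRENT) != 0)

/*
 * Both helpers pass the expression "base | extra" to the macro.  With the
 * unsafe macro this expands to (base | extra & FLAG), and since & binds
 * tighter than |, the test becomes base | (extra & FLAG).
 */
int
unsafe_check(int base, int extra)
{
    return reindexHasFilterUnsafe(base | extra);
}

int
safe_check(int base, int extra)
{
    return reindexHasFilterSafe(base | extra);
}

void
demo_macro_precedence(void)
{
    /* Only VERBOSE is set, so no filter is requested. */
    assert(safe_check(REINDEXOPT_VERBOSE, 0) == 0);

    /* The unsafe macro nevertheless reports a filter. */
    assert(unsafe_check(REINDEXOPT_VERBOSE, 0) == 1);
}
```

The two macros agree as long as the argument is a plain variable, which is
why the issue only shows up in later callers that pass an expression.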

It seems like there are several different names for similar things in
this patch: "outdated", "not current", "deprecated".  Can we settle on
one, maybe "outdated"?

> The code is written to be extensible with possibly other filters
> (e.g. specific library or specific version).  Feedback so far seems to
> indicate that it may be overkill and only filtering indexes with
> deprecated collation is enough.  I'll simplify this code in a future
> version, getting rid of reindexHasFilter, unless someone thinks more
> filters are a good idea.

Hmm, yeah, I think it should probably be very general. Suppose someone
invents versioned dependencies for (say) functions or full text
stemmers etc, then do we want to add more syntax here to rebuild
indexes (assuming we don't use INVALID for such cases, IDK)?  I don't
think we'd want to have more syntax just to be able to say "hey,
please fix my collation problems but not my stemmer problems".  What
about just REINDEX (OUTDATED)?  It's hard to find a single word that
means "depends on an object whose version has changed".
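
A minimal sketch of what "outdated" could mean in such a general form,
assuming each dependency carries the version recorded when the index was
built and the version reported now.  The types and function names below are
hypothetical and not taken from the patch; treating an unknown recorded
version as outdated is also just an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum DepKind
{
    DEP_COLLATION,
    DEP_STEMMER                 /* hypothetical future kind */
} DepKind;

typedef struct VersionedDep
{
    DepKind     kind;
    const char *recorded_version;   /* seen at index build, NULL if unknown */
    const char *current_version;    /* reported now, assumed non-NULL */
} VersionedDep;

/* One dependency is outdated if its version is unknown or has changed. */
static bool
dep_is_outdated(const VersionedDep *dep)
{
    assert(dep->current_version != NULL);

    if (dep->recorded_version == NULL)
        return true;            /* assumption: unknown counts as outdated */
    return strcmp(dep->recorded_version, dep->current_version) != 0;
}

/* The index is outdated if any dependency is, whatever its kind. */
bool
sketch_index_has_outdated_dependency(const VersionedDep *deps, int ndeps)
{
    assert(ndeps >= 0);

    for (int i = 0; i < ndeps; i++)
        if (dep_is_outdated(&deps[i]))
            return true;
    return false;
}
```

The caller does not look at DepKind at all here, so adding a new kind of
versioned dependency would not need any new syntax on the REINDEX side.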




Re: REINDEX backend filtering

2021-02-24 Thread Komяpa
Hello,

The PostGIS project needed this from time to time. Would be great if
reindex by opclass can be made possible.

We changed the semantics of btree at least twice (in 2.4 and 3.0), fixed
some ND mixed-dimension indexes semantics in 3.0, fixed hash index on 32
bit arch in 3.0.

On Thu, Dec 3, 2020 at 12:32 PM Julien Rouhaud  wrote:

> Hello,
>
> Now that we have the infrastructure to track indexes that might be corrupted
> due to changes in collation libraries, I think it would be a good idea to
> offer an easy way for users to reindex all indexes that might be corrupted.
>
> I'm attaching a POC patch as a discussion basis.  It implements a new
> "COLLATION" option to reindex, with "not_current" being the only accepted
> value.  Note that I didn't spend too much effort on the grammar part yet.
>
> So for instance you can do:
>
> REINDEX (COLLATION 'not_current') DATABASE mydb;
>
> The filter is also implemented so that you could cumulate multiple filters, so
> it could be easy to add more filtering, for instance:
>
> REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
>
> to only rebuild indexes depending on outdated libc collations, or
>
> REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
>
> to only rebuild indexes depending on a specific version of libc.
>


-- 
Darafei "Komяpa" Praliaskouski
OSM BY Team - http://openstreetmap.by/


Re: REINDEX backend filtering

2021-02-24 Thread Julien Rouhaud
Hi,

On Thu, Feb 25, 2021 at 12:11 AM mariakatosvich wrote:
>
> From what I heard on this topic, the goal is to reduce
> the amount of time necessary to reindex a system so as REINDEX only
> works on indexes whose dependent collation versions are not known or
> works on indexes in need of a collation refresh (like a reindexdb
> --all --collation -j $jobs).

That's indeed the goal.  The current patch only adds infrastructure
for the REINDEX command, which will make it easy to add the option for
reindexdb.  I'll add the reindexdb part too in the next version of the
patch.




Re: REINDEX backend filtering

2021-02-24 Thread mariakatosvich
From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs). 







Re: REINDEX backend filtering

2021-02-24 Thread Julien Rouhaud
Hi,

Thanks for the review!

On Mon, Feb 8, 2021 at 12:14 AM Zhihong Yu  wrote:
>
> Hi,
> For index_has_deprecated_collation(),
>
> +   object.objectSubId = 0;
>
> The objectSubId field is not accessed by 
> do_check_index_has_deprecated_collation(). Does it need to be assigned ?

Indeed it's not strictly necessary I think, but it makes things
cleaner and future proof, and that's how things are already done
nearby.  So I think it's better to keep it this way.

> For RelationGetIndexListFiltered(), it seems when (options & 
> REINDEXOPT_COLL_NOT_CURRENT) == 0, the full_list would be returned.
> This can be checked prior to entering the foreach loop.

That's already the case with this test:

    /* Fast exit if no filtering was asked, or if the list is empty. */
    if (!reindexHasFilter(options) || full_list == NIL)
        return full_list;

knowing that

    #define reindexHasFilter(x) ((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)

The code as-is is written to be extensible with possibly other filters
(e.g. specific library or specific version).  Feedback so far seems to
indicate that it may be overkill and only filtering indexes with
deprecated collation is enough.  I'll simplify this code in a future
version, getting rid of reindexHasFilter, unless someone thinks more
filters are a good idea.
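
To make the fast exit concrete, here is a minimal standalone sketch of the
same idea.  It is not the patch code: OPT_COLL_NOT_CURRENT, has_filter(),
IndexInfo and filter_indexes() are made-up names standing in for the real
ones, and a plain array replaces the backend List.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's option bit; the value is arbitrary. */
#define OPT_COLL_NOT_CURRENT (1u << 0)
#define has_filter(opts) (((opts) & OPT_COLL_NOT_CURRENT) != 0)

/* Hypothetical index descriptor, reduced to what the sketch needs. */
typedef struct
{
    unsigned oid;
    bool     outdated;  /* assumed to be computed elsewhere */
} IndexInfo;

/*
 * Copy into "out" the OIDs of the indexes to rebuild and return how many
 * were kept.  "out" must have room for n entries.
 */
static size_t
filter_indexes(const IndexInfo *all, size_t n, unsigned opts, unsigned *out)
{
    size_t kept = 0;

    assert(out != NULL);

    /* Fast exit: no filter requested or empty list, keep everything. */
    if (!has_filter(opts) || n == 0)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = all[i].oid;
        return n;
    }

    /* Otherwise keep only the indexes flagged as outdated. */
    for (size_t i = 0; i < n; i++)
    {
        if (all[i].outdated)
            out[kept++] = all[i].oid;
    }
    return kept;
}
```

With options set to 0 the whole list comes back untouched; with the
(hypothetical) OPT_COLL_NOT_CURRENT bit set only the flagged entries remain.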

For now I'm attaching a rebased version; there was a conflict with the
recent patch to add the missing_ok parameter to
get_collation_version_for_oid().


v4-0001-Add-a-new-COLLATION-option-to-REINDEX.patch
Description: Binary data


Re: REINDEX backend filtering

2021-02-07 Thread Zhihong Yu
Hi,
For index_has_deprecated_collation(),

+   object.objectSubId = 0;

The objectSubId field is not accessed
by do_check_index_has_deprecated_collation(). Does it need to be assigned ?

For RelationGetIndexListFiltered(), it seems when (options &
REINDEXOPT_COLL_NOT_CURRENT) == 0, the full_list would be returned.
This can be checked prior to entering the foreach loop.

Cheers

On Sat, Feb 6, 2021 at 11:20 PM Julien Rouhaud  wrote:

> On Thu, Jan 21, 2021 at 11:12:56AM +0800, Julien Rouhaud wrote:
> >
> > There was a conflict with a3dc926009be8 (Refactor option handling of
> > CLUSTER, REINDEX and VACUUM), so rebased version attached.  No other
> > changes included yet.
>
> New conflict, v3 attached.
>


Re: REINDEX backend filtering

2021-02-06 Thread Julien Rouhaud
On Thu, Jan 21, 2021 at 11:12:56AM +0800, Julien Rouhaud wrote:
> 
> There was a conflict with a3dc926009be8 (Refactor option handling of
> CLUSTER, REINDEX and VACUUM), so rebased version attached.  No other
> changes included yet.

New conflict, v3 attached.
From 63afe51453d4413ad7e73c66268e6ff12bfe5436 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v3] Add a new COLLATION option to REINDEX.

---
 doc/src/sgml/ref/reindex.sgml  | 13 +
 src/backend/catalog/index.c| 59 +-
 src/backend/commands/indexcmds.c   | 13 +++--
 src/backend/utils/cache/relcache.c | 43 
 src/include/catalog/index.h|  4 ++
 src/include/utils/relcache.h   |  1 +
 src/test/regress/expected/create_index.out | 10 
 src/test/regress/sql/create_index.sql  | 10 
 8 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 07795b5737..ead2904b67 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,6 +25,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 where option can be one of:
 
+COLLATION [ text ]
 CONCURRENTLY [ boolean ]
 TABLESPACE new_tablespace
 VERBOSE [ boolean ]
@@ -169,6 +170,18 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+COLLATION
+
+ 
+  This option can be used to filter the list of indexes to rebuild.  The
+  only allowed value is 'not_current', which will only
+  process indexes that depend on a collation version different than the
+  current one.
+ 
+
+   
+

 CONCURRENTLY
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1cb9172a5f..b74ee79d38 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
Oid pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+   Oid relid;  /* target index oid */
+   bool deprecated;/* depends on at least one deprecated collation? */
+} IndexHasDeprecatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1350,6 +1356,57 @@ index_check_collation_versions(Oid relid)
list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one deprecated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_deprecated_collation(const ObjectAddress *otherObject,
+                                        const char *version,
+                                        char **new_version,
+                                        void *data)
+{
+   IndexHasDeprecatedColl *context = data;
+   char *current_version;
+
+   /* We only care about dependencies on collations. */
+   if (otherObject->classId != CollationRelationId)
+   return false;
+
+   /* Fast exit if we already found a deprecated collation version. */
+   if (context->deprecated)
+   return false;
+
+   /* Ask the provider for the current version.  Give up if unsupported. */
+   current_version = get_collation_version_for_oid(otherObject->objectId);
+   if (!current_version)
+   return false;
+
+   if (!version || strcmp(version, current_version) != 0)
+   context->deprecated = true;
+
+   return false;
+}
+
+bool
+index_has_deprecated_collation(Oid relid)
+{
+   ObjectAddress object;
+   IndexHasDeprecatedColl context;
+
+   object.classId = RelationRelationId;
+   object.objectId = relid;
+   object.objectSubId = 0;
+
+   context.relid = relid;
+   context.deprecated = false;
+
+   visitDependenciesOf(&object, &do_check_index_has_deprecated_collation,
+                       &context);
+
+   return context.deprecated;
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3930,7 +3987,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 * relcache to get this with a sequential scan if ignoring system
 * indexes.)
 */
-   indexIds = RelationGetIndexList(rel);
+   indexIds = RelationGetIndexListFiltered(rel, params->options);
 
if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 127ba7835d..9bf69ff9d7 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2475,6 +2475,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
boolconcurrently = 

Re: REINDEX backend filtering

2021-01-20 Thread Julien Rouhaud
On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier  wrote:
>
> On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:
> > Is this really a common enough operation that we need it in the main grammar?
> >
> > Having the functionality, definitely, but what if it was "just" a
> > function instead? So you'd do something like:
> > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
> > \gexec
> >
> > Or even a function that returns the REINDEX commands directly (taking
> > a parameter to turn on/off concurrency for example).
> >
> > That also seems like it would be easier to make flexible, and just as
> > easy to plug into reindexdb?
>
> Having control in the grammar to choose which index to reindex for a
> table is very useful when it comes to parallel reindexing, because
> it is a no-brainer in terms of knowing which index to distribute to one
> job or another.  In short, with this grammar you can just issue a set
> of REINDEX TABLE commands that we know will not conflict with each
> other.  You cannot get that level of control with REINDEX INDEX as it
> may be possible that one or more commands conflict if they work on an
> index of the same relation because it is required to take a lock also on
> the parent table.  Of course, we could decide to implement a
> redistribution logic in all frontend tools that need such things, like
> reindexdb, but that's not something I think we should leave to the
> client.  A backend-side filtering is IMO much simpler, less code,
> and more elegant.

Maybe additional filtering capabilities are not something that will be
required frequently, but I'm pretty sure that reindexing only indexes
that might be corrupt is something that will be required often.  So I
agree, having all that logic in the backend makes everything easier
for users: they can pick whatever tool they want to issue the query
while still having all features available.

There was a conflict with a3dc926009be8 (Refactor option handling of
CLUSTER, REINDEX and VACUUM), so rebased version attached.  No other
changes included yet.


v2-0001-Add-a-new-COLLATION-option-to-REINDEX.patch
Description: Binary data


Re: REINDEX backend filtering

2020-12-15 Thread Michael Paquier
On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:
> Is this really a common enough operation that we need it in the main grammar?
> 
> Having the functionality, definitely, but what if it was "just" a
> function instead? So you'd do something like:
> SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
> \gexec
> 
> Or even a function that returns the REINDEX commands directly (taking
> a parameter to turn on/off concurrency for example).
> 
> That also seems like it would be easier to make flexible, and just as
> easy to plug into reindexdb?

Having control in the grammar to choose which index to reindex for a
table is very useful when it comes to parallel reindexing, because
it is a no-brainer in terms of knowing which index to distribute to one
job or another.  In short, with this grammar you can just issue a set
of REINDEX TABLE commands that we know will not conflict with each
other.  You cannot get that level of control with REINDEX INDEX as it
may be possible that one or more commands conflict if they work on an
index of the same relation because it is required to take a lock also on
the parent table.  Of course, we could decide to implement a
redistribution logic in all frontend tools that need such things, like
reindexdb, but that's not something I think we should leave to the
client.  A backend-side filtering is IMO much simpler, less code,
and more elegant.
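
For illustration only, this is roughly the kind of redistribution logic a
frontend would have to carry if it could only issue REINDEX INDEX: every
index of a given table has to land on the same job so that two jobs never
fight over the parent table's lock.  The IndexEntry struct and
assign_jobs() are hypothetical names, not anything from reindexdb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (table, index) pair as a client tool might collect it. */
typedef struct
{
    unsigned table_oid;
    unsigned index_oid;
} IndexEntry;

/*
 * Fill job_of[i] with the job number for entries[i], such that all the
 * indexes of one table are handled by the same job.  New tables are
 * spread over the jobs round-robin.
 */
static void
assign_jobs(const IndexEntry *entries, size_t n, int njobs, int *job_of)
{
    int ntables = 0;

    assert(njobs > 0);

    for (size_t i = 0; i < n; i++)
    {
        size_t j;

        /* Look for an earlier index on the same table. */
        for (j = 0; j < i; j++)
        {
            if (entries[j].table_oid == entries[i].table_oid)
                break;
        }

        if (j < i)
            job_of[i] = job_of[j];          /* same table, same job */
        else
            job_of[i] = ntables++ % njobs;  /* first index of a new table */
    }
}
```

With a backend-side filter none of this is needed on the client: it can
hand out one REINDEX TABLE per table and let the server pick the indexes.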
--
Michael




Re: REINDEX backend filtering

2020-12-15 Thread Magnus Hagander
On Tue, Dec 15, 2020 at 12:22 PM Julien Rouhaud  wrote:
>
> On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier  wrote:
> >
> > On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:
> > > Now that we have the infrastructure to track indexes that might be corrupted
> > > due to changes in collation libraries, I think it would be a good idea to offer
> > > an easy way for users to reindex all indexes that might be corrupted.
> >
> > Yes.  It would be a good thing.
> >
> > > The filter is also implemented so that you could cumulate multiple filters, so
> > > it could be easy to add more filtering, for instance:
> > >
> > > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
> > >
> > > to only rebuild indexes depending on outdated libc collations, or
> > >
> > > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
> > >
> > > to only rebuild indexes depending on a specific version of libc.
> >
> > Deciding on the grammar to use depends on the use cases we would like
> > to satisfy.  From what I heard on this topic, the goal is to reduce
> > the amount of time necessary to reindex a system so as REINDEX only
> > works on indexes whose dependent collation versions are not known or
> > works on indexes in need of a collation refresh (like a reindexdb
> > --all --collation -j $jobs).  What would be the benefit in having more
> > complexity with library-dependent settings while we could take care
> > of the use cases that matter the most with a simple grammar?  Perhaps
> > "not_current" is not the best match as a keyword, we could just use
> > "collation" and handle that as a boolean.  As long as we don't need
> > new operators in the grammar rules..
>
> I'm not sure what the usual DBA pattern is here.  If the reindexing
> runtime is really critical, I'm assuming that at least some people
> will dig into library details to see which collations
> actually broke in the last upgrade and will want to reindex only
> those, and force the version for the rest of the indexes.  And
> obviously, they probably won't wait to have multiple collation
> version dependencies before taking care of that.  In that case the
> filters that would matter would be one to only keep indexes with an
> outdated collation version, and an additional one for a specific
> collation name.  Or we could have the COLLATION keyword without
> additional argument mean all outdated collations, and COLLATION
> 'collation_name' to specify a specific one.  This is maybe a bit ugly,
> and would probably require a different approach for reindexdb.

Is this really a common enough operation that we need it in the main grammar?

Having the functionality, definitely, but what if it was "just" a
function instead? So you'd do something like:
SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
\gexec

Or even a function that returns the REINDEX commands directly (taking
a parameter to turn on/off concurrency for example).

That also seems like it would be easier to make flexible, and just as
easy to plug into reindexdb?
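
A minimal sketch of the second variant, just to show the shape of it: a
helper that turns an index name into the REINDEX command text, with a
switch for concurrency.  build_reindex_command() is a hypothetical name,
and the sketch assumes the index name is already properly quoted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Write "REINDEX INDEX [CONCURRENTLY] <name>;" into buf and return the
 * length snprintf() reports (which exceeds buflen - 1 on truncation).
 * The caller is assumed to pass an already-quoted identifier.
 */
static int
build_reindex_command(char *buf, size_t buflen, const char *index_name,
                      bool concurrently)
{
    assert(buf != NULL && index_name != NULL);

    return snprintf(buf, buflen, "REINDEX INDEX %s%s;",
                    concurrently ? "CONCURRENTLY " : "", index_name);
}
```

A caller would loop over whatever the filtering function returns and feed
each generated string to the server, much like the \gexec example above.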

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: REINDEX backend filtering

2020-12-15 Thread Julien Rouhaud
On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier  wrote:
>
> On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:
> > Now that we have the infrastructure to track indexes that might be corrupted
> > due to changes in collation libraries, I think it would be a good idea to offer
> > an easy way for users to reindex all indexes that might be corrupted.
>
> Yes.  It would be a good thing.
>
> > The filter is also implemented so that you could cumulate multiple filters, so
> > it could be easy to add more filtering, for instance:
> >
> > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
> >
> > to only rebuild indexes depending on outdated libc collations, or
> >
> > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
> >
> > to only rebuild indexes depending on a specific version of libc.
>
> Deciding on the grammar to use depends on the use cases we would like
> to satisfy.  From what I heard on this topic, the goal is to reduce
> the amount of time necessary to reindex a system so as REINDEX only
> works on indexes whose dependent collation versions are not known or
> works on indexes in need of a collation refresh (like a reindexdb
> --all --collation -j $jobs).  What would be the benefit in having more
> complexity with library-dependent settings while we could take care
> of the use cases that matter the most with a simple grammar?  Perhaps
> "not_current" is not the best match as a keyword, we could just use
> "collation" and handle that as a boolean.  As long as we don't need
> new operators in the grammar rules..

I'm not sure what the usual DBA pattern is here.  If the reindexing
runtime is really critical, I'm assuming that at least some people
will dig into library details to see which collations
actually broke in the last upgrade and will want to reindex only
those, and force the version for the rest of the indexes.  And
obviously, they probably won't wait to have multiple collation
version dependencies before taking care of that.  In that case the
filters that would matter would be one to only keep indexes with an
outdated collation version, and an additional one for a specific
collation name.  Or we could have the COLLATION keyword without
additional argument mean all outdated collations, and COLLATION
'collation_name' to specify a specific one.  This is maybe a bit ugly,
and would probably require a different approach for reindexdb.




Re: REINDEX backend filtering

2020-12-13 Thread Michael Paquier
On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:
> Now that we have the infrastructure to track indexes that might be corrupted
> due to changes in collation libraries, I think it would be a good idea to offer
> an easy way for users to reindex all indexes that might be corrupted.

Yes.  It would be a good thing.

> The filter is also implemented so that you could cumulate multiple filters, so
> it could be easy to add more filtering, for instance:
> 
> REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
> 
> to only rebuild indexes depending on outdated libc collations, or
> 
> REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
> 
> to only rebuild indexes depending on a specific version of libc.

Deciding on the grammar to use depends on the use cases we would like
to satisfy.  From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs).  What would be the benefit in having more
complexity with library-dependent settings while we could take care
of the use cases that matter the most with a simple grammar?  Perhaps
"not_current" is not the best match as a keyword, we could just use
"collation" and handle that as a boolean.  As long as we don't need
new operators in the grammar rules..
--
Michael




REINDEX backend filtering

2020-12-03 Thread Julien Rouhaud
Hello,

Now that we have the infrastructure to track indexes that might be corrupted
due to changes in collation libraries, I think it would be a good idea to offer
an easy way for users to reindex all indexes that might be corrupted.

I'm attaching a POC patch as a discussion basis.  It implements a new
"COLLATION" option to reindex, with "not_current" being the only accepted
value.  Note that I didn't spend too much effort on the grammar part yet.

So for instance you can do:

REINDEX (COLLATION 'not_current') DATABASE mydb;

The filter is also implemented so that you could cumulate multiple filters, so
it could be easy to add more filtering, for instance:

REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;

to only rebuild indexes depending on outdated libc collations, or

REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;

to only rebuild indexes depending on a specific version of libc.
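
As a rough standalone illustration of what 'not_current' means here: a
dependency counts as outdated when the recorded version is unknown or
differs from what the provider currently reports, and it is left alone
when the provider cannot report a version at all.  The function names and
the array-based interface below are made up for the sketch; the patch
itself walks the dependencies with visitDependenciesOf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare the version recorded for a dependency with the version the
 * collation provider reports now.  Either pointer may be NULL.
 */
static bool
collation_version_is_outdated(const char *recorded, const char *current)
{
    /* Provider cannot report a version: nothing to compare against. */
    if (current == NULL)
        return false;

    /* Unknown recorded version, or one that differs from the current one. */
    return recorded == NULL || strcmp(recorded, current) != 0;
}

/* An index is outdated as soon as one of its collation dependencies is. */
static bool
index_is_outdated(const char *const *recorded, const char *const *current,
                  size_t ndeps)
{
    assert(ndeps == 0 || (recorded != NULL && current != NULL));

    for (size_t i = 0; i < ndeps; i++)
    {
        if (collation_version_is_outdated(recorded[i], current[i]))
            return true;
    }
    return false;
}
```

The version strings are treated as opaque: only equality matters, there is
no attempt to order them.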
From 5acf42e15c0dc8b185547ff9cb9371a86a057ec9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v1] Add a new COLLATION option to REINDEX.

---
 doc/src/sgml/ref/reindex.sgml  | 13 +
 src/backend/catalog/index.c| 59 +-
 src/backend/commands/indexcmds.c   | 12 +++--
 src/backend/utils/cache/relcache.c | 43 
 src/include/catalog/index.h|  6 ++-
 src/include/utils/relcache.h   |  1 +
 src/test/regress/expected/create_index.out | 10 
 src/test/regress/sql/create_index.sql  | 10 
 8 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 6e1cf06713..eb8da9c070 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,6 +25,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 where option can be one of:
 
+COLLATION [ text ]
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
 
@@ -168,6 +169,18 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+COLLATION
+
+ 
+  This option can be used to filter the list of indexes to rebuild.  The
+  only allowed value is 'not_current', which will only
+  process indexes that depend on a collation version different than the
+  current one.
+ 
+
+   
+

 CONCURRENTLY
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 731610c701..7d941f40af 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -99,6 +99,12 @@ typedef struct
Oid pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+   Oid relid;  /* target index oid */
+   bool deprecated;/* depends on at least one deprecated collation? */
+} IndexHasDeprecatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1349,6 +1355,57 @@ index_check_collation_versions(Oid relid)
list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one deprecated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_deprecated_collation(const ObjectAddress *otherObject,
+                                        const char *version,
+                                        char **new_version,
+                                        void *data)
+{
+   IndexHasDeprecatedColl *context = data;
+   char *current_version;
+
+   /* We only care about dependencies on collations. */
+   if (otherObject->classId != CollationRelationId)
+   return false;
+
+   /* Fast exit if we already found a deprecated collation version. */
+   if (context->deprecated)
+   return false;
+
+   /* Ask the provider for the current version.  Give up if unsupported. */
+   current_version = get_collation_version_for_oid(otherObject->objectId);
+   if (!current_version)
+   return false;
+
+   if (!version || strcmp(version, current_version) != 0)
+   context->deprecated = true;
+
+   return false;
+}
+
+bool
+index_has_deprecated_collation(Oid relid)
+{
+   ObjectAddress object;
+   IndexHasDeprecatedColl context;
+
+   object.classId = RelationRelationId;
+   object.objectId = relid;
+   object.objectSubId = 0;
+
+   context.relid = relid;
+   context.deprecated = false;
+
+   visitDependenciesOf(&object, &do_check_index_has_deprecated_collation,
+                       &context);
+
+   return context.deprecated;
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3886,7 +3943,7 @@ reindex_relation(Oid relid, int flags,