Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-02-23 Thread Álvaro Herrera
On 2021-Feb-23, Masahiko Sawada wrote:

> I've looked at the v3 patch and it looks good to me. A minor comment is:
> 
> +   /* Catalog tables need to consider all backends. */
> +   h->catalog_oldest_nonremovable =
> +   TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +
> 
> I think that the above comment is not accurate since we consider only
> backends on the same database in some cases.

You're right.  Adjusted.

> the patch makes the following change:
> 
> @@ -1847,7 +1871,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
> h->shared_oldest_nonremovable =
> TransactionIdOlder(h->shared_oldest_nonremovable,
>h->slot_catalog_xmin);
> -   h->catalog_oldest_nonremovable = h->data_oldest_nonremovable;
> h->catalog_oldest_nonremovable =
> TransactionIdOlder(h->catalog_oldest_nonremovable,
>h->slot_catalog_xmin);
> 
> In the original code, catalog_oldest_nonremovable is initialized with
> data_oldest_nonremovable that considered slot_xmin. Therefore, IIUC
> catalog_oldest_nonremovable is the oldest transaction id among
> data_oldest_nonremovable, slot_xmin, and slot_catalog_xmin. But with
> the patch, catalog_oldest_nonremovable doesn't consider slot_xmin. It
> seems okay to me as catalog_oldest_nonremovable is interested in only
> slot_catalog_xmin.

I tend to agree -- it seems safe to ignore slot_xmin in that case.
However, tracing through the maze of xmin vs. catalog_xmin transmission
across the whole system is not trivial.  I changed the patch so that we
adjust to slot_xmin first, then to slot_catalog_xmin.  I *think* the end
effect should be the same.  But if we want to make that claim to save
that one assignment, we can make it in a separate commit.  However,
given that this comparison and assignment is outside the locked area, I
don't think it's worth bothering with.

Thanks for the input!  I have pushed this patch.

-- 
Álvaro Herrera   Valdivia, Chile
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-02-23 Thread Masahiko Sawada
On Tue, Feb 23, 2021 at 9:15 AM Álvaro Herrera  wrote:
>
> On 2021-Feb-22, Álvaro Herrera wrote:
>
> > Here's an updated patch.
> >
> > I haven't added commentary or documentation to account for the new
> > assumption, per Matthias' comment and Robert's discussion thereof.
>
> Done that.  I also restructured the code a bit, since one line in
> ComputedXidHorizons was duplicative.

I've looked at the v3 patch and it looks good to me. A minor comment is:

+   /* Catalog tables need to consider all backends. */
+   h->catalog_oldest_nonremovable =
+   TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
+

I think that the above comment is not accurate since we consider only
backends on the same database in some cases.

FWIW, just for the record, around the following original code,

/*
 * Check whether there are replication slots requiring an older xmin.
 */
h->shared_oldest_nonremovable =
TransactionIdOlder(h->shared_oldest_nonremovable, h->slot_xmin);
h->data_oldest_nonremovable =
TransactionIdOlder(h->data_oldest_nonremovable, h->slot_xmin);

/*
 * The only difference between catalog / data horizons is that the slot's
 * catalog xmin is applied to the catalog one (so catalogs can be accessed
 * for logical decoding). Initialize with data horizon, and then back up
 * further if necessary. Have to back up the shared horizon as well, since
 * that also can contain catalogs.
 */
h->shared_oldest_nonremovable_raw = h->shared_oldest_nonremovable;
h->shared_oldest_nonremovable =
TransactionIdOlder(h->shared_oldest_nonremovable,
   h->slot_catalog_xmin);
h->catalog_oldest_nonremovable = h->data_oldest_nonremovable;
h->catalog_oldest_nonremovable =
TransactionIdOlder(h->catalog_oldest_nonremovable,
   h->slot_catalog_xmin);

the patch makes the following change:

@@ -1847,7 +1871,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
h->shared_oldest_nonremovable =
TransactionIdOlder(h->shared_oldest_nonremovable,
   h->slot_catalog_xmin);
-   h->catalog_oldest_nonremovable = h->data_oldest_nonremovable;
h->catalog_oldest_nonremovable =
TransactionIdOlder(h->catalog_oldest_nonremovable,
   h->slot_catalog_xmin);

In the original code, catalog_oldest_nonremovable is initialized with
data_oldest_nonremovable that considered slot_xmin. Therefore, IIUC
catalog_oldest_nonremovable is the oldest transaction id among
data_oldest_nonremovable, slot_xmin, and slot_catalog_xmin. But with
the patch, catalog_oldest_nonremovable doesn't consider slot_xmin. It
seems okay to me as catalog_oldest_nonremovable is interested in only
slot_catalog_xmin.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-02-22 Thread Álvaro Herrera
On 2021-Feb-22, Álvaro Herrera wrote:

> Here's an updated patch.
> 
> I haven't added commentary or documentation to account for the new
> assumption, per Matthias' comment and Robert's discussion thereof.

Done that.  I also restructured the code a bit, since one line in
ComputedXidHorizons was duplicative.


-- 
Álvaro Herrera39°49'30"S 73°17'W
Tom: There seems to be something broken here.
Teodor: I'm in sackcloth and ashes...  Fixed.
http://archives.postgresql.org/message-id/482d1632.8010...@sigaev.ru
>From 98fa9ae33cb890e1d24a0c9be3a14139f15fdafe Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 15 Jan 2021 14:03:14 -0300
Subject: [PATCH v3] Teach VACUUM to ignore CIC

Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
---
 src/backend/storage/ipc/procarray.c | 39 +++--
 src/backend/utils/misc/guc.c|  2 +-
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index a5daea8957..9c88f58c7f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1610,7 +1610,13 @@ TransactionIdIsActive(TransactionId xid)
  * relations that's not required, since only backends in my own database could
  * ever see the tuples in them. Also, we can ignore concurrently running lazy
  * VACUUMs because (a) they must be working on other tables, and (b) they
- * don't need to do snapshot-based lookups.
+ * don't need to do snapshot-based lookups.  Similarly, for the non-catalog
+ * horizon, we can ignore CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+ * when they are working on non-partial, non-expressional indexes, for the
+ * same reasons and because they can't run in transaction blocks.  (They are
+ * not possible to ignore for catalogs, because CIC and RC do some catalog
+ * operations.)  Do note that this means that CIC and RC must use a lock level
+ * that conflicts with VACUUM.
  *
  * This also computes a horizon used to truncate pg_subtrans. For that
  * backends in all databases have to be considered, and concurrently running
@@ -1660,9 +1666,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	bool		in_recovery = RecoveryInProgress();
 	TransactionId *other_xids = ProcGlobal->xids;
 
-	/* inferred after ProcArrayLock is released */
-	h->catalog_oldest_nonremovable = InvalidTransactionId;
-
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	h->latest_completed = ShmemVariableCache->latestCompletedXid;
@@ -1682,6 +1685,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 
 		h->oldest_considered_running = initial;
 		h->shared_oldest_nonremovable = initial;
+		h->catalog_oldest_nonremovable = initial;
 		h->data_oldest_nonremovable = initial;
 
 		/*
@@ -1752,7 +1756,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING))
 			continue;
 
-		/* shared tables need to take backends in all database into account */
+		/* shared tables need to take backends in all databases into account */
 		h->shared_oldest_nonremovable =
 			TransactionIdOlder(h->shared_oldest_nonremovable, xmin);
 
@@ -1773,11 +1777,26 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 			MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId ||
 			proc->databaseId == 0)	/* always include WalSender */
 		{
-			h->data_oldest_nonremovable =
-TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+			/*
+			 * We can ignore this backend if it's running CREATE INDEX
+			 * CONCURRENTLY or REINDEX CONCURRENTLY on a "safe" index -- but
+			 * only on vacuums of user-defined tables.
+			 */
+			if (!(statusFlags & PROC_IN_SAFE_IC))
+h->data_oldest_nonremovable =
+	TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+
+			/* Catalog tables need to consider all backends. */
+			h->catalog_oldest_nonremovable =
+TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
+
 		}
 	}
 
+	/* catalog horizon should never be later than data */
+	Assert(TransactionIdPrecedesOrEquals(h->catalog_oldest_nonremovable,
+		 h->data_oldest_nonremovable));
+
 	/*
 	 * If in recovery fetch oldest xid in KnownAssignedXids, will be applied
 	 * after lock is released.
@@ -1799,6 +1818,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 			TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin);
 		h->data_oldest_nonremovable =
 			TransactionIdOlder(h->data_oldest_nonremovable, kaxmin);
+		h->catalog_oldest_nonremovable =
+			TransactionIdOlder(h->catalog_oldest_nonremovable, kaxmin);
 		/* temp relations cannot be accessed in recovery */
 	}
 	else
@@ -1825,6 +1846,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		h->data_oldest_nonremovable =
 			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
 	 vacuum_defer_cleanup_age);
+		h->catalog_oldest_nonremovable =
+			TransactionIdRetreatedBy(h->catalog_oldest_nonremovable,
+	 

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-02-22 Thread Álvaro Herrera
Here's an updated patch.

I haven't added commentary or documentation to account for the new
assumption, per Matthias' comment and Robert's discussion thereof.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
>From 4e1cf2ee0e6c9fb49fc2bc5932d3a59d65ee9ea7 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 15 Jan 2021 14:03:14 -0300
Subject: [PATCH v2] Teach VACUUM to ignore CIC

Discussion: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
---
 src/backend/storage/ipc/procarray.c | 40 +++--
 src/backend/utils/misc/guc.c|  2 +-
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index a5daea8957..88b8271c22 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1610,7 +1610,12 @@ TransactionIdIsActive(TransactionId xid)
  * relations that's not required, since only backends in my own database could
  * ever see the tuples in them. Also, we can ignore concurrently running lazy
  * VACUUMs because (a) they must be working on other tables, and (b) they
- * don't need to do snapshot-based lookups.
+ * don't need to do snapshot-based lookups.  Also, for the non-catalog
+ * horizon, we can ignore CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+ * when they are working on non-partial, non-expressional indexes, for the
+ * same reasons and because they can't run in transaction blocks.  (They are
+ * not possible to ignore for catalogs, because CIC and RC do some catalog
+ * operations.)
  *
  * This also computes a horizon used to truncate pg_subtrans. For that
  * backends in all databases have to be considered, and concurrently running
@@ -1660,9 +1665,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	bool		in_recovery = RecoveryInProgress();
 	TransactionId *other_xids = ProcGlobal->xids;
 
-	/* inferred after ProcArrayLock is released */
-	h->catalog_oldest_nonremovable = InvalidTransactionId;
-
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	h->latest_completed = ShmemVariableCache->latestCompletedXid;
@@ -1682,6 +1684,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 
 		h->oldest_considered_running = initial;
 		h->shared_oldest_nonremovable = initial;
+		h->catalog_oldest_nonremovable = initial;
 		h->data_oldest_nonremovable = initial;
 
 		/*
@@ -1752,7 +1755,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING))
 			continue;
 
-		/* shared tables need to take backends in all database into account */
+		/* shared tables need to take backends in all databases into account */
 		h->shared_oldest_nonremovable =
 			TransactionIdOlder(h->shared_oldest_nonremovable, xmin);
 
@@ -1768,16 +1771,33 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * to prune still needed data away). If the current backend never
 		 * connects to a database that is harmless, because
 		 * data_oldest_nonremovable will never be utilized.
+		 *
+		 * Additionally, processes doing CREATE INDEX CONCURRENTLY and REINDEX
+		 * CONCURRENTLY on "safe" indexes can be ignored for non-catalog
+		 * horizon. (But not for catalogs: some transactions in CIC/RC do
+		 * catalog updates.)
 		 */
 		if (in_recovery ||
 			MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId ||
 			proc->databaseId == 0)	/* always include WalSender */
 		{
-			h->data_oldest_nonremovable =
-TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+			if (statusFlags & PROC_IN_SAFE_IC)
+h->catalog_oldest_nonremovable =
+	TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
+			else
+			{
+h->data_oldest_nonremovable =
+	TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+h->catalog_oldest_nonremovable =
+	TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
+			}
 		}
 	}
 
+	/* catalog horizon should never be later than data */
+	Assert(TransactionIdPrecedesOrEquals(h->catalog_oldest_nonremovable,
+		 h->data_oldest_nonremovable));
+
 	/*
 	 * If in recovery fetch oldest xid in KnownAssignedXids, will be applied
 	 * after lock is released.
@@ -1799,6 +1819,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 			TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin);
 		h->data_oldest_nonremovable =
 			TransactionIdOlder(h->data_oldest_nonremovable, kaxmin);
+		h->catalog_oldest_nonremovable =
+			TransactionIdOlder(h->catalog_oldest_nonremovable, kaxmin);
 		/* temp relations cannot be accessed in recovery */
 	}
 	else
@@ -1825,6 +1847,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		h->data_oldest_nonremovable =
 			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
 	 vacuum_defer_cleanup_age);
+		h->catalog_oldest_nonremovable =
+			TransactionIdRetreatedBy(h->catalog_oldest_nonremovable,
+	 vacuum_defer_cleanup_age);
 		/* defer 

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-26 Thread Robert Haas
On Thu, Jan 21, 2021 at 3:08 PM Matthias van de Meent
 wrote:
> Re-thinking this, and after some research:
>
> Is the behaviour of "any process that invalidates TIDs in this table
> (that could be in an index on this table) always holds a lock that
> conflicts with CIC/RiC on that table" a requirement of tableams, or is
> it an implementation-detail?
>
> If it is a requirement, then this patch is a +1 for me (and that
> requirement should be documented in such case), otherwise a -1 while
> there is no mechanism in place to remove concurrently-invalidated TIDs
> from CIC-ed/RiC-ed indexes.

I don't believe that the table AM machinery presumes that every kind
of table AM necessarily has to support VACUUM; or at least, I don't
think the original version presumed that. However, if you want it to
work some other way, there's not really any infrastructure for that,
either. Like, if you want to run your own background workers to clean
up your table, you could, but you'd have to arrange that yourself. If
you want a SWEEP command or an OPTIMIZE TABLE command instead of
VACUUM, you'd have to hack the grammar. Likewise, I don't know that
there's any guarantee that CLUSTER would work on a table of any old AM
either, or that it would do anything useful.

But having said that, if you have a table AM that *does* support
VACUUM and CLUSTER, I think it would have to treat TIDs pretty much
just as we do today. Almost by definition, they have to live with the
way the rest of the system works. TIDs have to be 6 bytes, and lock
levels can't be changed, because there's nothing in table AM that
would let you tinker with that stuff. The table AM can opt out of that
machinery altogether in favor of just throwing an error, but it can't
make it work differently unless somebody extends the API. It's
possible that this might suck a lot for some AMs. For instance, if
your AM is an in-memory btree, you might want CLUSTER to work in
place, rather than by copying the whole relation file, but I doubt
it's possible to make that work using the APIs as they exist. Maybe
someday we'll have better ones, but right now we don't.

As far as the specific question asked here, I don't really understand
how it could ever work any differently. If you want to invalidate some
TIDs in such a way that they can be reused by unrelated tuples - in
the case of the heap this would be by making the line pointers
LP_UNUSED - that has to be coordinated with the indexing machinery. I
can imagine a table AM that never reuses TIDs - especially if we
eventually have TIDs > 6 bytes - but I can't imagine a table AM that
reuses TIDs that might still be present in the index without telling
the index. And that seems to be what is being proposed here: if CIC or
RiC could be putting TIDs into indexes while at the same time some of
those very same TIDs were being made available for reuse, that would
be chaos, and right now we have no mechanism other than the lock level
to prevent that. We could invent one, maybe, but it doesn't exist
today, and zheap never tried to invent anything like that.

I agree that, if we do this, the comment should make clear that the
CIC/RiC lock has to conflict with VACUUM to avoid indexing tuples that
VACUUM is busy marking LP_UNUSED, and that this is precisely because
other backends will ignore the CIC/RiC snapshot while determining
whether a tuple is live. I don't think this is the case.
Hypothetically speaking, suppose we rejected this patch. Then, suppose
someone proposed to replace ShareUpdateExclusiveLock with two new lock
levels VacuumTuplesLock and IndexTheRelationLock each of which
conflicts with itself and all other lock levels with which
ShareUpdateExclusiveLock conflicts, but the two don't conflict with
each other. AFAIK, that patch would be acceptable today and
unacceptable after this change. While I'm not arguing that as a reason
to reject this patch, it's not impossible that somebody could: they'd
just need to argue that the separated lock levels would have greater
value than this patch. I don't think I believe that, but it's not a
totally crazy suggestion. My main point though is that this
meaningfully changing some assumptions about how the system works, and
we should be sure to be clear about that.

I looked at the patch but don't really understand it; the code in this
area seems to have changed a lot since I last looked at it. So, I'm
just commenting mostly on the specific question that was asked, and a
little bit on the theory of the patch, but without expressing an
opinion on the implementation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-21 Thread Matthias van de Meent
On Tue, 19 Jan 2021 at 21:59, Matthias van de Meent
 wrote:
>
> On Mon, 18 Jan 2021, 21:25 Álvaro Herrera,  wrote:
> >
> > On 2021-Jan-18, Matthias van de Meent wrote:
> >
> > > Example:
> > >
> > > 1.) RI starts
> > > 2.) PHASE 2: filling the index:
> > > 2.1.) scanning the heap (live tuple is cached)
> > > < tuple is deleted
> > > < last transaction other than RI commits, only snapshot of RI exists
> > > < vacuum drops the tuple, and cannot remove it from the new index
> > > because this new index is not yet populated.
> > > 2.2.) sorting tuples
> > > 2.3.) index filled with tuples, incl. deleted tuple
> > > 3.) PHASE 3: wait for transactions
> > > 4.) PHASE 4: validate does not remove the tuple from the index,
> > > because it is not built to do so: it will only insert new tuples.
> > > Tuples that are marked for deletion are removed from the index only
> > > through VACUUM (and optimistic ALL_DEAD detection).
> > >
> > > According to my limited knowledge of RI, it requires VACUUM to not run
> > > on the table during the initial index build process (which is
> > > currently guaranteed through the use of a snapshot).
> >
> > VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
> > ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
> > occur.
>
> Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock.
> Are there no other ways that pages are optimistically pruned?
>
> But the base case still stands, ignoring CIC snapshots in  would give
> the semantic of all_dead to tuples that are actually still considered
> alive in some context, and should not yet be deleted (you're deleting
> data from an in-use snapshot). Any local pruning optimizations using
> all_dead mechanics now cannot be run on the table unless they hold an
> ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
> (other than below).

Re-thinking this, and after some research:

Is the behaviour of "any process that invalidates TIDs in this table
(that could be in an index on this table) always holds a lock that
conflicts with CIC/RiC on that table" a requirement of tableams, or is
it an implementation-detail?

If it is a requirement, then this patch is a +1 for me (and that
requirement should be documented in such case), otherwise a -1 while
there is no mechanism in place to remove concurrently-invalidated TIDs
from CIC-ed/RiC-ed indexes.

This concurrently-invalidated check could be done through e.g.
updating validate_index to have one more phase that removes unknown /
incorrect TIDs from the index. As a note: index insertion logic would
then also have to be able to handle duplicate TIDs in the index.

Regards,

Matthias van de Meent


Regards,

Matthias van de Meent




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-19 Thread Matthias van de Meent
On Mon, 18 Jan 2021, 21:25 Álvaro Herrera,  wrote:
>
> On 2021-Jan-18, Matthias van de Meent wrote:
>
> > Example:
> >
> > 1.) RI starts
> > 2.) PHASE 2: filling the index:
> > 2.1.) scanning the heap (live tuple is cached)
> > < tuple is deleted
> > < last transaction other than RI commits, only snapshot of RI exists
> > < vacuum drops the tuple, and cannot remove it from the new index
> > because this new index is not yet populated.
> > 2.2.) sorting tuples
> > 2.3.) index filled with tuples, incl. deleted tuple
> > 3.) PHASE 3: wait for transactions
> > 4.) PHASE 4: validate does not remove the tuple from the index,
> > because it is not built to do so: it will only insert new tuples.
> > Tuples that are marked for deletion are removed from the index only
> > through VACUUM (and optimistic ALL_DEAD detection).
> >
> > According to my limited knowledge of RI, it requires VACUUM to not run
> > on the table during the initial index build process (which is
> > currently guaranteed through the use of a snapshot).
>
> VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
> ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
> occur.

Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock.
Are there no other ways that pages are optimistically pruned?

But the base case still stands, ignoring CIC snapshots in  would give
the semantic of all_dead to tuples that are actually still considered
alive in some context, and should not yet be deleted (you're deleting
data from an in-use snapshot). Any local pruning optimizations using
all_dead mechanics now cannot be run on the table unless they hold an
ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
(other than below).

> I do wonder if the problem you suggest (or something similar) can occur
> via HOT pruning, though.

It could not, at least not at the current HEAD, as only one tuple in a
HOT-chain can be alive at one point, and all indexes point to the root
of the HOT-chain, which is never HOT-pruned. See also the
src/backend/access/heap/README.HOT.

Regards,

Matthias van de Meent




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-18 Thread Álvaro Herrera
On 2021-Jan-18, Matthias van de Meent wrote:

> On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera  wrote:

> Would this not need to be the following? Right now, it resets
> potentially older h->catalog_oldest_nonremovable (which is set in the
> PROC_IN_SAFE_IC branch).
> 
> > +if (statusFlags & PROC_IN_SAFE_IC)
> > +h->catalog_oldest_nonremovable =
> > +TransactionIdOlder(h->catalog_oldest_nonremovable, 
> > xmin);
> > +else
> > +{
> > +h->data_oldest_nonremovable =
> > +TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> > +h->catalog_oldest_nonremovable =
> > +TransactionIdOlder(h->catalog_oldest_nonremovable, 
> > xmin);
> > +}

Oops, you're right.

-- 
Álvaro Herrera   Valdivia, Chile




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-18 Thread Álvaro Herrera
On 2021-Jan-18, Matthias van de Meent wrote:

> Example:
> 
> 1.) RI starts
> 2.) PHASE 2: filling the index:
> 2.1.) scanning the heap (live tuple is cached)
> < tuple is deleted
> < last transaction other than RI commits, only snapshot of RI exists
> < vacuum drops the tuple, and cannot remove it from the new index
> because this new index is not yet populated.
> 2.2.) sorting tuples
> 2.3.) index filled with tuples, incl. deleted tuple
> 3.) PHASE 3: wait for transactions
> 4.) PHASE 4: validate does not remove the tuple from the index,
> because it is not built to do so: it will only insert new tuples.
> Tuples that are marked for deletion are removed from the index only
> through VACUUM (and optimistic ALL_DEAD detection).
> 
> According to my limited knowledge of RI, it requires VACUUM to not run
> on the table during the initial index build process (which is
> currently guaranteed through the use of a snapshot).

VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
occur.

I do wonder if the problem you suggest (or something similar) can occur
via HOT pruning, though.

-- 
Álvaro Herrera   Valdivia, Chile




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-18 Thread Matthias van de Meent
On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera  wrote:
>
> So one last remaining improvement was to have VACUUM ignore processes
> doing CIC and RC to compute the Xid horizon of tuples to remove.  I
> think we can do something simple like the attached patch.

Regarding the patch:

> +if (statusFlags & PROC_IN_SAFE_IC)
> +h->catalog_oldest_nonremovable =
> +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +else
> +h->data_oldest_nonremovable = h->catalog_oldest_nonremovable 
> =
> +TransactionIdOlder(h->data_oldest_nonremovable, xmin);

Would this not need to be the following? Right now, it resets
potentially older h->catalog_oldest_nonremovable (which is set in the
PROC_IN_SAFE_IC branch).

> +if (statusFlags & PROC_IN_SAFE_IC)
> +h->catalog_oldest_nonremovable =
> +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +else
> +{
> +h->data_oldest_nonremovable =
> +TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> +h->catalog_oldest_nonremovable =
> +TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +}


Prior to reading the rest of my response: I do not understand the
intricacies of the VACUUM process, nor the systems of db snapshots, so
it's reasonably possible I misunderstand this.
But would this patch not allow for tuples to be created, deleted and
vacuumed away from a table whilst rebuilding an index on that table,
resulting in invalid pointers in the index?

Example:

1.) RI starts
2.) PHASE 2: filling the index:
2.1.) scanning the heap (live tuple is cached)
< tuple is deleted
< last transaction other than RI commits, only snapshot of RI exists
< vacuum drops the tuple, and cannot remove it from the new index
because this new index is not yet populated.
2.2.) sorting tuples
2.3.) index filled with tuples, incl. deleted tuple
3.) PHASE 3: wait for transactions
4.) PHASE 4: validate does not remove the tuple from the index,
because it is not built to do so: it will only insert new tuples.
Tuples that are marked for deletion are removed from the index only
through VACUUM (and optimistic ALL_DEAD detection).

According to my limited knowledge of RI, it requires VACUUM to not run
on the table during the initial index build process (which is
currently guaranteed through the use of a snapshot).


Regards,

Matthias van de Meent




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-15 Thread Álvaro Herrera
So one last remaining improvement was to have VACUUM ignore processes
doing CIC and RC to compute the Xid horizon of tuples to remove.  I
think we can do something simple like the attached patch.

-- 
Álvaro Herrera   Valdivia, Chile
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cf12eda504..f584230b79 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1601,7 +1601,7 @@ TransactionIdIsActive(TransactionId xid)
  * well as "internally" by GlobalVisUpdate() (see comment above struct
  * GlobalVisState).
  *
- * See the definition of ComputedXidHorizonsResult for the various computed
+ * See the definition of ComputeXidHorizonsResult for the various computed
  * horizons.
  *
  * For VACUUM separate horizons (used to decide which deleted tuples must
@@ -1610,7 +1610,12 @@ TransactionIdIsActive(TransactionId xid)
  * relations that's not required, since only backends in my own database could
  * ever see the tuples in them. Also, we can ignore concurrently running lazy
  * VACUUMs because (a) they must be working on other tables, and (b) they
- * don't need to do snapshot-based lookups.
+ * don't need to do snapshot-based lookups.  Also, for the non-catalog
+ * horizon, we can ignore CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY
+ * when they are working on non-partial, non-expressional indexes, for the
+ * same reasons and because they can't run in transaction blocks.  (They are
+ * not possible to ignore for catalogs, because CIC and RC do some catalog
+ * operations.)
  *
  * This also computes a horizon used to truncate pg_subtrans. For that
  * backends in all databases have to be considered, and concurrently running
@@ -1660,9 +1665,6 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 	bool		in_recovery = RecoveryInProgress();
 	TransactionId *other_xids = ProcGlobal->xids;
 
-	/* inferred after ProcArrayLock is released */
-	h->catalog_oldest_nonremovable = InvalidTransactionId;
-
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	h->latest_completed = ShmemVariableCache->latestCompletedXid;
@@ -1682,6 +1684,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 
 		h->oldest_considered_running = initial;
 		h->shared_oldest_nonremovable = initial;
+		h->catalog_oldest_nonremovable = initial;
 		h->data_oldest_nonremovable = initial;
 
 		/*
@@ -1752,7 +1755,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		if (statusFlags & (PROC_IN_VACUUM | PROC_IN_LOGICAL_DECODING))
 			continue;
 
-		/* shared tables need to take backends in all database into account */
+		/* shared tables need to take backends in all databases into account */
 		h->shared_oldest_nonremovable =
 			TransactionIdOlder(h->shared_oldest_nonremovable, xmin);
 
@@ -1768,16 +1771,29 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		 * to prune still needed data away). If the current backend never
 		 * connects to a database that is harmless, because
 		 * data_oldest_nonremovable will never be utilized.
+		 *
+		 * Additionally, processes doing CREATE INDEX CONCURRENTLY and REINDEX
+		 * CONCURRENTLY on "safe" indexes can be ignored for non-catalog
+		 * horizon. (But not for catalogs: some transactions in CIC/RC do
+		 * catalog updates.)
 		 */
 		if (in_recovery ||
 			MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId ||
 			proc->databaseId == 0)	/* always include WalSender */
 		{
-			h->data_oldest_nonremovable =
-TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+			if (statusFlags & PROC_IN_SAFE_IC)
+h->catalog_oldest_nonremovable =
+	TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
+			else
+h->data_oldest_nonremovable = h->catalog_oldest_nonremovable =
+	TransactionIdOlder(h->data_oldest_nonremovable, xmin);
 		}
 	}
 
+	/* catalog horizon should never be later than data */
+	Assert(TransactionIdPrecedesOrEquals(h->catalog_oldest_nonremovable,
+		 h->data_oldest_nonremovable));
+
 	/*
 	 * If in recovery fetch oldest xid in KnownAssignedXids, will be applied
 	 * after lock is released.
@@ -1799,6 +1815,8 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 			TransactionIdOlder(h->shared_oldest_nonremovable, kaxmin);
 		h->data_oldest_nonremovable =
 			TransactionIdOlder(h->data_oldest_nonremovable, kaxmin);
+		h->catalog_oldest_nonremovable =
+			TransactionIdOlder(h->catalog_oldest_nonremovable, kaxmin);
 		/* temp relations cannot be accessed in recovery */
 	}
 	else
@@ -1825,6 +1843,9 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 		h->data_oldest_nonremovable =
 			TransactionIdRetreatedBy(h->data_oldest_nonremovable,
 	 vacuum_defer_cleanup_age);
+		h->catalog_oldest_nonremovable =
+			TransactionIdRetreatedBy(h->catalog_oldest_nonremovable,
+	 vacuum_defer_cleanup_age);
 		/* defer doesn't apply to temp relations */
 	}
 
@@ -1847,7 +1868,6 @@ 

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-15 Thread Álvaro Herrera
On 2021-Jan-12, Álvaro Herrera wrote:

> On 2021-Jan-12, Álvaro Herrera wrote:
> 
> > > For the 0001 patch, since ReindexIndexInfo is used only within
> > > ReindexRelationConcurrently() I think it’s a function-local structure
> > > type. So we can declare it within the function. What do you think?
> > 
> > That's a good idea.  Pushed with that change, thanks.
> 
> Here's the other patch, with comments fixed per reviews, and rebased to
> account for the scope change.

Pushed.  At the last minute I decided to back off on reverting the flag
set that DefineIndex has, because there was no point in doing that.  I
also updated the comment in (two places of) ReindexRelationConcurrently
about why we don't do it there.  The previous submission had this:

> + /*
> +  * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, 
> because
> +  * it only acquires an Xid to do some catalog manipulations, after the
> +  * wait is over.
> +  */

but this fails to point out that the main reason not to do it, is to
avoid having to decide whether to do it or not when some indexes are
safe and others aren't.  So I changed to:

+   /*
+* While we could set PROC_IN_SAFE_IC if all indexes qualified, there's no
+* real need for that, because we only acquire an Xid after the wait is
+* done, and that lasts for a very short period.
+*/

Thanks!

-- 
Álvaro Herrera   Valdivia, Chile
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-12 Thread Álvaro Herrera
On 2021-Jan-12, Álvaro Herrera wrote:

> > For the 0001 patch, since ReindexIndexInfo is used only within
> > ReindexRelationConcurrently() I think it’s a function-local structure
> > type. So we can declare it within the function. What do you think?
> 
> That's a good idea.  Pushed with that change, thanks.

Here's the other patch, with comments fixed per reviews, and rebased to
account for the scope change.

-- 
Álvaro Herrera   Valdivia, Chile
Tulio: oh, para qué servirá este boton, Juan Carlos?
Policarpo: No, aléjense, no toquen la consola!
Juan Carlos: Lo apretaré una y otra vez.
>From f2476636a358b77aca4b7adb36ca8e383e78af9b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 30 Nov 2020 16:01:46 -0300
Subject: [PATCH v2] set PROC_IN_SAFE_IC during REINDEX CONCURRENTLY

---
 src/backend/commands/indexcmds.c | 56 +---
 src/include/storage/proc.h   |  1 +
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8c9c39a467..35c3d20eae 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -385,7 +385,7 @@ CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts)
  * lazy VACUUMs, because they won't be fazed by missing index entries
  * either.  (Manual ANALYZEs, however, can't be excluded because they
  * might be within transactions that are going to do arbitrary operations
- * later.)  Processes running CREATE INDEX CONCURRENTLY
+ * later.)  Processes running CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
  * on indexes that are neither expressional nor partial are also safe to
  * ignore, since we know that those processes won't examine any data
  * outside the table they're indexing.
@@ -1566,9 +1566,11 @@ DefineIndex(Oid relationId,
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
-	/* Tell concurrent index builds to ignore us, if index qualifies */
-	if (safe_index)
-		set_indexsafe_procflags();
+	/*
+	 * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
+	 * it only acquires an Xid to do some catalog manipulations, after the
+	 * wait is over.
+	 */
 
 	/* We should now definitely not be advertising any xmin. */
 	Assert(MyProc->xmin == InvalidTransactionId);
@@ -3066,6 +3068,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		Oid			indexId;
 		Oid			tableId;
 		Oid			amId;
+		bool		safe;		/* for set_indexsafe_procflags */
 	} ReindexIndexInfo;
 	List	   *heapRelationIds = NIL;
 	List	   *indexIds = NIL;
@@ -3377,6 +3380,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		heapRel = table_open(indexRel->rd_index->indrelid,
 			 ShareUpdateExclusiveLock);
 
+		/* determine safety of this index for set_indexsafe_procflags */
+		idx->safe = (indexRel->rd_indexprs == NIL &&
+	 indexRel->rd_indpred == NIL);
 		idx->tableId = RelationGetRelid(heapRel);
 		idx->amId = indexRel->rd_rel->relam;
 
@@ -3418,6 +3424,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 		newidx = palloc(sizeof(ReindexIndexInfo));
 		newidx->indexId = newIndexId;
+		newidx->safe = idx->safe;
 		newidx->tableId = idx->tableId;
 		newidx->amId = idx->amId;
 
@@ -3485,6 +3492,11 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	CommitTransactionCommand();
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
+	 */
+
 	/*
 	 * Phase 2 of REINDEX CONCURRENTLY
 	 *
@@ -3514,6 +3526,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
 		/* Set ActiveSnapshot since functions in the indexes may need it */
 		PushActiveSnapshot(GetTransactionSnapshot());
 
@@ -3534,8 +3550,14 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
+
 	StartTransactionCommand();
 
+	/*
+	 * Because we don't take a snapshot in this transaction, there's no need
+	 * to set the PROC_IN_SAFE_IC flag here.
+	 */
+
 	/*
 	 * Phase 3 of REINDEX CONCURRENTLY
 	 *
@@ -3564,6 +3586,10 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 */
 		CHECK_FOR_INTERRUPTS();
 
+		/* Tell concurrent indexing to ignore us, if index qualifies */
+		if (newidx->safe)
+			set_indexsafe_procflags();
+
 		/*
 		 * Take the "reference snapshot" that will be used by validate_index()
 		 * to filter candidate tuples.
@@ -3607,6 +3633,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		 * interesting tuples.  But since it might not contain tuples deleted
 		 * just before the reference snap was taken, we have to wait out any
 		 * transactions that might have older snapshots.
+		 *
+		 * Because we don't take a snapshot in this transaction, there's no
+		 * need to set the 

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-12 Thread Álvaro Herrera
On 2021-Jan-04, Masahiko Sawada wrote:

> On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar  wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:not tested
> >
> > The patch looks good to me. With regards to the code comments, I had pretty 
> > similar concerns as already raised by Dmitry; and those have already been 
> > answered by you. So this patch is good to go from my side once you have 
> > updated the comments per your last email.
> 
> For the 0001 patch, since ReindexIndexInfo is used only within
> ReindexRelationConcurrently() I think it’s a function-local structure
> type. So we can declare it within the function. What do you think?

That's a good idea.  Pushed with that change, thanks.

Thanks also to Dmitry and Hamid for the reviews.

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2021-01-04 Thread Masahiko Sawada
On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> The patch looks good to me. With regards to the code comments, I had pretty 
> similar concerns as already raised by Dmitry; and those have already been 
> answered by you. So this patch is good to go from my side once you have 
> updated the comments per your last email.

For the 0001 patch, since ReindexIndexInfo is used only within
ReindexRelationConcurrently() I think it’s a function-local structure
type. So we can declare it within the function. What do you think?

Regards,


--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2020-12-08 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

The patch looks good to me. With regards to the code comments, I had pretty 
similar concerns as already raised by Dmitry; and those have already been 
answered by you. So this patch is good to go from my side once you have updated 
the comments per your last email.

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2020-12-04 Thread Alvaro Herrera
On 2020-Dec-04, Dmitry Dolgov wrote:

> * This one is mostly for me to understand. There are couple of places
>   with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the
>   transaction only takes a snapshot to do some catalog manipulation'.
>   But for some of them I don't immediately see in the relevant code
>   anything related to snapshots. E.g. one in DefineIndex is followed by
>   WaitForOlderSnapshots (which seems to only do waiting, not taking a
>   snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid.
>   Is taking a snapshot hidden somewhere there inside?

Well, they're actually going to acquire an Xid, not a snapshot, so the
comment is slightly incorrect; I'll fix it, thanks for pointing that
out.  The point stands: because those transactions are of very short
duration (at least of very short duration after the point where the XID
is obtained), it's not necessary to set the PROC_IN_SAFE_IC flag, since
it won't cause any disruption to other processes.

It is possible that I copied the wrong comment in DefineIndex.  (I only
noticed that one after I had mulled over the ones in
ReindexRelationConcurrently, each of which is skipping setting the flag
for slightly different reasons.)




Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2020-12-04 Thread Dmitry Dolgov
> On Mon, Nov 30, 2020 at 04:54:39PM -0300, Alvaro Herrera wrote:
>
> In a previous thread [1], we added smarts so that processes running
> CREATE INDEX CONCURRENTLY would not wait for each other.
>
> One is adding the same to REINDEX CONCURRENTLY.  I've attached patch
> 0002 here which does that.
>
> Why 0002, you ask?  That's because preparatory patch 0001 simplifies the
> ReindexRelationConcurrently somewhat by adding a struct to be used of
> indexes that are going to be processed, instead of just a list of Oids.
> This is a good change in itself because it let us get rid of duplicative
> open/close of the index rels in order to obtain some info that's already
> known at the start.

Thanks! The patch looks pretty good to me, after reading it I only have
a few minor comments/questions:

* ReindexIndexInfo sounds a bit weird for me because of the repeating
  part, although I see there is already a similar ReindexIndexCallbackState
  so probably it's fine.

* This one is mostly for me to understand. There are couple of places
  with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the
  transaction only takes a snapshot to do some catalog manipulation'.
  But for some of them I don't immediately see in the relevant code
  anything related to snapshots. E.g. one in DefineIndex is followed by
  WaitForOlderSnapshots (which seems to only do waiting, not taking a
  snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid.
  Is taking a snapshot hidden somewhere there inside?